Implementation:Evidentlyai Evidently Text Match Descriptor

Knowledge Sources	Evidentlyai_Evidently
Domains	NLP, Text Analysis, Data Quality
Last Updated	2026-02-14 12:00 GMT

Overview

Implements the unified TextMatch descriptor and convenience functions for all text/word matching scenarios, replacing multiple legacy text matching features with a single, configurable API.

Description

The text_match module unifies text matching in Evidently. Its central class, TextMatch, is a Descriptor subclass that consolidates four matching strategies (contains, not_contains, exact, regex) into one configurable component.

Core Classes:

TextMatchOptions -- Configuration container for text processing behavior:
- case_sensitive: Whether matching respects case (default: True).
- lemmatize: Whether to apply lemmatization before matching (default: False). Requires NLTK.
- word_boundaries: Whether to extract words using regex word boundaries before matching (default: False).

TextMatchProcessor -- Centralized processing engine that provides:
- process_text(text): Unified text preprocessing pipeline (word boundary extraction, lemmatization).
- all_match(text, items): Check if text contains ALL specified items.
- any_match(text, items): Check if text contains ANY specified items.
- regex_match(text, pattern): Check if text matches a regex pattern.
- exact_match(text, items): Check if text exactly matches any item.
- Lazy-loads NLTK WordNetLemmatizer only when lemmatization is requested.
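
The processor's responsibilities can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Evidently's implementation: the class name SketchProcessor is hypothetical, lemmatization is left out to avoid the NLTK dependency, and the use of substring containment for "contains" and of re.search for regex matching are assumptions.

```python
import re
from typing import List


class SketchProcessor:
    """Hypothetical stand-in for TextMatchProcessor; lemmatization is omitted."""

    def __init__(self, case_sensitive: bool = True, word_boundaries: bool = False) -> None:
        self.case_sensitive = case_sensitive
        self.word_boundaries = word_boundaries

    def process_text(self, text: str) -> str:
        # Assumption: case folding is part of preprocessing.
        if not self.case_sensitive:
            text = text.casefold()
        if self.word_boundaries:
            # Keep only word tokens, separated by single spaces.
            text = " ".join(re.findall(r"\b\w+\b", text))
        return text

    def _contains(self, text: str, item: str) -> bool:
        haystack = self.process_text(text)
        needle = self.process_text(item)
        if self.word_boundaries:
            # Pad with spaces so "cat" does not match inside "category".
            return f" {needle} " in f" {haystack} "
        return needle in haystack

    def any_match(self, text: str, items: List[str]) -> bool:
        return any(self._contains(text, item) for item in items)

    def all_match(self, text: str, items: List[str]) -> bool:
        return all(self._contains(text, item) for item in items)

    def exact_match(self, text: str, items: List[str]) -> bool:
        return self.process_text(text) in {self.process_text(item) for item in items}

    def regex_match(self, text: str, pattern: str) -> bool:
        flags = 0 if self.case_sensitive else re.IGNORECASE
        return re.search(pattern, text, flags) is not None


processor = SketchProcessor(case_sensitive=False, word_boundaries=True)
text = "The category is URGENT!"
print(processor.any_match(text, ["cat", "urgent"]))  # True: "urgent" is a whole word
print(processor.all_match(text, ["cat", "urgent"]))  # False: "cat" is not a whole word
```

With word_boundaries disabled, the same sketch treats "cat" as a match for "category", which is the practical difference between substring and whole-word matching.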

TextMatch -- The main descriptor class with the following fields:
- column_name: Text column to match against.
- match_items: Either a list of strings to match, or a column name for column-to-column matching.
- match_type: One of "contains", "not_contains", "exact", "regex".
- match_mode: One of "any" (default) or "all".
- case_sensitive, lemmatize, word_boundaries: Processing options.
- Smart defaults: when lemmatize=True, word_boundaries is automatically enabled.
- The generate_data() method produces a DatasetColumn of type ColumnType.Categorical containing boolean match results.
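
The smart default described above can be sketched as follows. SketchOptions is a hypothetical dataclass used only for illustration; how TextMatch actually applies this rule internally is not shown in this page.

```python
from dataclasses import dataclass


@dataclass
class SketchOptions:
    """Hypothetical options holder; not Evidently's TextMatchOptions."""

    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False

    def __post_init__(self) -> None:
        # Lemmatization works on individual words, so word extraction
        # is switched on whenever lemmatize is requested.
        if self.lemmatize:
            self.word_boundaries = True


print(SketchOptions(lemmatize=True))
print(SketchOptions())
```

Under this reading, a caller only needs to pass lemmatize=True; word_boundaries follows without being set explicitly.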

Convenience Functions:

Each function keeps the name of a legacy feature for backward compatibility and returns a preconfigured TextMatch instance:

Function	match_type	match_mode	Special Behavior
Contains()	contains	any/all	Basic substring matching
DoesNotContain()	not_contains	inverted mode	Legacy inverted mode logic
ItemMatch()	contains	any/all	Column-to-column matching (requires 2 columns)
ItemNoMatch()	not_contains	inverted mode	Column-to-column not-contains
WordsPresence()	contains/not_contains	varies by mode arg	case_sensitive=False, word_boundaries=True
IncludesWords()	contains	any/all	case_sensitive=False, word_boundaries=True, lemmatize=True
ExcludesWords()	not_contains	any/all	case_sensitive=False, word_boundaries=True, lemmatize=True
WordMatch()	contains	any/all	Column-to-column, word_boundaries=True, lemmatize=True
WordNoMatch()	not_contains	any/all	Column-to-column, word_boundaries=True, lemmatize=True
TriggerWordsPresent()	contains	any	case_sensitive=False, word_boundaries=True, lemmatize=True
RegExp()	regex	N/A	Single regex pattern matching
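
One way to read the table is that each convenience function is a thin wrapper that presets TextMatch fields. The sketch below encodes three rows of the table as plain dictionaries. PRESETS and build_kwargs are hypothetical names introduced for illustration; the real functions may assemble their arguments differently.

```python
from typing import Any, Dict, List

# Presets copied from the table above; the dict itself is illustrative only.
PRESETS: Dict[str, Dict[str, Any]] = {
    "IncludesWords": {
        "match_type": "contains",
        "case_sensitive": False,
        "word_boundaries": True,
        "lemmatize": True,
    },
    "ExcludesWords": {
        "match_type": "not_contains",
        "case_sensitive": False,
        "word_boundaries": True,
        "lemmatize": True,
    },
    "TriggerWordsPresent": {
        "match_type": "contains",
        "match_mode": "any",  # fixed to "any" in the table
        "case_sensitive": False,
        "word_boundaries": True,
        "lemmatize": True,
    },
}


def build_kwargs(name: str, column_name: str, words_list: List[str], mode: str = "any") -> Dict[str, Any]:
    """Hypothetical helper: merge call-site arguments with a preset."""
    kwargs: Dict[str, Any] = {
        "column_name": column_name,
        "match_items": list(words_list),
        "match_mode": mode,
    }
    # Preset values win, so a fixed match_mode overrides the caller's mode.
    kwargs.update(PRESETS[name])
    return kwargs


kwargs = build_kwargs("ExcludesWords", "answer", ["error", "fail"], mode="all")
print(kwargs)
```

The resulting dictionary uses the TextMatch field names listed earlier, so in principle it is shaped to be passed as TextMatch(**kwargs).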

Usage

Use this module to:

- Check whether text columns contain or exclude specific words or phrases.
- Match text column-to-column (e.g., compare response text against a keyword column).
- Run regex pattern matching on text data.
- Apply lemmatization-aware word matching.
- Migrate from legacy text matching features to the V2 API.

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/descriptors/text_match.py

Signature

class TextMatchOptions:
    def __init__(self, case_sensitive: bool = True, lemmatize: bool = False, word_boundaries: bool = False)

class TextMatchProcessor:
    def __init__(self, options: TextMatchOptions)
    def process_text(self, text: str) -> str
    def all_match(self, text: str, items: List[str]) -> bool
    def any_match(self, text: str, items: List[str]) -> bool
    def regex_match(self, text: str, pattern: str) -> bool
    def exact_match(self, text: str, items: List[str]) -> bool

class TextMatch(Descriptor):
    column_name: str
    match_items: Union[str, List[str]]
    match_type: Literal["contains", "not_contains", "exact", "regex"] = "contains"
    match_mode: Literal["any", "all"] = "any"
    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False
    def generate_data(self, dataset: Dataset, options: Options) -> DatasetColumn
    def list_input_columns(self) -> List[str]

# Convenience functions
def Contains(column_name, items, case_sensitive=True, mode="any", alias=None, tests=None) -> TextMatch
def DoesNotContain(column_name, items, ...) -> TextMatch
def ItemMatch(columns, ...) -> TextMatch
def ItemNoMatch(columns, ...) -> TextMatch
def WordsPresence(column_name, words_list, mode="includes_any", lemmatize=True, ...) -> TextMatch
def IncludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def ExcludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def WordMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def WordNoMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def TriggerWordsPresent(column_name, words_list, lemmatize=True, ...) -> TextMatch
def RegExp(column_name, reg_exp, ...) -> TextMatch

Import

from evidently.descriptors.text_match import (
    TextMatch,
    TextMatchOptions,
    TextMatchProcessor,
    Contains,
    DoesNotContain,
    ItemMatch,
    ItemNoMatch,
    WordsPresence,
    IncludesWords,
    ExcludesWords,
    WordMatch,
    WordNoMatch,
    TriggerWordsPresent,
    RegExp,
)

I/O Contract

Inputs

Name	Type	Required	Description
column_name	str	Yes	Name of the text column to match against
match_items	Union[str, List[str]]	Yes	List of strings to match, or a column name for column-to-column matching
match_type	Literal["contains", "not_contains", "exact", "regex"]	No	Type of matching to perform (default: "contains")
match_mode	Literal["any", "all"]	No	Whether to require any or all items to match (default: "any")
case_sensitive	bool	No	Whether matching is case-sensitive (default: True)
lemmatize	bool	No	Whether to lemmatize words before matching (default: False)
word_boundaries	bool	No	Whether to extract words using word boundaries (default: False)
alias	Optional[str]	No	Custom display name for the descriptor
tests	Optional[List[AnyDescriptorTest]]	No	Tests to apply to descriptor output

Outputs

Name	Type	Description
return	DatasetColumn	Column of type ColumnType.Categorical containing boolean match results (True/False) for each row

Usage Examples

Basic Contains Check

from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="response",
    match_items=["urgent", "important", "critical"],
    match_type="contains",
    match_mode="any",
    case_sensitive=False,
)

Column-to-Column Matching

from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="response",
    match_items="expected_keywords",  # column name
    match_type="contains",
    match_mode="all",
)
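
To make the row-wise semantics of column-to-column matching concrete, here is a plain-Python sketch that does not use Evidently. It assumes each cell of the keyword column holds a list of strings, which this page does not confirm, and the helper row_contains_all is hypothetical.

```python
from typing import Any, Dict, List

rows: List[Dict[str, Any]] = [
    {"response": "Refund issued within 5 days", "expected_keywords": ["Refund", "days"]},
    {"response": "Please contact support", "expected_keywords": ["refund", "support"]},
]


def row_contains_all(row: Dict[str, Any], text_col: str, items_col: str) -> bool:
    """Return True when the row's text contains every item from the same row."""
    text = row[text_col]
    # Case-sensitive substring check, mirroring match_type="contains" with match_mode="all".
    return all(item in text for item in row[items_col])


results = [row_contains_all(row, "response", "expected_keywords") for row in rows]
print(results)  # [True, False]
```

Each row is compared only against its own keywords, which is what distinguishes this mode from passing a fixed list in match_items.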

Exclusion with Lemmatization

from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="description",
    match_items=["spam", "test", "ignore"],
    match_type="not_contains",
    lemmatize=True,  # also enables word_boundaries; requires NLTK
)

Regex Matching

from evidently.descriptors.text_match import RegExp

descriptor = RegExp(
    column_name="phone_field",
    reg_exp=r"\b\d{3}-\d{3}-\d{4}\b",
    alias="Phone Number Format",
)
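
The pattern itself can be checked with the standard re module before wiring it into a descriptor. This sketch assumes search semantics (a match anywhere in the text); whether the descriptor uses search or a full match is not stated in this page.

```python
import re

# Same pattern as in the RegExp example above.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

samples = [
    "Call 555-123-4567 today",  # well-formed number inside a sentence
    "555-1234",                 # too short
    "15551234567",              # no hyphens
    "1555-123-45678",           # extra digits on both sides block the word boundaries
]

flags = [PHONE.search(sample) is not None for sample in samples]
print(flags)  # [True, False, False, False]
```

The \b anchors are what reject the last sample: without them, the digits 555-123-4567 embedded in the longer string would match.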

Using Convenience Functions

from evidently.descriptors.text_match import IncludesWords, ExcludesWords

includes = IncludesWords(
    column_name="answer",
    words_list=["python", "java", "rust"],
    mode="any",
    lemmatize=True,
)

excludes = ExcludesWords(
    column_name="answer",
    words_list=["error", "fail", "crash"],
    mode="all",
    lemmatize=True,
)

Related Pages

Environment:Evidentlyai_Evidently_Python_Core_Environment
Implementation:Evidentlyai_Evidently_Generated_Descriptors -- Legacy-wrapped descriptor factory functions that provide similar functionality via V1 features
Implementation:Evidentlyai_Evidently_Metric_Types -- The metric type system that descriptors integrate with through the report framework
