Implementation:Evidentlyai Evidently Text Match Descriptor
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text Analysis, Data Quality |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Implements the unified TextMatch descriptor and convenience functions for all text/word matching scenarios, replacing multiple legacy text matching features with a single, configurable API.
Description
The text_match module provides a single entry point for text matching in Evidently. The central class, TextMatch, is a Descriptor subclass that consolidates four matching strategies (contains, not_contains, exact, regex) into one configurable component.
Core Classes:
- TextMatchOptions -- Configuration container for text processing behavior:
  - case_sensitive: Whether matching respects case (default: True).
  - lemmatize: Whether to apply lemmatization before matching (default: False). Requires NLTK.
  - word_boundaries: Whether to extract words using regex word boundaries before matching (default: False).
- TextMatchProcessor -- Centralized processing engine that provides:
  - process_text(text): Unified text preprocessing pipeline (word boundary extraction, lemmatization).
  - all_match(text, items): Check if text contains ALL specified items.
  - any_match(text, items): Check if text contains ANY specified items.
  - regex_match(text, pattern): Check if text matches a regex pattern.
  - exact_match(text, items): Check if text exactly matches any item.
  - Lazy-loads NLTK WordNetLemmatizer only when lemmatization is requested.
- TextMatch -- The main descriptor class with the following fields:
  - column_name: Text column to match against.
  - match_items: Either a list of strings to match, or a column name for column-to-column matching.
  - match_type: One of "contains", "not_contains", "exact", "regex".
  - match_mode: One of "any" (default) or "all".
  - case_sensitive, lemmatize, word_boundaries: Processing options.
  - Smart defaults: when lemmatize=True, word_boundaries is automatically enabled.
  - The generate_data() method produces a DatasetColumn of type ColumnType.Categorical containing boolean match results.
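The any/all/exact semantics listed above can be illustrated without Evidently installed. The following is a minimal standalone sketch, not the library's implementation: the helper names (`normalize`, `any_match`, `all_match`, `exact_match`), the plain substring test, and lower-casing as the case-insensitive strategy are all assumptions chosen to mirror the descriptions of TextMatchProcessor.

```python
from typing import List


def normalize(text: str, case_sensitive: bool) -> str:
    """Lower-case the text unless matching is case-sensitive (assumed behavior)."""
    return text if case_sensitive else text.lower()


def any_match(text: str, items: List[str], case_sensitive: bool = True) -> bool:
    """True if at least one item occurs in the text as a substring."""
    haystack = normalize(text, case_sensitive)
    return any(normalize(item, case_sensitive) in haystack for item in items)


def all_match(text: str, items: List[str], case_sensitive: bool = True) -> bool:
    """True only if every item occurs in the text as a substring."""
    haystack = normalize(text, case_sensitive)
    return all(normalize(item, case_sensitive) in haystack for item in items)


def exact_match(text: str, items: List[str], case_sensitive: bool = True) -> bool:
    """True if the whole text equals one of the items."""
    haystack = normalize(text, case_sensitive)
    return any(normalize(item, case_sensitive) == haystack for item in items)


text = "This request is URGENT and important"

# With the default case_sensitive=True, lower-case "urgent" does not match "URGENT".
strict_hit = any_match(text, ["urgent", "critical"])
relaxed_hit = any_match(text, ["urgent", "critical"], case_sensitive=False)
both_present = all_match(text, ["urgent", "important"], case_sensitive=False)
is_exact = exact_match("Yes", ["yes", "no"], case_sensitive=False)

print(strict_hit, relaxed_hit, both_present, is_exact)
```

The first call is the one worth noting: because case_sensitive defaults to True, a lower-case keyword list silently misses upper-case text unless the option is switched off.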
Convenience Functions:
These functions provide backward-compatible APIs matching legacy feature names, all returning TextMatch instances:
| Function | match_type | match_mode | Special Behavior |
|---|---|---|---|
| Contains() | contains | any/all | Basic substring matching |
| DoesNotContain() | not_contains | inverted mode | Legacy inverted mode logic |
| ItemMatch() | contains | any/all | Column-to-column matching (requires 2 columns) |
| ItemNoMatch() | not_contains | inverted mode | Column-to-column not-contains |
| WordsPresence() | contains/not_contains | varies by mode arg | case_sensitive=False, word_boundaries=True |
| IncludesWords() | contains | any/all | case_sensitive=False, word_boundaries=True, lemmatize=True |
| ExcludesWords() | not_contains | any/all | case_sensitive=False, word_boundaries=True, lemmatize=True |
| WordMatch() | contains | any/all | Column-to-column, word_boundaries=True, lemmatize=True |
| WordNoMatch() | not_contains | any/all | Column-to-column, word_boundaries=True, lemmatize=True |
| TriggerWordsPresent() | contains | any | case_sensitive=False, word_boundaries=True, lemmatize=True |
| RegExp() | regex | N/A | Single regex pattern matching |
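One way to read the table is that each convenience function is a preset over the same set of fields. The sketch below models that idea with a hypothetical `MatchConfig` dataclass and two hypothetical factory functions (`includes_words`, `excludes_words`); none of these names exist in Evidently, and the real functions return TextMatch instances rather than this stand-in.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MatchConfig:
    """Hypothetical stand-in for the fields a TextMatch instance carries."""

    column_name: str
    match_items: List[str]
    match_type: str = "contains"
    match_mode: str = "any"
    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False


def includes_words(column_name: str, words_list: List[str],
                   mode: str = "any", lemmatize: bool = True) -> MatchConfig:
    """Preset following the IncludesWords() row of the table."""
    return MatchConfig(
        column_name=column_name,
        match_items=list(words_list),
        match_type="contains",
        match_mode=mode,
        case_sensitive=False,
        lemmatize=lemmatize,
        word_boundaries=True,
    )


def excludes_words(column_name: str, words_list: List[str],
                   mode: str = "any", lemmatize: bool = True) -> MatchConfig:
    """Preset following the ExcludesWords() row of the table."""
    return MatchConfig(
        column_name=column_name,
        match_items=list(words_list),
        match_type="not_contains",
        match_mode=mode,
        case_sensitive=False,
        lemmatize=lemmatize,
        word_boundaries=True,
    )


inc = includes_words("answer", ["python", "java"])
exc = excludes_words("answer", ["error"], mode="all")
print(inc)
print(exc)
```

The two presets differ only in match_type, which is the point of the consolidation: the legacy names survive as thin wrappers while the matching logic lives in one place.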
Usage
Use this module when:
- Checking if text columns contain or exclude specific words or phrases.
- Performing column-to-column text matching (e.g., comparing response text against keyword columns).
- Running regex pattern matching on text data.
- Needing lemmatization-aware word matching.
- Migrating from legacy text matching features to the V2 API.
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/descriptors/text_match.py
Signature
class TextMatchOptions:
    def __init__(self, case_sensitive: bool = True, lemmatize: bool = False, word_boundaries: bool = False)

class TextMatchProcessor:
    def __init__(self, options: TextMatchOptions)
    def process_text(self, text: str) -> str
    def all_match(self, text: str, items: List[str]) -> bool
    def any_match(self, text: str, items: List[str]) -> bool
    def regex_match(self, text: str, pattern: str) -> bool
    def exact_match(self, text: str, items: List[str]) -> bool

class TextMatch(Descriptor):
    column_name: str
    match_items: Union[str, List[str]]
    match_type: Literal["contains", "not_contains", "exact", "regex"] = "contains"
    match_mode: Literal["any", "all"] = "any"
    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False
    def generate_data(self, dataset: Dataset, options: Options) -> DatasetColumn
    def list_input_columns(self) -> List[str]

# Convenience functions
def Contains(column_name, items, case_sensitive=True, mode="any", alias=None, tests=None) -> TextMatch
def DoesNotContain(column_name, items, ...) -> TextMatch
def ItemMatch(columns, ...) -> TextMatch
def ItemNoMatch(columns, ...) -> TextMatch
def WordsPresence(column_name, words_list, mode="includes_any", lemmatize=True, ...) -> TextMatch
def IncludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def ExcludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def WordMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def WordNoMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def TriggerWordsPresent(column_name, words_list, lemmatize=True, ...) -> TextMatch
def RegExp(column_name, reg_exp, ...) -> TextMatch
Import
from evidently.descriptors.text_match import (
    TextMatch,
    TextMatchOptions,
    TextMatchProcessor,
    Contains,
    DoesNotContain,
    ItemMatch,
    ItemNoMatch,
    WordsPresence,
    IncludesWords,
    ExcludesWords,
    WordMatch,
    WordNoMatch,
    TriggerWordsPresent,
    RegExp,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Name of the text column to match against |
| match_items | Union[str, List[str]] | Yes | List of strings to match, or a column name for column-to-column matching |
| match_type | Literal["contains", "not_contains", "exact", "regex"] | No | Type of matching to perform (default: "contains") |
| match_mode | Literal["any", "all"] | No | Whether to require any or all items to match (default: "any") |
| case_sensitive | bool | No | Whether matching is case-sensitive (default: True) |
| lemmatize | bool | No | Whether to lemmatize words before matching (default: False) |
| word_boundaries | bool | No | Whether to extract words using word boundaries (default: False) |
| alias | Optional[str] | No | Custom display name for the descriptor |
| tests | Optional[List[AnyDescriptorTest]] | No | Tests to apply to descriptor output |
Outputs
| Name | Type | Description |
|---|---|---|
| return | DatasetColumn | Column of type ColumnType.Categorical containing boolean match results (True/False) for each row |
Usage Examples
Basic Contains Check
from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="response",
    match_items=["urgent", "important", "critical"],
    match_type="contains",
    match_mode="any",
    case_sensitive=False,
)
Column-to-Column Matching
from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="response",
    match_items="expected_keywords",  # column name
    match_type="contains",
    match_mode="all",
)
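To make the column-to-column case concrete, here is a minimal standalone sketch of the row-wise evaluation it implies. It does not call Evidently: the `rows` data and the `contains_all` helper are hypothetical, and the sketch assumes each cell of the keyword column holds a list of strings, which the source does not specify.

```python
from typing import Dict, List, Sequence, Union

Row = Dict[str, Union[str, List[str]]]

# Hypothetical data: one text column and one per-row keyword column.
rows: List[Row] = [
    {"response": "refunds are issued within 14 days",
     "expected_keywords": ["refunds", "14 days"]},
    {"response": "please contact support",
     "expected_keywords": ["refund", "support"]},
]


def contains_all(text: str, items: Sequence[str]) -> bool:
    """match_type="contains" with match_mode="all": every item must appear."""
    return all(item in text for item in items)


# Each row is checked against its own keyword list, not a shared one.
results = [contains_all(str(r["response"]), list(r["expected_keywords"])) for r in rows]
print(results)
```

The second row fails because "refund" is absent even though "support" is present, which is the difference between match_mode="all" and match_mode="any".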
Exclusion with Lemmatization
from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="description",
    match_items=["spam", "test", "ignore"],
    match_type="not_contains",
    lemmatize=True,
)
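Because lemmatize=True also enables word_boundaries, an exclusion check like the one above operates on whole words rather than substrings. The standalone sketch below shows why that matters. It is an illustration under assumptions, not Evidently's code: the `\b\w+\b` tokenization pattern and both helper functions are hypothetical, not_contains is interpreted as "none of the items appears", and lemmatization is left out so the example runs without NLTK.

```python
import re
from typing import List

WORD_RE = re.compile(r"\b\w+\b")  # assumed tokenization pattern


def excludes_substring(text: str, items: List[str]) -> bool:
    """True if no item occurs anywhere in the text, even inside a longer word."""
    lowered = text.lower()
    return not any(item.lower() in lowered for item in items)


def excludes_words(text: str, items: List[str]) -> bool:
    """True if no item occurs as a whole word in the text."""
    words = set(WORD_RE.findall(text.lower()))
    return not any(item.lower() in words for item in items)


text = "The latest release passed every check"
blocked = ["test", "spam"]

by_substring = excludes_substring(text, blocked)  # "test" is hidden inside "latest"
by_word = excludes_words(text, blocked)           # no whole word is blocked

print(by_substring, by_word)
```

Substring matching flags the sentence because "latest" contains "test", while word-level matching lets it through.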
Regex Matching
from evidently.descriptors.text_match import RegExp

descriptor = RegExp(
    column_name="phone_field",
    reg_exp=r"\b\d{3}-\d{3}-\d{4}\b",
    alias="Phone Number Format",
)
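Before wiring a pattern into a descriptor, it can help to try it against a few sample strings with the standard `re` module. The sketch below does that for the phone pattern above. Whether the library applies the pattern with search or full-match semantics is not stated in this page, so the use of `re.search` here is an assumption, and the sample strings are made up.

```python
import re

PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

samples = [
    "Call 555-123-4567 today",  # well-formed number inside a sentence
    "555-1234",                 # too few digit groups
    "id 1555-123-45678",        # digit runs too long for the \b anchors
]

# search() looks for the pattern anywhere in the string.
results = [bool(PHONE.search(s)) for s in samples]
print(results)
```

The `\b` anchors are what reject the third sample: without them, "555-123-4567" would be found inside the longer digit runs.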
Using Convenience Functions
from evidently.descriptors.text_match import IncludesWords, ExcludesWords

includes = IncludesWords(
    column_name="answer",
    words_list=["python", "java", "rust"],
    mode="any",
    lemmatize=True,
)

excludes = ExcludesWords(
    column_name="answer",
    words_list=["error", "fail", "crash"],
    mode="all",
    lemmatize=True,
)
Related Pages
- Environment:Evidentlyai_Evidently_Python_Core_Environment
- Implementation:Evidentlyai_Evidently_Generated_Descriptors -- Legacy-wrapped descriptor factory functions that provide similar functionality via V1 features
- Implementation:Evidentlyai_Evidently_Metric_Types -- The metric type system that descriptors integrate with through the report framework