
Implementation:Evidentlyai Evidently Text Match Descriptor

From Leeroopedia
Knowledge Sources
Domains NLP, Text Analysis, Data Quality
Last Updated 2026-02-14 12:00 GMT

Overview

Implements the unified TextMatch descriptor and convenience functions for all text/word matching scenarios, replacing multiple legacy text matching features with a single, configurable API.

Description

The text_match module unifies text matching in Evidently. Its central class, TextMatch, is a Descriptor subclass that consolidates several matching strategies (contains, not_contains, exact, regex) into one configurable component.

Core Classes:

  • TextMatchOptions -- Configuration container for text processing behavior:
    • case_sensitive: Whether matching respects case (default: True).
    • lemmatize: Whether to apply lemmatization before matching (default: False). Requires NLTK.
    • word_boundaries: Whether to extract words using regex word boundaries before matching (default: False).
  • TextMatchProcessor -- Centralized processing engine that provides:
    • process_text(text): Unified text preprocessing pipeline (word boundary extraction, lemmatization).
    • all_match(text, items): Check if text contains ALL specified items.
    • any_match(text, items): Check if text contains ANY specified items.
    • regex_match(text, pattern): Check if text matches a regex pattern.
    • exact_match(text, items): Check if text exactly matches any item.
    • Lazy-loads NLTK WordNetLemmatizer only when lemmatization is requested.
  • TextMatch -- The main descriptor class with the following fields:
    • column_name: Text column to match against.
    • match_items: Either a list of strings to match, or a column name for column-to-column matching.
    • match_type: One of "contains", "not_contains", "exact", "regex".
    • match_mode: One of "any" (default) or "all".
    • case_sensitive, lemmatize, word_boundaries: Processing options.
    • Smart defaults: when lemmatize=True, word_boundaries is automatically enabled.
    • The generate_data() method produces a DatasetColumn of type ColumnType.Categorical containing boolean match results.
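The preprocessing and matching rules above can be illustrated with a small pure-Python sketch. It is not Evidently's TextMatchProcessor: SketchOptions and SketchProcessor are hypothetical names, lemmatization is left out so the code runs without NLTK, and it assumes that word_boundaries=True means items must match whole words while the default performs plain substring matching.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class SketchOptions:
    # Two of the three documented TextMatchOptions flags; lemmatization is left
    # out so that the sketch runs without NLTK.
    case_sensitive: bool = True
    word_boundaries: bool = False


class SketchProcessor:
    """Hypothetical stand-in for TextMatchProcessor, not Evidently's code."""

    def __init__(self, options: SketchOptions) -> None:
        self.options = options

    def process_text(self, text: str) -> str:
        # Fold case so text and items are compared consistently.
        if not self.options.case_sensitive:
            text = text.lower()
        # Keep only word tokens, re-joined with single spaces.
        if self.options.word_boundaries:
            text = " ".join(re.findall(r"\b\w+\b", text))
        return text

    def _contains(self, processed: str, item: str) -> bool:
        item = self.process_text(item)
        if self.options.word_boundaries:
            # Whole-word check: pad both sides so "urgent" cannot hit "urgently".
            return f" {item} " in f" {processed} "
        return item in processed

    def any_match(self, text: str, items: List[str]) -> bool:
        processed = self.process_text(text)
        return any(self._contains(processed, item) for item in items)

    def all_match(self, text: str, items: List[str]) -> bool:
        processed = self.process_text(text)
        return all(self._contains(processed, item) for item in items)

    def exact_match(self, text: str, items: List[str]) -> bool:
        processed = self.process_text(text)
        return any(processed == self.process_text(item) for item in items)

    def regex_match(self, text: str, pattern: str) -> bool:
        # Assumes search semantics on the raw text; only case folding is honored.
        flags = 0 if self.options.case_sensitive else re.IGNORECASE
        return re.search(pattern, text, flags) is not None


substring = SketchProcessor(SketchOptions(case_sensitive=False))
words = SketchProcessor(SketchOptions(case_sensitive=False, word_boundaries=True))
print(substring.any_match("This is URGENTLY needed", ["urgent"]))  # True
print(words.any_match("This is URGENTLY needed", ["urgent"]))      # False
```

Comparing the two processors suggests why the word-level convenience functions enable word_boundaries: substring matching counts "urgent" inside "URGENTLY", while whole-word matching does not.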

Convenience Functions:

These functions keep the legacy feature names as backward-compatible entry points; each one returns a TextMatch instance:

| Function | match_type | match_mode | Special behavior |
|---|---|---|---|
| Contains() | contains | any/all | Basic substring matching |
| DoesNotContain() | not_contains | inverted mode | Legacy inverted-mode logic |
| ItemMatch() | contains | any/all | Column-to-column matching (requires 2 columns) |
| ItemNoMatch() | not_contains | inverted mode | Column-to-column not-contains |
| WordsPresence() | contains/not_contains | Set by the mode argument | case_sensitive=False, word_boundaries=True |
| IncludesWords() | contains | any/all | case_sensitive=False, word_boundaries=True, lemmatize=True |
| ExcludesWords() | not_contains | any/all | case_sensitive=False, word_boundaries=True, lemmatize=True |
| WordMatch() | contains | any/all | Column-to-column, word_boundaries=True, lemmatize=True |
| WordNoMatch() | not_contains | any/all | Column-to-column, word_boundaries=True, lemmatize=True |
| TriggerWordsPresent() | contains | any | case_sensitive=False, word_boundaries=True, lemmatize=True |
| RegExp() | regex | N/A | Single regex pattern matching |
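The wrappers in the table only preset TextMatch fields. The sketch below mirrors that idea with a hypothetical TextMatchParams dataclass and two hypothetical factories (includes_words, excludes_words) so it runs without Evidently; the field names and defaults are taken from this page, including the smart default that lemmatize=True turns on word_boundaries.

```python
from dataclasses import dataclass
from typing import List, Literal


@dataclass
class TextMatchParams:
    """Hypothetical plain-data mirror of the TextMatch fields on this page."""

    column_name: str
    match_items: List[str]
    match_type: Literal["contains", "not_contains", "exact", "regex"] = "contains"
    match_mode: Literal["any", "all"] = "any"
    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False

    def __post_init__(self) -> None:
        # Smart default: lemmatization works on words, so it implies word boundaries.
        if self.lemmatize:
            self.word_boundaries = True


def includes_words(column_name: str, words_list: List[str],
                   mode: Literal["any", "all"] = "any",
                   lemmatize: bool = True) -> TextMatchParams:
    # Hypothetical analogue of IncludesWords(): case-insensitive, word-level "contains".
    return TextMatchParams(column_name, list(words_list), "contains", mode,
                           case_sensitive=False, lemmatize=lemmatize,
                           word_boundaries=True)


def excludes_words(column_name: str, words_list: List[str],
                   mode: Literal["any", "all"] = "any",
                   lemmatize: bool = True) -> TextMatchParams:
    # Hypothetical analogue of ExcludesWords(): same presets, but "not_contains".
    return TextMatchParams(column_name, list(words_list), "not_contains", mode,
                           case_sensitive=False, lemmatize=lemmatize,
                           word_boundaries=True)


params = excludes_words("answer", ["error", "fail"], lemmatize=False)
print(params.match_type, params.case_sensitive, params.word_boundaries)
# not_contains False True
```

Keeping the wrappers as thin presets means every legacy name ends up on the same TextMatch code path, which is the consolidation the Overview describes.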

Usage

Use this module when:

  • Checking if text columns contain or exclude specific words or phrases.
  • Performing column-to-column text matching (e.g., comparing response text against keyword columns).
  • Running regex pattern matching on text data.
  • Needing lemmatization-aware word matching.
  • Migrating from legacy text matching features to the V2 API.

Code Reference

Source Location

Signature

class TextMatchOptions:
    def __init__(self, case_sensitive: bool = True, lemmatize: bool = False, word_boundaries: bool = False)

class TextMatchProcessor:
    def __init__(self, options: TextMatchOptions)
    def process_text(self, text: str) -> str
    def all_match(self, text: str, items: List[str]) -> bool
    def any_match(self, text: str, items: List[str]) -> bool
    def regex_match(self, text: str, pattern: str) -> bool
    def exact_match(self, text: str, items: List[str]) -> bool

class TextMatch(Descriptor):
    column_name: str
    match_items: Union[str, List[str]]
    match_type: Literal["contains", "not_contains", "exact", "regex"] = "contains"
    match_mode: Literal["any", "all"] = "any"
    case_sensitive: bool = True
    lemmatize: bool = False
    word_boundaries: bool = False
    def generate_data(self, dataset: Dataset, options: Options) -> DatasetColumn
    def list_input_columns(self) -> List[str]

# Convenience functions
def Contains(column_name, items, case_sensitive=True, mode="any", alias=None, tests=None) -> TextMatch
def DoesNotContain(column_name, items, ...) -> TextMatch
def ItemMatch(columns, ...) -> TextMatch
def ItemNoMatch(columns, ...) -> TextMatch
def WordsPresence(column_name, words_list, mode="includes_any", lemmatize=True, ...) -> TextMatch
def IncludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def ExcludesWords(column_name, words_list, mode="any", lemmatize=True, ...) -> TextMatch
def WordMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def WordNoMatch(columns, mode="any", lemmatize=True, ...) -> TextMatch
def TriggerWordsPresent(column_name, words_list, lemmatize=True, ...) -> TextMatch
def RegExp(column_name, reg_exp, ...) -> TextMatch

Import

from evidently.descriptors.text_match import (
    TextMatch,
    TextMatchOptions,
    TextMatchProcessor,
    Contains,
    DoesNotContain,
    ItemMatch,
    ItemNoMatch,
    WordsPresence,
    IncludesWords,
    ExcludesWords,
    WordMatch,
    WordNoMatch,
    TriggerWordsPresent,
    RegExp,
)

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Name of the text column to match against |
| match_items | Union[str, List[str]] | Yes | List of strings to match, or a column name for column-to-column matching |
| match_type | Literal["contains", "not_contains", "exact", "regex"] | No | Type of matching to perform (default: "contains") |
| match_mode | Literal["any", "all"] | No | Whether any or all items must match (default: "any") |
| case_sensitive | bool | No | Whether matching is case-sensitive (default: True) |
| lemmatize | bool | No | Whether to lemmatize words before matching (default: False) |
| word_boundaries | bool | No | Whether to extract words using word boundaries (default: False) |
| alias | Optional[str] | No | Custom display name for the descriptor |
| tests | Optional[List[AnyDescriptorTest]] | No | Tests to apply to the descriptor output |

Outputs

| Name | Type | Description |
|---|---|---|
| return | DatasetColumn | Column of type ColumnType.Categorical containing one boolean match result (True/False) per row |
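A row-wise sketch of what that output column holds, covering both a fixed item list and column-to-column matching. The rows are plain dicts rather than an Evidently Dataset, match_column is a hypothetical helper, only match_type="contains" is modeled, and it assumes the referenced keyword column stores a list of strings per row.

```python
from typing import Dict, List, Union

Row = Dict[str, object]


def match_column(rows: List[Row], column_name: str,
                 match_items: Union[str, List[str]], match_mode: str = "any",
                 case_sensitive: bool = True) -> List[bool]:
    """Hypothetical sketch: one True/False per row, like the descriptor output."""
    results: List[bool] = []
    for row in rows:
        text = str(row[column_name])
        # A str names another column holding this row's items (column-to-column);
        # a list applies the same items to every row.
        if isinstance(match_items, str):
            items = [str(item) for item in row[match_items]]
        else:
            items = list(match_items)
        if not case_sensitive:
            text = text.lower()
            items = [item.lower() for item in items]
        hits = [item in text for item in items]
        results.append(any(hits) if match_mode == "any" else all(hits))
    return results


rows: List[Row] = [
    {"response": "Reset your password in Settings",
     "expected_keywords": ["password", "settings"]},
    {"response": "Contact support",
     "expected_keywords": ["password", "support"]},
]

print(match_column(rows, "response", "expected_keywords", match_mode="all",
                   case_sensitive=False))  # [True, False]
print(match_column(rows, "response", ["support", "help"]))  # [False, True]
```

With case_sensitive left at True, the first row would fail the "all" check because "settings" does not appear verbatim in "Settings".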

Usage Examples

Basic Contains Check

from evidently.descriptors.text_match import TextMatch

# True for rows whose response contains at least one item, ignoring case
descriptor = TextMatch(
    column_name="response",
    match_items=["urgent", "important", "critical"],
    match_type="contains",
    match_mode="any",
    case_sensitive=False,
)

Column-to-Column Matching

from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="response",
    match_items="expected_keywords",  # column name
    match_type="contains",
    match_mode="all",  # every keyword in the row's expected_keywords must appear
)

Exclusion with Lemmatization

from evidently.descriptors.text_match import TextMatch

descriptor = TextMatch(
    column_name="description",
    match_items=["spam", "test", "ignore"],
    match_type="not_contains",
    lemmatize=True,  # also enables word_boundaries (smart default); requires NLTK
)
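Why lemmatization matters for an exclusion list: once word boundaries are on (which lemmatize=True implies), "tests" is no longer the same word as "test". The sketch uses a hypothetical TOY_LEMMAS lookup in place of NLTK's WordNetLemmatizer so it runs without downloading WordNet data; found_items is likewise a hypothetical helper, not part of Evidently.

```python
import re
from typing import Dict, List, Set

# Toy lookup standing in for NLTK's WordNetLemmatizer (an assumption so the
# sketch runs without NLTK or its WordNet corpus).
TOY_LEMMAS: Dict[str, str] = {"tests": "test"}


def tokens(text: str, lemmatize: bool) -> List[str]:
    # Word-boundary extraction, which lemmatize=True switches on.
    words = re.findall(r"\b\w+\b", text)
    return [TOY_LEMMAS.get(w, w) for w in words] if lemmatize else words


def found_items(text: str, items: List[str], lemmatize: bool) -> Set[str]:
    """Hypothetical helper: items whose words all occur as whole words in text."""
    vocab = set(tokens(text, lemmatize))
    return {item for item in items if set(tokens(item, lemmatize)) <= vocab}


items = ["spam", "test", "ignore"]
description = "two tests look like spam"
print(sorted(found_items(description, items, lemmatize=False)))  # ['spam']
print(sorted(found_items(description, items, lemmatize=True)))   # ['spam', 'test']
```

Without lemmatization only "spam" is found; with it, "tests" also counts as "test". How a real lemmatizer treats other inflections (verbs in particular) depends on its part-of-speech handling, which this page does not describe.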

Regex Matching

from evidently.descriptors.text_match import RegExp

descriptor = RegExp(
    column_name="phone_field",
    reg_exp=r"\b\d{3}-\d{3}-\d{4}\b",
    alias="Phone Number Format",
)
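This page does not say whether regex_match searches anywhere in the text or anchors at the start, and for a column like phone_field the difference matters. The standard-library sketch below runs the example pattern through both re.search and re.match on a few made-up values.

```python
import re
from typing import Dict, Tuple

PHONE = r"\b\d{3}-\d{3}-\d{4}\b"  # pattern from the RegExp example above

samples = [
    "555-123-4567",             # bare number
    "Call 555-123-4567 today",  # number embedded in text
    "5551234567",               # no separators
    "555-123-45678",            # trailing extra digit breaks the final word boundary
]

results: Dict[str, Tuple[bool, bool]] = {}
for text in samples:
    # re.search scans the whole string; re.match only tries position 0.
    results[text] = (bool(re.search(PHONE, text)), bool(re.match(PHONE, text)))
    print(f"{text!r}: search={results[text][0]} match={results[text][1]}")
```

Only the embedded number distinguishes the two: it is found by search but not by match, so anchoring behavior should be checked before relying on RegExp for free-text fields.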

Using Convenience Functions

from evidently.descriptors.text_match import IncludesWords, ExcludesWords

includes = IncludesWords(
    column_name="answer",
    words_list=["python", "java", "rust"],
    mode="any",
    lemmatize=True,
)

excludes = ExcludesWords(
    column_name="answer",
    words_list=["error", "fail", "crash"],
    mode="all",
    lemmatize=True,
)
