Implementation:Evidentlyai Evidently Legacy Words Feature

Knowledge Sources	Evidentlyai_Evidently
Domains	NLP, Feature Engineering, Text Analysis
Last Updated	2026-02-14 12:00 GMT

Overview

This module provides a family of generated feature classes for detecting word-level presence or absence in text columns, with support for lemmatization, multiple matching modes, and both static word lists and row-level word sources.

Description

The module contains a helper function and six classes organized into two hierarchies:

Helper Function:

_listed_words_present(in_str, mode, lem, words_list, lemmatize) -- Core logic that checks whether words from a list are present in a string. Supports four modes: "includes_any", "includes_all", "excludes_any", and "excludes_all". Words are lowercased and optionally lemmatized before comparison.
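The four modes can be sketched in plain Python. This is a hypothetical reimplementation, not Evidently's code. The function name listed_words_present_sketch and the optional lemmatizer callable are illustrative; the real helper receives a WordNetLemmatizer instance plus a lemmatize flag. The sketch also assumes two things that should be checked against the source file: the listed words get the same normalization as the text, and "excludes_any" means "at least one listed word is absent".

```python
import re
from typing import Callable, Iterable, Optional

# Hypothetical sketch of the documented matching logic; not Evidently's code.
Lemmatizer = Callable[[str], str]


def normalize(token: str, lemmatizer: Optional[Lemmatizer] = None) -> str:
    """Strip non-alphanumeric characters, lowercase, optionally lemmatize."""
    word = re.sub(r"[^A-Za-z0-9]", "", token).lower()
    if word and lemmatizer is not None:
        word = lemmatizer(word)
    return word


def listed_words_present_sketch(
    in_str: str,
    mode: str,
    words_list: Iterable[str],
    lemmatizer: Optional[Lemmatizer] = None,
) -> bool:
    # Words actually found in the text, after normalization.
    present = {normalize(t, lemmatizer) for t in in_str.split()}
    present.discard("")
    # Assumption: the listed words are normalized the same way as the text.
    targets = {normalize(w, lemmatizer) for w in words_list}
    targets.discard("")

    if mode == "includes_any":  # at least one listed word is present
        return bool(present & targets)
    if mode == "includes_all":  # every listed word is present
        return targets <= present
    if mode == "excludes_any":  # assumption: at least one listed word is absent
        return not (targets <= present)
    if mode == "excludes_all":  # no listed word is present
        return not (present & targets)
    raise ValueError(f"unknown mode: {mode!r}")
```

To mirror lemmatize=True, you could pass a callable such as NLTK's WordNetLemmatizer().lemmatize as lemmatizer. Using sets keeps each check to a single intersection or subset test.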

Single-column classes (extend ApplyColumnGeneratedFeature):

WordsPresence -- Base class that applies _listed_words_present to each cell value against a static word list. Produces ColumnType.Categorical output.
IncludesWords -- Convenience subclass of WordsPresence that automatically prefixes "includes_" to the mode parameter. Supports "any" or "all" shorthand modes.
ExcludesWords -- Convenience subclass of WordsPresence that automatically prefixes "excludes_" to the mode parameter. Supports "any" or "all" shorthand modes.
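The per-cell pattern and the shorthand-mode prefixing can be illustrated with plain classes. WordsPresenceSketch, IncludesWordsSketch, ExcludesWordsSketch, MODES and _words are hypothetical names. These classes do not extend Evidently's ApplyColumnGeneratedFeature and omit lemmatization. They also assume that "excludes_any" means at least one listed word is absent.

```python
import re
from typing import Callable, Dict, List, Set

# Hypothetical mode table: predicate over (words found in text, listed words).
MODES: Dict[str, Callable[[Set[str], Set[str]], bool]] = {
    "includes_any": lambda present, targets: bool(present & targets),
    "includes_all": lambda present, targets: targets <= present,
    "excludes_any": lambda present, targets: not (targets <= present),
    "excludes_all": lambda present, targets: not (present & targets),
}


def _words(text: str) -> Set[str]:
    # Lowercase and strip non-alphanumerics per token (no lemmatization here).
    tokens = (re.sub(r"[^a-z0-9]", "", t.lower()) for t in text.split())
    return {w for w in tokens if w}


class WordsPresenceSketch:
    """Hypothetical stand-in for WordsPresence: apply() runs once per cell."""

    def __init__(self, column_name: str, words_list: List[str], mode: str = "includes_any"):
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode!r}")
        self.column_name = column_name
        self.words_list = words_list
        self.mode = mode

    def apply(self, value: str) -> bool:
        return MODES[self.mode](_words(value), {w.lower() for w in self.words_list})


class IncludesWordsSketch(WordsPresenceSketch):
    def __init__(self, column_name: str, words_list: List[str], mode: str = "any"):
        # Shorthand "any"/"all" becomes "includes_any"/"includes_all".
        super().__init__(column_name, words_list, mode="includes_" + mode)


class ExcludesWordsSketch(WordsPresenceSketch):
    def __init__(self, column_name: str, words_list: List[str], mode: str = "any"):
        # Shorthand "any"/"all" becomes "excludes_any"/"excludes_all".
        super().__init__(column_name, words_list, mode="excludes_" + mode)
```

For example, ExcludesWordsSketch("response", ["hate", "terrible"], mode="all") stores the mode "excludes_all". Applied to the cells "I love it" and "Terrible service!", it yields True and then False.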

Two-column classes (extend GeneratedFeature):

RowWordPresence -- Per-row check that compares the text in one column against the word list stored in a second column of the same row, calling _listed_words_present with that row's list.
WordMatch -- Convenience subclass of RowWordPresence that uses "includes_" mode prefix.
WordNoMatch -- Convenience subclass of RowWordPresence that uses "excludes_" mode prefix.
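The row-level classes differ from the single-column ones only in where the word list comes from: each row supplies its own list. In the sketch below, two parallel Python lists stand in for the [text_column, words_column] pair. The function names are hypothetical and lemmatization is omitted. As before, "excludes_any" is assumed to mean that at least one listed word is absent.

```python
import re
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical mode table: predicate over (words found in text, listed words).
MODES: Dict[str, Callable[[Set[str], Set[str]], bool]] = {
    "includes_any": lambda present, targets: bool(present & targets),
    "includes_all": lambda present, targets: targets <= present,
    "excludes_any": lambda present, targets: not (targets <= present),
    "excludes_all": lambda present, targets: not (present & targets),
}


def _words(text: str) -> Set[str]:
    # Lowercase and strip non-alphanumerics per token (no lemmatization here).
    tokens = (re.sub(r"[^a-z0-9]", "", t.lower()) for t in text.split())
    return {w for w in tokens if w}


def row_word_presence_sketch(
    texts: Sequence[str], word_lists: Sequence[List[str]], mode: str
) -> List[bool]:
    """Evaluate each row's text against that row's own word list."""
    check = MODES[mode]
    return [
        check(_words(text), {w.lower() for w in words})
        for text, words in zip(texts, word_lists)
    ]


def word_match_sketch(texts, word_lists, mode: str) -> List[bool]:
    # Like WordMatch: shorthand mode gets the "includes_" prefix.
    return row_word_presence_sketch(texts, word_lists, "includes_" + mode)


def word_no_match_sketch(texts, word_lists, mode: str) -> List[bool]:
    # Like WordNoMatch: shorthand mode gets the "excludes_" prefix.
    return row_word_presence_sketch(texts, word_lists, "excludes_" + mode)
```

With texts ["The cat sat on the mat", "Dogs bark loudly"] and keyword lists [["cat", "mat"], ["cat"]], word_match_sketch(..., "all") returns [True, False] and word_no_match_sketch(..., "all") returns [False, True].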

All classes use NLTK's WordNetLemmatizer (lazily initialized), strip non-alphanumeric characters via regex, and lowercase words before comparison.
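Lazy initialization means the lemmatizer is only built the first time it is needed, so constructing a feature stays cheap. The sketch below uses a cached property. The class name LazyLemmatizer and the injectable factory parameter are hypothetical, and Evidently's actual mechanism may differ. Using the real NLTK lemmatizer also requires the WordNet data, for example via nltk.download("wordnet").

```python
from typing import Any, Callable, Optional


def _default_factory() -> Any:
    # Imported lazily so NLTK is only required once lemmatization is used.
    from nltk.stem import WordNetLemmatizer
    return WordNetLemmatizer()


class LazyLemmatizer:
    """Hypothetical holder that builds its lemmatizer on first access."""

    def __init__(self, factory: Optional[Callable[[], Any]] = None):
        self._factory = factory or _default_factory
        self._lem: Optional[Any] = None

    @property
    def lem(self) -> Any:
        if self._lem is None:  # first access: build and cache
            self._lem = self._factory()
        return self._lem  # later accesses reuse the same object

    def lemmatize(self, word: str) -> str:
        return self.lem.lemmatize(word)
```

Calling LazyLemmatizer() loads nothing from NLTK. The first call to lemmatize(...) constructs the WordNetLemmatizer, and every later call reuses it.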

Usage

Use these features for word-level containment checks with NLP-aware processing. Typical use cases include verifying that model outputs include or exclude required keywords, checking for expected vocabulary in generated text, or comparing expected words from a reference column against actual text output.

Code Reference

Source Location

Repository: Evidentlyai_Evidently
File: src/evidently/legacy/features/words_feature.py

Signature

def _listed_words_present(in_str: str, mode: str, lem: WordNetLemmatizer,
                          words_list: List[str], lemmatize: bool) -> int: ...

class WordsPresence(ApplyColumnGeneratedFeature):
    class Config:
        type_alias = "evidently:feature:WordsPresence"
    __feature_type__: ClassVar = ColumnType.Categorical
    column_name: str
    words_list: List[str]
    mode: str
    lemmatize: bool = True
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "includes_any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...
    def apply(self, value: Any) -> bool: ...

class IncludesWords(WordsPresence):
    class Config:
        type_alias = "evidently:feature:IncludesWords"
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...

class ExcludesWords(WordsPresence):
    class Config:
        type_alias = "evidently:feature:ExcludesWords"
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...

class RowWordPresence(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:RowWordPresence"
    __feature_type__: ClassVar = ColumnType.Categorical
    columns: List[str]
    mode: str = "any"
    lemmatize: bool = True
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...
    def generate_feature(self, data: pd.DataFrame,
                         data_definition: DataDefinition) -> pd.DataFrame: ...

class WordMatch(RowWordPresence):
    class Config:
        type_alias = "evidently:feature:WordMatch"
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...

class WordNoMatch(RowWordPresence):
    class Config:
        type_alias = "evidently:feature:WordNoMatch"
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...

Import

from evidently.legacy.features.words_feature import WordsPresence
from evidently.legacy.features.words_feature import IncludesWords
from evidently.legacy.features.words_feature import ExcludesWords
from evidently.legacy.features.words_feature import RowWordPresence
from evidently.legacy.features.words_feature import WordMatch
from evidently.legacy.features.words_feature import WordNoMatch

I/O Contract

Inputs (WordsPresence / IncludesWords / ExcludesWords)

Name	Type	Required	Description
column_name	str	Yes	Name of the text column to check
words_list	List[str]	Yes	List of words to search for
mode	str	No	Matching mode: "includes_any", "includes_all", "excludes_any", or "excludes_all" for WordsPresence; shorthand "any" or "all" for IncludesWords and ExcludesWords. Default: "includes_any" (WordsPresence) or "any" (subclasses)
lemmatize	bool	No	Whether to lemmatize words before comparison (default: True)
display_name	Optional[str]	No	Custom display name for the feature column

Inputs (RowWordPresence / WordMatch / WordNoMatch)

Name	Type	Required	Description
columns	List[str]	Yes	Two column names: [text_column, words_column]. The words_column contains word lists for each row.
mode	str	Yes	Matching mode. WordMatch and WordNoMatch accept "any" or "all" and prepend "includes_" or "excludes_" respectively
lemmatize	bool	Yes	Whether to lemmatize words before comparison
display_name	Optional[str]	No	Custom display name for the feature column

Outputs

Name	Type	Description
generated feature column	bool	True/False indicating whether the text includes (or excludes) the specified words according to the chosen mode

Usage Examples

from evidently.legacy.features.words_feature import IncludesWords, ExcludesWords, WordMatch

# Check if the "response" column includes any of the specified words (with lemmatization)
includes_feature = IncludesWords(
    column_name="response",
    words_list=["thank", "appreciate", "grateful"],
    mode="any",
    lemmatize=True
)

# Check that "response" contains none of the forbidden words (mode "all" becomes "excludes_all")
excludes_feature = ExcludesWords(
    column_name="response",
    words_list=["hate", "terrible"],
    mode="all",
    lemmatize=True
)

# Row-level word matching: for each row, check whether any word from that row's
# "expected_keywords" list appears in the "generated_text" column
word_match_feature = WordMatch(
    columns=["generated_text", "expected_keywords"],
    mode="any",
    lemmatize=True
)
