Implementation:Evidentlyai Evidently Legacy Words Feature
| Knowledge Sources | |
|---|---|
| Domains | NLP, Feature Engineering, Text Analysis |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
This module provides a family of generated feature classes for detecting word-level presence or absence in text columns, with support for lemmatization, multiple matching modes, and both static word lists and row-level word sources.
Description
The module contains a helper function and six classes organized into two hierarchies:
Helper Function:
- _listed_words_present(in_str, mode, lem, words_list, lemmatize) -- Core logic that checks whether words from a list are present in a string. Supports four modes: "includes_any", "includes_all", "excludes_any", and "excludes_all". Words are lowercased and optionally lemmatized before comparison.
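The four modes can be illustrated with a minimal pure-Python sketch. This is not the library's code: the name `words_present_sketch`, the `normalize` parameter, and the handling of `None` are assumptions made for illustration, and lemmatization is replaced by plain lowercasing. It also assumes that "excludes_any" means at least one listed word is absent and that "excludes_all" means no listed word is present.

```python
from typing import Callable, List, Optional


def words_present_sketch(
    text: Optional[str],
    mode: str,
    words_list: List[str],
    normalize: Callable[[str], str] = str.lower,
) -> bool:
    """Illustrative stand-in for the helper; not the library's implementation."""
    if text is None:
        # Assumption: missing text never matches.
        return False
    # Normalize both sides the same way so "Thank" can match "thank".
    text_words = {normalize(w) for w in text.split()}
    hits = [normalize(w) in text_words for w in words_list]
    if mode == "includes_any":
        return any(hits)
    if mode == "includes_all":
        return all(hits)
    if mode == "excludes_any":
        # Assumed reading: at least one listed word is absent.
        return not all(hits)
    if mode == "excludes_all":
        # Assumed reading: no listed word is present.
        return not any(hits)
    raise ValueError(f"unknown mode: {mode!r}")
```

Passing a lemmatizing callable as `normalize` would approximate the `lemmatize=True` behaviour described above.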
Single-column classes (extend ApplyColumnGeneratedFeature):
- WordsPresence -- Base class that applies _listed_words_present to each cell value against a static word list. Produces ColumnType.Categorical output.
- IncludesWords -- Convenience subclass of WordsPresence that automatically prefixes "includes_" to the mode parameter. Supports "any" or "all" shorthand modes.
- ExcludesWords -- Convenience subclass of WordsPresence that automatically prefixes "excludes_" to the mode parameter. Supports "any" or "all" shorthand modes.
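The mode prefixing done by the convenience subclasses can be pictured as simple string composition. The helper below is hypothetical (`expand_mode` does not exist in the module), and its validation of the shorthand is an assumption added for illustration.

```python
def expand_mode(prefix: str, mode: str) -> str:
    """Turn a shorthand mode into a full mode string (hypothetical helper)."""
    if mode not in ("any", "all"):
        # Assumption: this check is added for the sketch only.
        raise ValueError(f"expected 'any' or 'all', got {mode!r}")
    return f"{prefix}_{mode}"


# IncludesWords(mode="any") would correspond to WordsPresence(mode="includes_any")
includes_mode = expand_mode("includes", "any")
# ExcludesWords(mode="all") would correspond to WordsPresence(mode="excludes_all")
excludes_mode = expand_mode("excludes", "all")
```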
Two-column classes (extend GeneratedFeature):
- RowWordPresence -- Checks words from one column against text in another column on a per-row basis. Uses _listed_words_present internally with row-level word lists from a second column.
- WordMatch -- Convenience subclass of RowWordPresence that uses "includes_" mode prefix.
- WordNoMatch -- Convenience subclass of RowWordPresence that uses "excludes_" mode prefix.
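The row-level idea, where each row carries its own word list, can be sketched without pandas. The function name `row_word_presence_sketch` and the tuple-based row layout are assumptions for illustration; the real classes operate on DataFrame columns and also support the "excludes_" modes and lemmatization, which this sketch omits.

```python
from typing import Iterable, List, Sequence, Tuple


def row_word_presence_sketch(
    rows: Iterable[Tuple[str, Sequence[str]]],
    mode: str,
) -> List[bool]:
    """For each (text, expected_words) row, check the row's own word list."""
    results = []
    for text, expected_words in rows:
        text_words = set(text.lower().split())
        hits = [w.lower() in text_words for w in expected_words]
        if mode == "includes_any":
            results.append(any(hits))
        elif mode == "includes_all":
            results.append(all(hits))
        else:
            raise ValueError(f"mode not covered by this sketch: {mode!r}")
    return results


rows = [
    ("The cat sat on the mat", ["cat", "dog"]),  # only "cat" is present
    ("Hello world", ["dog"]),                    # nothing is present
]
any_result = row_word_presence_sketch(rows, "includes_any")
all_result = row_word_presence_sketch(rows, "includes_all")
```

Unlike the single-column classes, the word list here varies per row, which is what makes this variant suitable for comparing a reference column against generated text.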
All classes use NLTK's WordNetLemmatizer (lazily initialized), strip non-alphanumeric characters via regex, and lowercase words before comparison.
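A rough sketch of this preprocessing follows. The regular expression and the names `tokenize_sketch` and `get_lemmatizer` are assumptions, not taken from the module. Calling `get_lemmatizer()` requires the `nltk` package and its WordNet corpus to be installed; the tokenizer itself needs only the standard library.

```python
import re
from functools import lru_cache
from typing import List

# Assumed pattern: drop everything except ASCII letters, digits, and spaces.
_NON_ALNUM = re.compile(r"[^A-Za-z0-9 ]+")


def tokenize_sketch(text: str) -> List[str]:
    """Strip non-alphanumeric characters, split on whitespace, lowercase."""
    cleaned = _NON_ALNUM.sub("", text)
    return [word.lower() for word in cleaned.split()]


@lru_cache(maxsize=1)
def get_lemmatizer():
    """Create the lemmatizer on first use only (lazy initialization)."""
    # Imported here so that nltk is only needed when lemmatization is used.
    from nltk.stem import WordNetLemmatizer

    return WordNetLemmatizer()


tokens = tokenize_sketch("Thanks, I really appreciate it!")
```

Note that under this pattern punctuation is removed rather than replaced by a space, so a contraction such as "don't" becomes the single token "dont".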
Usage
Use these features for word-level containment checks with NLP-aware processing. Typical use cases include verifying that model outputs include or exclude required keywords, checking for expected vocabulary in generated text, or comparing expected words from a reference column against actual text output.
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/legacy/features/words_feature.py
Signature
def _listed_words_present(in_str: str, mode: str, lem: WordNetLemmatizer,
                          words_list: List[str], lemmatize: bool) -> int: ...

class WordsPresence(ApplyColumnGeneratedFeature):
    class Config:
        type_alias = "evidently:feature:WordsPresence"
    __feature_type__: ClassVar = ColumnType.Categorical
    column_name: str
    words_list: List[str]
    mode: str
    lemmatize: bool = True
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "includes_any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...
    def apply(self, value: Any) -> bool: ...

class IncludesWords(WordsPresence):
    class Config:
        type_alias = "evidently:feature:IncludesWords"
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...

class ExcludesWords(WordsPresence):
    class Config:
        type_alias = "evidently:feature:ExcludesWords"
    def __init__(self, column_name: str, words_list: List[str],
                 mode: str = "any", lemmatize: bool = True,
                 display_name: Optional[str] = None): ...

class RowWordPresence(GeneratedFeature):
    class Config:
        type_alias = "evidently:feature:RowWordPresence"
    __feature_type__: ClassVar = ColumnType.Categorical
    columns: List[str]
    mode: str = "any"
    lemmatize: bool = True
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...
    def generate_feature(self, data: pd.DataFrame,
                         data_definition: DataDefinition) -> pd.DataFrame: ...

class WordMatch(RowWordPresence):
    class Config:
        type_alias = "evidently:feature:WordMatch"
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...

class WordNoMatch(RowWordPresence):
    class Config:
        type_alias = "evidently:feature:WordNoMatch"
    def __init__(self, columns: List[str], mode: str, lemmatize: bool,
                 display_name: Optional[str] = None): ...
Import
from evidently.legacy.features.words_feature import WordsPresence
from evidently.legacy.features.words_feature import IncludesWords
from evidently.legacy.features.words_feature import ExcludesWords
from evidently.legacy.features.words_feature import RowWordPresence
from evidently.legacy.features.words_feature import WordMatch
from evidently.legacy.features.words_feature import WordNoMatch
I/O Contract
Inputs (WordsPresence / IncludesWords / ExcludesWords)
| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Name of the text column to check |
| words_list | List[str] | Yes | List of words to search for |
| mode | str | No | Matching mode. WordsPresence takes the full form ("includes_any", "includes_all", "excludes_any", "excludes_all") and defaults to "includes_any"; IncludesWords and ExcludesWords take the shorthand "any" or "all" and default to "any". |
| lemmatize | bool | No | Whether to lemmatize words before comparison (default: True) |
| display_name | Optional[str] | No | Custom display name for the feature column |
Inputs (RowWordPresence / WordMatch / WordNoMatch)
| Name | Type | Required | Description |
|---|---|---|---|
| columns | List[str] | Yes | Two column names: [text_column, words_column]. The words_column contains word lists for each row. |
| mode | str | Yes | Matching mode (prefixed automatically by subclasses) |
| lemmatize | bool | Yes | Whether to lemmatize words before comparison |
| display_name | Optional[str] | No | Custom display name for the feature column |
Outputs
| Name | Type | Description |
|---|---|---|
| generated feature column | bool | True/False indicating whether the text includes (or excludes) the specified words according to the chosen mode |
Usage Examples
from evidently.legacy.features.words_feature import IncludesWords, ExcludesWords, WordMatch

# Check if the "response" column includes any of the specified words (with lemmatization)
includes_feature = IncludesWords(
    column_name="response",
    words_list=["thank", "appreciate", "grateful"],
    mode="any",
    lemmatize=True,
)

# Check that "response" excludes all forbidden words
excludes_feature = ExcludesWords(
    column_name="response",
    words_list=["hate", "terrible"],
    mode="all",
    lemmatize=True,
)

# Row-level word matching: check if words from "expected_keywords" column
# appear in "generated_text" column
word_match_feature = WordMatch(
    columns=["generated_text", "expected_keywords"],
    mode="any",
    lemmatize=True,
)