Implementation:Evidentlyai Evidently Legacy Word Count Feature
| Knowledge Sources | |
|---|---|
| Domains | NLP, Feature Engineering, Text Analysis |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
The WordCount class is a generated feature that counts the words in each text value of a specified column, after removing every character that is not an ASCII letter or a space.
Description
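WordCount extends ApplyColumnGeneratedFeature and produces a ColumnType.Numerical output. It uses a compiled class-level regular expression (_reg, pattern [^a-zA-Z ]+) to remove every character that is not an ASCII letter or a space, then splits the result on whitespace and returns the number of resulting tokens. Because the removed characters are deleted rather than replaced with a space, digits, punctuation, and non-ASCII letters never act as word separators: "word-count" becomes "wordcount" and counts as one word, and a purely numeric string counts as zero. For None or NaN values, it returns 0.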
The generated column display name follows the template "Word Count for {column_name}".
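The counting behavior described above can be sketched as a standalone function. This is a minimal re-implementation for illustration, not the library's code: the name count_words and the constant _NON_ALPHA are hypothetical, and only the regular expression pattern is taken from the signature on this page.

```python
import math
import re
from typing import Any

# Same pattern as the class-level _reg attribute documented on this page.
_NON_ALPHA = re.compile(r"[^a-zA-Z ]+")


def count_words(value: Any) -> int:
    """Hypothetical stand-in that mirrors the described WordCount behavior."""
    # Missing values (None or float NaN) count as zero words.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return 0
    # Delete everything except ASCII letters and spaces, then split on whitespace.
    return len(_NON_ALPHA.sub("", value).split())


print(count_words("Hello, world!"))        # 2
print(count_words("It's 3 o'clock"))       # 2 ("Its  oclock")
print(count_words("word-count"))           # 1 ("wordcount")
print(count_words("line one\nline two"))   # 3 (the newline is deleted, not split on)
print(count_words("12345"))                # 0
print(count_words(None))                   # 0
```

The newline case is worth noting: since the pattern keeps only the literal space character, tabs and newlines are deleted along with punctuation, which merges the words on either side of them.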
Usage
Use this feature when you need a simple word count metric for text columns, for example to monitor whether the length of model inputs or outputs changes over time, or to set thresholds on expected text verbosity in data quality checks.
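As a rough illustration of the monitoring use case, the sketch below compares the mean word count of a reference batch against a current batch and flags a large drop. It does not use Evidently: the count_words helper, the sample texts, and the MIN_RATIO threshold are all assumptions made for the example.

```python
import re
from statistics import mean

# Same pattern as the documented _reg attribute; the helper itself is hypothetical.
_NON_ALPHA = re.compile(r"[^a-zA-Z ]+")


def count_words(text: str) -> int:
    """Count tokens left after deleting non-letter, non-space characters."""
    return len(_NON_ALPHA.sub("", text).split())


# Made-up sample data standing in for two batches of model answers.
reference_answers = ["The order ships tomorrow.", "Yes, refunds are available."]
current_answers = ["Yes.", "No.", "Maybe later."]

ref_mean = mean(count_words(t) for t in reference_answers)
cur_mean = mean(count_words(t) for t in current_answers)

# Hypothetical rule: flag the batch if answers shrink below half the reference length.
MIN_RATIO = 0.5
too_short = cur_mean < MIN_RATIO * ref_mean

print(f"reference mean={ref_mean}, current mean={cur_mean:.2f}, too short={too_short}")
```

In a real pipeline the per-row counts would come from the generated feature column rather than a local helper, and the threshold would be chosen from historical data.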
Code Reference
Source Location
- Repository: Evidentlyai_Evidently
- File: src/evidently/legacy/features/word_count_feature.py
Signature
class WordCount(ApplyColumnGeneratedFeature):
    class Config:
        type_alias = "evidently:feature:WordCount"

    __feature_type__: ClassVar = ColumnType.Numerical
    _reg: ClassVar[re.Pattern] = re.compile(r"[^a-zA-Z ]+")
    display_name_template: ClassVar = "Word Count for {column_name}"
    column_name: str

    def __init__(self, column_name: str, display_name: Optional[str] = None): ...
    def apply(self, value: Any) -> int: ...
Import
from evidently.legacy.features.word_count_feature import WordCount
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| column_name | str | Yes | Name of the text column to count words in |
| display_name | Optional[str] | No | Custom display name for the generated feature column |
Outputs
| Name | Type | Description |
|---|---|---|
| Word count | int | Number of alphabetic words in the text value. Returns 0 for None or NaN values. |
Usage Examples
from evidently.legacy.features.word_count_feature import WordCount
# Create a word count feature for the "answer" column
word_count_feature = WordCount(column_name="answer")
# With a custom display name (separate variable so the first feature is not overwritten)
review_word_count_feature = WordCount(column_name="review", display_name="Review Word Count")