Implementation:Evidentlyai Evidently Text Descriptors
| Knowledge Sources | |
|---|---|
| Domains | NLP, Text_Analysis |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete descriptor classes, provided by the Evidently library, that extract text-quality features from text columns.
Description
Evidently provides several built-in text descriptor classes that compute row-level features from text data:
- TextLength — Character count (from _text_length.py)
- Sentiment — VADER sentiment score (from generated_descriptors.py)
- SentenceCount — Sentence count via NLTK (from generated_descriptors.py)
- OOVWordsPercentage — Out-of-vocabulary word percentage (from generated_descriptors.py)
- NonLetterCharacterPercentage — Non-letter character percentage (from generated_descriptors.py)
- RegExp — Regex pattern matching (from generated_descriptors.py)
- TriggerWordsPresent — Trigger word detection with lemmatization (from generated_descriptors.py)
All descriptors inherit from the Descriptor base class and implement generate_data() to compute their output columns.
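The row-level pattern described above can be sketched in plain Python. This is a minimal illustration, not Evidently's code: `SketchDescriptor`, `SketchTextLength`, `compute()`, and the dict-of-lists table are hypothetical names introduced here, and the real `generate_data()` works on Evidently's own dataset objects with a different signature.

```python
from typing import Dict, List


class SketchDescriptor:
    """Hypothetical stand-in for a descriptor base class (not Evidently's)."""

    def __init__(self, column_name: str, alias: str = "") -> None:
        self.column_name = column_name
        # Assumption: fall back to a readable default name when no alias is given.
        self.alias = alias or f"{type(self).__name__}({column_name})"

    def compute(self, text: str) -> object:
        """Return the feature value for a single row."""
        raise NotImplementedError

    def generate_data(self, table: Dict[str, List[str]]) -> List[object]:
        # Simplified signature: produce one output value per input row.
        return [self.compute(text) for text in table[self.column_name]]


class SketchTextLength(SketchDescriptor):
    """Character count per row, mirroring what TextLength is described to do."""

    def compute(self, text: str) -> int:
        return len(text)


table = {"Review_Text": ["Great product!", "", "Broken on arrival."]}
lengths = SketchTextLength("Review_Text").generate_data(table)
print(lengths)  # [14, 0, 18]
```

Each subclass only has to define the per-row computation; the shared method maps it over the column, which is why the output always has as many values as there are input rows.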
Usage
Import descriptor classes from evidently.descriptors and pass instances to Dataset.from_pandas(descriptors=[...]).
Code Reference
Source Location
- Repository: evidently
- File: src/evidently/descriptors/_text_length.py (TextLength, L17-35)
- File: src/evidently/descriptors/generated_descriptors.py (Sentiment L592-605, SentenceCount L576-589, OOVWordsPercentage L463-480, NonLetterCharacterPercentage L445-460, RegExp L534-551, TriggerWordsPresent L608-635)
Signature
class TextLength(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List[AnyDescriptorTest]] = None):
        """Compute text length for each row."""

class Sentiment(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Compute VADER sentiment score for each row (-1 to 1)."""

class SentenceCount(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Count sentences using NLTK sentence tokenizer."""

class OOVWordsPercentage(Descriptor):
    def __init__(self, column_name: str, ignore_words: Any = (),
                 alias: Optional[str] = None, tests: Optional[List] = None):
        """Compute percentage of out-of-vocabulary words."""

class NonLetterCharacterPercentage(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Compute percentage of non-letter characters."""

class RegExp(Descriptor):
    def __init__(self, column_name: str, reg_exp: str,
                 alias: Optional[str] = None, tests: Optional[List] = None):
        """Detect regex pattern matches in text."""

class TriggerWordsPresent(Descriptor):
    def __init__(self, column_name: str, words_list: List[str],
                 lemmatize: bool = True, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Detect presence of trigger words with optional lemmatization."""
Import
from evidently.descriptors import (
    TextLength,
    Sentiment,
    SentenceCount,
    OOVWordsPercentage,
    NonLetterCharacterPercentage,
    RegExp,
    TriggerWordsPresent,
)
I/O Contract
Inputs
| Descriptor | Key Parameters | Description |
|---|---|---|
| TextLength | column_name: str | Text column to measure |
| Sentiment | column_name: str | Text column for VADER sentiment |
| SentenceCount | column_name: str | Text column for sentence counting |
| OOVWordsPercentage | column_name: str, ignore_words: Any (default: ()) | Text column, words to exclude from the out-of-vocabulary check |
| NonLetterCharacterPercentage | column_name: str | Text column to analyze |
| RegExp | column_name: str, reg_exp: str | Text column, regex pattern |
| TriggerWordsPresent | column_name: str, words_list: List[str], lemmatize: bool | Text column, trigger words, lemmatize flag |
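To make the `words_list` and `lemmatize` parameters concrete, here is a rough stdlib-only sketch of trigger-word detection. It is not the library's implementation: `toy_lemma` is a hypothetical stand-in for a real lemmatizer, and the tokenization and lowercasing rules are assumptions made for illustration.

```python
import re
from typing import List


def toy_lemma(word: str) -> str:
    """Toy normalizer: lowercase and strip one trailing 's'. Not a real lemmatizer."""
    word = word.lower()
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def trigger_words_present(text: str, words_list: List[str],
                          lemmatize: bool = True) -> bool:
    """Return True if any trigger word occurs in the text."""
    # Assumption: without lemmatization, matching is still case-insensitive.
    normalize = toy_lemma if lemmatize else str.lower
    triggers = {normalize(word) for word in words_list}
    # Assumption: tokens are runs of letters and apostrophes.
    tokens = re.findall(r"[A-Za-z']+", text)
    return any(normalize(token) in triggers for token in tokens)


with_lemma = trigger_words_present("I want two refunds", ["refund"])
without_lemma = trigger_words_present("I want two refunds", ["refund"], lemmatize=False)
print(with_lemma, without_lemma)  # True False
```

The example shows why the flag matters: with normalization, the inflected form "refunds" matches the trigger "refund"; without it, only the exact word form counts.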
Outputs
| Descriptor | Output Type | Output Range |
|---|---|---|
| TextLength | Numerical | 0 to max text length |
| Sentiment | Numerical | -1.0 to 1.0 |
| SentenceCount | Numerical | 0 to max sentences |
| OOVWordsPercentage | Numerical | 0.0 to 1.0 |
| NonLetterCharacterPercentage | Numerical | 0.0 to 1.0 |
| RegExp | Categorical | True/False |
| TriggerWordsPresent | Categorical | True/False |
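The True/False output of a regex-style descriptor can be illustrated with Python's `re` module. This page does not state whether the library matches the pattern anywhere in the text or requires the whole text to match, so the sketch below shows both behaviours side by side; the helper names are hypothetical.

```python
import re
from typing import List, Pattern

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def flags_search(texts: List[str], pattern: Pattern[str]) -> List[bool]:
    """True when the pattern occurs anywhere in the text."""
    return [pattern.search(text) is not None for text in texts]


def flags_fullmatch(texts: List[str], pattern: Pattern[str]) -> List[bool]:
    """True only when the entire text matches the pattern."""
    return [pattern.fullmatch(text) is not None for text in texts]


texts = ["Ordered on 2024-01-15, arrived late.", "2024-01-15", "No date here."]
search_flags = flags_search(texts, DATE_PATTERN)
fullmatch_flags = flags_fullmatch(texts, DATE_PATTERN)
print(search_flags)     # [True, True, False]
print(fullmatch_flags)  # [False, True, False]
```

Because the two semantics disagree on the first row, it is worth checking a descriptor's output on a few known rows before relying on a pattern written for substring matching.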
Usage Examples
Text Quality Descriptors
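import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import (
    TextLength, Sentiment, OOVWordsPercentage,
    RegExp, TriggerWordsPresent, NonLetterCharacterPercentage,
)

# Illustrative sample data; replace with your own DataFrame.
df = pd.DataFrame({"Review_Text": [
    "Arrived broken on 2024-01-15, I want a refund.",
    "Great quality, would buy again!",
]})

# One descriptor instance per feature to compute on the text column.
descriptors = [
    TextLength("Review_Text"),
    Sentiment("Review_Text"),
    OOVWordsPercentage("Review_Text"),
    NonLetterCharacterPercentage("Review_Text"),
    RegExp("Review_Text", reg_exp=r"\d{4}-\d{2}-\d{2}"),
    TriggerWordsPresent("Review_Text", words_list=["refund", "broken", "scam"]),
]

# Declare the text column and attach the descriptors when building the dataset.
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["Review_Text"]),
    descriptors=descriptors,
)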
Related Pages
Implements Principle
Requires Environment