Implementation:Evidentlyai Evidently Text Descriptors

From Leeroopedia
Knowledge Sources
Domains: NLP, Text_Analysis
Last Updated: 2026-02-14 12:00 GMT

Overview

Concrete descriptor classes from the Evidently library that extract text quality features from text columns.

Description

Evidently provides several built-in text descriptor classes that compute row-level features from text data:

  • TextLength — Character count (from _text_length.py)
  • Sentiment — VADER sentiment score (from generated_descriptors.py)
  • SentenceCount — Sentence count via NLTK (from generated_descriptors.py)
  • OOVWordsPercentage — Out-of-vocabulary word percentage (from generated_descriptors.py)
  • NonLetterCharacterPercentage — Non-letter character percentage (from generated_descriptors.py)
  • RegExp — Regex pattern matching (from generated_descriptors.py)
  • TriggerWordsPresent — Trigger word detection with lemmatization (from generated_descriptors.py)

All descriptors inherit from the Descriptor base class and implement generate_data() to compute their output columns.
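The row-level pattern described above can be illustrated with a minimal, library-free sketch. The names `SketchDescriptor`, `SketchTextLength`, and `compute`, and the dict-of-lists table, are hypothetical and chosen only for illustration; Evidently's actual `Descriptor` base class and its `generate_data()` signature operate on its own dataset types and may differ.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SketchDescriptor(ABC):
    """Hypothetical stand-in for a row-level descriptor base class."""

    def __init__(self, column_name: str, alias: Optional[str] = None):
        self.column_name = column_name
        # Fall back to the class name when no alias is given.
        self.alias = alias or type(self).__name__

    @abstractmethod
    def compute(self, text: str) -> object:
        """Return the feature value for a single row."""

    def generate_data(self, table: Dict[str, List[str]]) -> Dict[str, list]:
        # Produce one output value per input row, stored under the alias.
        return {self.alias: [self.compute(t) for t in table[self.column_name]]}


class SketchTextLength(SketchDescriptor):
    def compute(self, text: str) -> int:
        return len(text)  # character count


table = {"Review_Text": ["Great product!", "Broken on arrival."]}
result = SketchTextLength("Review_Text", alias="length").generate_data(table)
print(result)  # {'length': [14, 18]}
```

Each subclass only has to say how one row is turned into one value; the base class handles applying it across the column and naming the output.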

Usage

Import descriptor classes from evidently.descriptors and pass instances to Dataset.from_pandas(descriptors=[...]).

Code Reference

Source Location

  • Repository: evidently
  • File: src/evidently/descriptors/_text_length.py (TextLength, L17-35)
  • File: src/evidently/descriptors/generated_descriptors.py (Sentiment L592-605, SentenceCount L576-589, OOVWordsPercentage L463-480, NonLetterCharacterPercentage L445-460, RegExp L534-551, TriggerWordsPresent L608-635)

Signature

class TextLength(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List[AnyDescriptorTest]] = None):
        """Compute text length for each row."""

class Sentiment(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Compute VADER sentiment score for each row (-1 to 1)."""

class SentenceCount(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Count sentences using NLTK sentence tokenizer."""

class OOVWordsPercentage(Descriptor):
    def __init__(self, column_name: str, ignore_words: Any = (),
                 alias: Optional[str] = None, tests: Optional[List] = None):
        """Compute percentage of out-of-vocabulary words."""

class NonLetterCharacterPercentage(Descriptor):
    def __init__(self, column_name: str, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Compute percentage of non-letter characters."""

class RegExp(Descriptor):
    def __init__(self, column_name: str, reg_exp: str,
                 alias: Optional[str] = None, tests: Optional[List] = None):
        """Detect regex pattern matches in text."""

class TriggerWordsPresent(Descriptor):
    def __init__(self, column_name: str, words_list: List[str],
                 lemmatize: bool = True, alias: Optional[str] = None,
                 tests: Optional[List] = None):
        """Detect presence of trigger words with optional lemmatization."""

Import

from evidently.descriptors import (
    TextLength,
    Sentiment,
    SentenceCount,
    OOVWordsPercentage,
    NonLetterCharacterPercentage,
    RegExp,
    TriggerWordsPresent,
)

I/O Contract

Inputs

Descriptor | Key Parameters | Description
TextLength | column_name: str | Text column to measure
Sentiment | column_name: str | Text column for VADER sentiment
SentenceCount | column_name: str | Text column for sentence counting
OOVWordsPercentage | column_name: str, ignore_words: tuple | Text column, words to ignore
NonLetterCharacterPercentage | column_name: str | Text column to analyze
RegExp | column_name: str, reg_exp: str | Text column, regex pattern
TriggerWordsPresent | column_name: str, words_list: List[str], lemmatize: bool | Text column, trigger words, lemmatize flag

Outputs

Descriptor | Output Type | Output Range
TextLength | Numerical | 0 to max text length
Sentiment | Numerical | -1.0 to 1.0
SentenceCount | Numerical | 0 to max sentences
OOVWordsPercentage | Numerical | 0 to 100
NonLetterCharacterPercentage | Numerical | 0 to 100
RegExp | Categorical | True/False
TriggerWordsPresent | Categorical | True/False
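To make the two True/False outputs concrete, here is a rough pure-Python sketch of what a regex flag and a trigger-word flag could look like for a single row. The function names `sketch_regexp` and `sketch_trigger_words` are hypothetical. The sketch uses `re.search` and exact token comparison as simplifying assumptions; Evidently's own matching mode (for example anchored or full-string matching) and its lemmatization step are not reproduced here.

```python
import re
from typing import List


def sketch_regexp(text: str, reg_exp: str) -> bool:
    # Assumption: a match anywhere in the text counts as a hit.
    return re.search(reg_exp, text) is not None


def sketch_trigger_words(text: str, words_list: List[str]) -> bool:
    # Lowercase and split into simple word tokens; no lemmatization here.
    tokens = set(re.findall(r"[a-z']+", text.lower()))
    return any(word.lower() in tokens for word in words_list)


print(sketch_regexp("Ordered on 2024-01-15", r"\d{4}-\d{2}-\d{2}"))
print(sketch_trigger_words("I want a refund now", ["refund", "scam"]))
print(sketch_trigger_words("Refunds were slow", ["refund"]))
```

The last call does not match because "refunds" and "refund" are different tokens, which is the kind of miss that the `lemmatize` flag on TriggerWordsPresent is meant to address.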

Usage Examples

Text Quality Descriptors

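import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import (
    TextLength, Sentiment, OOVWordsPercentage,
    RegExp, TriggerWordsPresent, NonLetterCharacterPercentage,
)

# Illustrative data only; replace with your own DataFrame.
df = pd.DataFrame({
    "Review_Text": [
        "Arrived broken, I want a refund.",
        "Ordered on 2024-01-15 and it works great!",
    ]
})

# One descriptor instance per row-level feature to compute.
descriptors = [
    TextLength("Review_Text"),
    Sentiment("Review_Text"),
    OOVWordsPercentage("Review_Text"),
    NonLetterCharacterPercentage("Review_Text"),
    RegExp("Review_Text", reg_exp=r"\d{4}-\d{2}-\d{2}"),
    TriggerWordsPresent("Review_Text", words_list=["refund", "broken", "scam"]),
]

# Attach the descriptors to the text column when building the dataset.
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["Review_Text"]),
    descriptors=descriptors,
)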
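As a follow-up to the example above, the sketch below shows one way to name descriptor outputs with the `alias` parameter and read them back as a pandas DataFrame. It assumes a recent Evidently release in which `Dataset.as_dataframe()` is available; the helper name `build_review_dataset` and the aliases `review_length` and `review_sentiment` are hypothetical. Imports sit inside the function so the snippet can be loaded where Evidently is not installed.

```python
def build_review_dataset(df):
    """Build a Dataset with two aliased descriptors on 'Review_Text'."""
    # Local imports: Evidently is only required when the function is called.
    from evidently import Dataset, DataDefinition
    from evidently.descriptors import Sentiment, TextLength

    return Dataset.from_pandas(
        df,
        data_definition=DataDefinition(text_columns=["Review_Text"]),
        descriptors=[
            # alias sets the name under which each output is reported
            TextLength("Review_Text", alias="review_length"),
            Sentiment("Review_Text", alias="review_sentiment"),
        ],
    )


# Usage (assumes a DataFrame `df` with a 'Review_Text' column):
# enriched = build_review_dataset(df).as_dataframe()
# print(enriched.head())
```

Explicit aliases keep downstream code independent of whatever default names the library assigns to descriptor outputs.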

Related Pages

Implements Principle

Requires Environment