Principle:Evidentlyai Evidently Text Descriptor Configuration

From Leeroopedia
Knowledge Sources
Domains: NLP, Text_Analysis, Feature_Engineering
Last Updated: 2026-02-14 12:00 GMT

Overview

A row-level text feature extraction mechanism that computes numerical or categorical properties from the values of a text column.

Description

Text Descriptor Configuration defines which text properties to compute on each row of a text column. Evidently provides built-in descriptors for common text quality metrics:

  • TextLength: Character count of text
  • SentenceCount: Number of sentences (NLTK-based)
  • Sentiment: VADER sentiment score (-1 to 1)
  • OOVWordsPercentage: Percentage of out-of-vocabulary words
  • NonLetterCharacterPercentage: Percentage of non-alphabetic characters
  • RegExp: Regex pattern match detection
  • TriggerWordsPresent: Presence of specified trigger words (with optional lemmatization)

Each descriptor takes a column_name (the text column to analyze) and an optional alias (the output column name). When passed to Dataset.from_pandas(), descriptors produce new columns that can be referenced by metrics in Reports.
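The column_name / alias contract described above can be illustrated with a minimal standard-library sketch. It does not use Evidently: the Descriptor class, the apply_descriptors helper, the fallback output name, and the choice to ignore whitespace when counting non-letter characters are all hypothetical, chosen only to show how a per-row descriptor turns one text column into a new named column.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Descriptor:
    """Hypothetical stand-in for a row-level text descriptor (not Evidently's class)."""
    column_name: str                  # text column to analyze
    compute: Callable[[str], object]  # per-row feature function
    alias: Optional[str] = None       # name of the output column

    @property
    def output_name(self) -> str:
        # Assumption of this sketch: derive a name when no alias is given.
        return self.alias or f"{self.compute.__name__}({self.column_name})"


def text_length(text: str) -> int:
    """Character count of the text."""
    return len(text)


def non_letter_percentage(text: str) -> float:
    """Percentage of characters that are neither letters nor whitespace."""
    if not text:
        return 0.0
    non_letters = sum(1 for ch in text if not ch.isalpha() and not ch.isspace())
    return 100.0 * non_letters / len(text)


def apply_descriptors(rows, descriptors):
    """Return copies of the rows with one new column per descriptor."""
    result = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        for d in descriptors:
            new_row[d.output_name] = d.compute(row[d.column_name])
        result.append(new_row)
    return result


rows = [{"answer": "Hello, world!"}, {"answer": "42"}]
descriptors = [
    Descriptor("answer", text_length, alias="Length"),
    Descriptor("answer", non_letter_percentage),  # no alias: derived name
]
enriched = apply_descriptors(rows, descriptors)
```

Each row in enriched keeps its original answer value and gains a Length column plus a column named after the second descriptor, which is the shape a downstream metric would then reference by name.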

Usage

Use this principle when evaluating text data quality. Configure descriptors for the text properties relevant to your use case, then pass them to Dataset.from_pandas() for computation.

Theoretical Basis

Text descriptors implement the feature extractor pattern, in which raw text is transformed into numerical or categorical features:

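# Pseudocode: text feature extraction
# (vader_sentiment, count_oov, and count_words are placeholder helpers)
import re

for row in dataset:
    text = row[column_name]
    n_words = count_words(text)
    row["text_length"] = len(text)              # character count
    row["sentiment"] = vader_sentiment(text)    # score in [-1, 1]
    # Percentage of out-of-vocabulary words, guarded against empty text
    row["oov_pct"] = 100 * count_oov(text) / n_words if n_words else 0.0
    row["has_pattern"] = bool(re.search(pattern, text))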

This transforms unstructured text into structured features amenable to statistical analysis and drift detection.
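As a rough illustration of why structured features help, the sketch below extracts two features per text and compares their averages between a reference batch and a current batch. It uses only the standard library; the trigger word set, the sample texts, and the plain difference of means are assumptions made for this example and are not the drift tests Evidently applies.

```python
import re
from statistics import mean

TRIGGER_WORDS = {"refund", "cancel"}  # hypothetical trigger words


def extract_features(text):
    """Turn one text value into a small dict of structured features."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        "text_length": len(text),
        "has_trigger_word": any(w in TRIGGER_WORDS for w in words),
    }


def summarize(texts):
    """Aggregate the row-level features over a batch of texts."""
    feats = [extract_features(t) for t in texts]
    return {
        "mean_text_length": mean(f["text_length"] for f in feats),
        "trigger_word_share": mean(
            1.0 if f["has_trigger_word"] else 0.0 for f in feats
        ),
    }


# Made-up sample batches
reference = ["Thanks for the quick reply.", "All good, thank you!"]
current = ["I want a refund now.", "Please cancel my order and refund me.", "ok"]

ref_summary = summarize(reference)
cur_summary = summarize(current)

# Naive shift indicator: difference of batch-level averages per feature
shift = {name: cur_summary[name] - ref_summary[name] for name in ref_summary}
```

A positive trigger_word_share entry in shift means trigger words appear in a larger fraction of current texts than reference texts; a real drift check would replace the raw difference with a statistical test.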
