Principle:Evidentlyai Evidently Dataset Creation With Descriptors

Knowledge Sources	Evidently Descriptors Guide Evidently
Domains	NLP, LLM_Evaluation, Feature_Engineering
Last Updated	2026-02-14 12:00 GMT

Overview

Descriptor-enriched dataset creation computes row-level text and LLM evaluation features while the dataset is being constructed.

Description

Dataset Creation With Descriptors extends basic dataset creation by applying row-level descriptors during the Dataset.from_pandas() call. Descriptors are feature extractors that compute new columns from existing text data, such as:

Text properties: length, sentence count, non-letter character percentage
Sentiment analysis: VADER-based sentiment scores
Pattern matching: regex matches, trigger word presence, out-of-vocabulary percentage
LLM evaluation: negativity detection, decline detection via LLM judges

When descriptors are passed to Dataset.from_pandas(), they are computed immediately and their results are appended as new columns in the dataset. These computed columns can then be referenced by metrics in Reports (e.g., MeanValue("Sentiment"), ValueDrift("text_length")).
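The mechanism can be illustrated without Evidently. The following is a minimal plain-Python sketch, not Evidently's implementation: rows are dicts, a descriptor is a named function of one text column, and a helper evaluates every descriptor eagerly and appends each result under the descriptor's name. All names here (Descriptor, build_dataset, text_length, has_digits) are illustrative assumptions.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical row-level feature extractor (not an Evidently class)."""
    name: str                 # name of the column it produces
    column: str               # name of the text column it reads
    fn: Callable[[str], Any]  # computes one value per row


def build_dataset(rows, descriptors):
    """Return new rows with one extra column per descriptor, computed eagerly."""
    enriched = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        for d in descriptors:
            new_row[d.name] = d.fn(row[d.column])
        enriched.append(new_row)
    return enriched


descriptors = [
    Descriptor("text_length", "text", len),
    Descriptor("has_digits", "text", lambda s: re.search(r"\d+", s) is not None),
]
rows = [{"text": "Order 42 shipped"}, {"text": "Thanks!"}]
dataset = build_dataset(rows, descriptors)
print(dataset[0])
```

Because the values are computed at construction time, anything that later reads the dataset sees text_length and has_digits as ordinary columns.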

Usage

Use this principle when evaluating text data quality or LLM outputs. It is the required approach for the LLM Evaluation Monitoring and Text Data Quality Evaluation workflows. Apply it when you need row-level feature computation before aggregation in reports.

Theoretical Basis

This follows the feature engineering pipeline pattern, in which raw data passes through a series of extractors before evaluation:

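# Sketch: descriptor pipeline (assumes a recent Evidently API and a pandas DataFrame `df` with a "text" column)
from evidently import Dataset, DataDefinition, Report
from evidently.descriptors import TextLength, Sentiment, RegExp
from evidently.metrics import MeanValue, ValueDrift

# Aliases name the computed columns explicitly, so metrics can refer to them reliably
descriptors = [TextLength("text", alias="text_length"), Sentiment("text", alias="Sentiment"), RegExp("text", reg_exp=r"\d+", alias="RegExp")]
dataset = Dataset.from_pandas(df, data_definition=DataDefinition(text_columns=["text"]), descriptors=descriptors)
# dataset now has columns: [...original..., "text_length", "Sentiment", "RegExp"]
# These computed columns can be used as metric targets
report = Report([MeanValue(column="Sentiment"), ValueDrift(column="text_length")])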

The descriptor pattern enables composable evaluation where users define which text properties to measure and the framework handles computation and aggregation.
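To make the composition concrete, here is a small plain-Python sketch of the aggregation side, assuming the rows already carry descriptor columns. The mean_value and run_report helpers are hypothetical stand-ins for a metric and a report, not Evidently APIs, and the sentiment numbers are made-up sample values. The point is only that a metric refers to a computed column by name.

```python
from statistics import mean


def mean_value(column):
    """Hypothetical metric factory: averages one column across all rows."""
    def compute(rows):
        if any(column not in row for row in rows):
            raise KeyError(f"column {column!r} not found; was its descriptor applied?")
        return mean(row[column] for row in rows)
    return (f"mean({column})", compute)


def run_report(rows, metrics):
    """Evaluate each (label, compute) metric over the rows and collect results."""
    return {label: compute(rows) for label, compute in metrics}


# Rows as they would look after descriptor columns were appended.
# The sentiment values are illustrative, not real VADER output.
rows = [
    {"text": "Great service", "text_length": 13, "sentiment": 0.5},
    {"text": "Bad", "text_length": 3, "sentiment": -0.5},
]
result = run_report(rows, [mean_value("text_length"), mean_value("sentiment")])
print(result)
```

Keeping extraction (descriptors) separate from aggregation (metrics) means either list can be changed without touching the other, provided the column names agree.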

Related Pages

Implemented By

Implementation:Evidentlyai_Evidently_Dataset_From_Pandas_With_Descriptors
