Implementation:Evidentlyai Evidently Dataset From Pandas With Descriptors

From Leeroopedia
Knowledge Sources
Domains NLP, LLM_Evaluation, Feature_Engineering
Last Updated 2026-02-14 12:00 GMT

Overview

A concrete factory method, provided by the Evidently library, for creating descriptor-enriched Evidently Datasets from pandas DataFrames.

Description

Dataset.from_pandas() with the descriptors parameter creates a PandasDataset and immediately computes all descriptor columns. Each descriptor's generate_data() method is called, producing new columns that are appended to the internal DataFrame. The descriptors parameter accepts a list of Descriptor instances (TextLength, Sentiment, LLM judges, etc.).

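This is the same Dataset.from_pandas() method used for basic dataset creation; the only difference is that the descriptors parameter is populated. Because descriptors are computed eagerly, their full cost (including any LLM calls made by judge descriptors) is paid during construction rather than when the dataset is later read.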
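The eager pattern described above can be sketched without Evidently. The following is a minimal, standard-library-only illustration, not the library's implementation: ToyDescriptor, ToyDataset, from_table, and the alias field are hypothetical names chosen for this sketch, and a plain dict of lists stands in for the internal DataFrame.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A tiny "table": column name -> list of row values.
Table = Dict[str, List[object]]


@dataclass
class ToyDescriptor:
    """Hypothetical stand-in for a descriptor: one source column in, one new column out."""
    column: str
    alias: str
    fn: Callable[[str], object]

    def generate_data(self, table: Table) -> List[object]:
        # Compute one value per row of the source column.
        return [self.fn(value) for value in table[self.column]]


class ToyDataset:
    """Hypothetical stand-in for a dataset that owns its table."""

    def __init__(self, table: Table) -> None:
        self.table = table

    @classmethod
    def from_table(
        cls,
        data: Table,
        descriptors: Optional[List[ToyDescriptor]] = None,
    ) -> "ToyDataset":
        # Copy the columns so the caller's data is not mutated.
        table = {name: list(values) for name, values in data.items()}
        # Eager step: every descriptor runs now, during construction,
        # and its result is appended as a new column.
        for descriptor in descriptors or []:
            table[descriptor.alias] = descriptor.generate_data(table)
        return cls(table)


reviews = {"review": ["Great product!", "Terrible, do not buy.", "It was okay."]}
toy = ToyDataset.from_table(
    reviews,
    descriptors=[
        ToyDescriptor("review", "length", len),
        ToyDescriptor("review", "word_count", lambda text: len(text.split())),
    ],
)

print(list(toy.table))      # ['review', 'length', 'word_count']
print(toy.table["length"])  # [14, 21, 12]
```

By the time from_table returns, the descriptor columns already exist, which mirrors the eager behaviour described for Dataset.from_pandas(). How Evidently stores and names the columns internally may differ from this sketch.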

Usage

Use this when building datasets for text quality evaluation or LLM output monitoring. Pass descriptors as a list to Dataset.from_pandas() alongside the data and schema.

Code Reference

Source Location

  • Repository: evidently
  • File: src/evidently/core/datasets.py
  • Lines: L1243-1276

Signature

class Dataset:
    @classmethod
    def from_pandas(
        cls,
        data: pd.DataFrame,
        data_definition: Optional[DataDefinition] = None,
        descriptors: Optional[List[Descriptor]] = None,
        options: AnyOptions = None,
        metadata: Optional[Dict[str, MetadataValueType]] = None,
        tags: Optional[List[str]] = None,
    ) -> "Dataset":
        """
        When descriptors are provided, they are computed and appended
        as new columns to the dataset during construction.
        """

Import

from evidently import Dataset, DataDefinition
from evidently.descriptors import TextLength, Sentiment, OOVWordsPercentage
# For LLM judges:
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval

I/O Contract

Inputs

Name Type Required Description
data pd.DataFrame Yes Source data with text columns
data_definition Optional[DataDefinition] No Schema with text_columns specified
descriptors Optional[List[Descriptor]] Yes (for this use case; defaults to None) List of descriptor instances to compute
options AnyOptions No Options for descriptor computation (e.g., LLM provider settings)

Outputs

Name Type Description
return value Dataset Dataset with original data plus computed descriptor columns

Usage Examples

Text Quality Descriptors

import pandas as pd
from evidently import Dataset, DataDefinition
from evidently.descriptors import TextLength, Sentiment, OOVWordsPercentage

df = pd.DataFrame({
    "review": ["Great product!", "Terrible, do not buy.", "It was okay."]
})

dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["review"]),
    descriptors=[
        TextLength("review"),
        Sentiment("review"),
        OOVWordsPercentage("review"),
    ],
)

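# The dataset now holds the original "review" column plus one new column per descriptor
# (for example: text_length, Sentiment, OOV Words %). Exact column names depend on
# each descriptor's alias or default name.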

LLM Evaluation Descriptors

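import pandas as pd
from evidently import Dataset, DataDefinition
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval, Sentiment

# Illustrative data; the descriptors below read the "response" column.
df = pd.DataFrame({
    "response": ["Happy to help with that!", "I'm sorry, I can't assist with this request."]
})

# The LLM judges call the OpenAI API during construction, so credentials
# (typically the OPENAI_API_KEY environment variable) must be configured first.
dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[
        Sentiment("response"),
        NegativityLLMEval("response", provider="openai", model="gpt-4o-mini"),
        DeclineLLMEval("response", provider="openai", model="gpt-4o-mini"),
    ],
)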

Related Pages

Implements Principle
