Implementation:Evidentlyai Evidently Dataset From Pandas With Descriptors
| Knowledge Sources | |
|---|---|
| Domains | NLP, LLM_Evaluation, Feature_Engineering |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Concrete factory method, provided by the Evidently library, for creating descriptor-enriched Evidently Datasets from pandas DataFrames.
Description
Dataset.from_pandas() with the descriptors parameter creates a PandasDataset and immediately computes all descriptor columns. Each descriptor's generate_data() method is called, producing new columns that are appended to the internal DataFrame. The descriptors parameter accepts a list of Descriptor instances (TextLength, Sentiment, LLM judges, etc.).
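This is the same Dataset.from_pandas() method used for basic dataset creation; the only difference is that the descriptors parameter is populated. Because descriptors are computed eagerly during construction, their full cost (including any LLM API calls) is paid when the Dataset is built.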
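The eager computation described above can be modelled in a few lines of plain Python. This is a minimal, hypothetical sketch, not Evidently's code: ToyDescriptor, ToyDataset, from_records, and the dict-of-lists table are illustrative stand-ins. Only the shape of the behaviour mirrors the description: each descriptor's generate_data() produces a new column, and that column is appended while the dataset is being constructed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in for a DataFrame: column name -> column values.
Table = Dict[str, List[object]]


@dataclass
class ToyDescriptor:
    """Hypothetical descriptor: derives one new column from one source column."""
    column: str
    alias: str
    fn: Callable[[str], object]

    def generate_data(self, data: Table) -> Table:
        # Return only the new column; the dataset factory appends it.
        return {self.alias: [self.fn(value) for value in data[self.column]]}


@dataclass
class ToyDataset:
    data: Table = field(default_factory=dict)

    @classmethod
    def from_records(
        cls, data: Table, descriptors: Optional[List[ToyDescriptor]] = None
    ) -> "ToyDataset":
        # Copy the input so the caller's table is never mutated.
        table = {name: list(values) for name, values in data.items()}
        for descriptor in descriptors or []:
            # Eager: every descriptor runs now, during construction.
            table.update(descriptor.generate_data(table))
        return cls(table)


reviews = {"review": ["Great product!", "Terrible, do not buy.", "It was okay."]}
ds = ToyDataset.from_records(
    reviews,
    descriptors=[
        ToyDescriptor("review", "length", len),
        ToyDescriptor("review", "has_exclamation", lambda text: "!" in text),
    ],
)
print(list(ds.data))      # ['review', 'length', 'has_exclamation']
print(ds.data["length"])  # [14, 21, 12]
```

A dict of lists stands in for the internal DataFrame so the sketch runs without pandas; with descriptors=None the factory simply returns a copy of the input columns.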
Usage
Use this when building datasets for text quality evaluation or LLM output monitoring. Pass descriptors as a list to Dataset.from_pandas() alongside the data and schema.
Code Reference
Source Location
- Repository: evidently
- File: src/evidently/core/datasets.py
- Lines: L1243-1276
Signature
```python
class Dataset:
    @classmethod
    def from_pandas(
        cls,
        data: pd.DataFrame,
        data_definition: Optional[DataDefinition] = None,
        descriptors: Optional[List[Descriptor]] = None,
        options: AnyOptions = None,
        metadata: Optional[Dict[str, MetadataValueType]] = None,
        tags: Optional[List[str]] = None,
    ) -> "Dataset":
        """
        When descriptors are provided, they are computed and appended
        as new columns to the dataset during construction.
        """
```
Import
```python
from evidently import Dataset, DataDefinition
from evidently.descriptors import TextLength, Sentiment, OOVWordsPercentage

# For LLM judges:
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pd.DataFrame | Yes | Source data with text columns |
| data_definition | Optional[DataDefinition] | No | Schema with text_columns specified |
| descriptors | List[Descriptor] | Yes (for this use case) | List of descriptor instances to compute |
| options | AnyOptions | No | Options for descriptor computation (e.g., LLM provider settings) |
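Because each descriptor reads a named text column, it can help to check the inputs before paying for descriptor computation, especially LLM judges. The helper below is a hypothetical pre-flight check, not part of Evidently: it assumes you list each descriptor's source column yourself and treats "declared in text_columns" as a convention to enforce, then compares those columns with the DataFrame columns and the schema.

```python
from typing import Iterable, List, Sequence


def check_descriptor_inputs(
    data_columns: Iterable[str],
    text_columns: Sequence[str],
    descriptor_columns: Sequence[str],
) -> List[str]:
    """Hypothetical helper: return a list of problems (empty list means OK)."""
    available = set(data_columns)
    declared = set(text_columns)
    problems = []
    for column in descriptor_columns:
        if column not in available:
            problems.append(f"descriptor column {column!r} is missing from the data")
        elif column not in declared:
            problems.append(f"descriptor column {column!r} is not declared in text_columns")
    return problems


# A column that exists in the data but was left out of text_columns is flagged.
print(check_descriptor_inputs(["review", "response"], ["review"], ["review", "response"]))
```

With a real DataFrame you would pass df.columns as data_columns and the same list you give to DataDefinition(text_columns=...).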
Outputs
| Name | Type | Description |
|---|---|---|
| return value | Dataset | Dataset with original data plus computed descriptor columns |
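The output contract (original data plus computed descriptor columns) can be checked by comparing column names before and after construction. This is a minimal sketch assuming you obtain both lists yourself: df.columns for the input, and the columns of the dataset's underlying DataFrame for the output (for example via as_dataframe(), if your Evidently version provides it). The added_columns helper is hypothetical.

```python
from typing import List, Sequence


def added_columns(before: Sequence[str], after: Sequence[str]) -> List[str]:
    """Hypothetical helper: columns present after construction but not before.

    Raises ValueError if an original column is missing from the output,
    which would break the "original data plus descriptor columns" contract.
    """
    missing = [column for column in before if column not in after]
    if missing:
        raise ValueError(f"original columns missing from output: {missing}")
    original = set(before)
    return [column for column in after if column not in original]


print(added_columns(["review"], ["review", "Length", "Sentiment"]))  # ['Length', 'Sentiment']
```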
Usage Examples
Text Quality Descriptors
```python
import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import TextLength, Sentiment, OOVWordsPercentage

df = pd.DataFrame({
    "review": ["Great product!", "Terrible, do not buy.", "It was okay."]
})

dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["review"]),
    descriptors=[
        TextLength("review"),
        Sentiment("review"),
        OOVWordsPercentage("review"),
    ],
)
# Dataset now has columns: review, text_length, Sentiment, OOV Words %
# (descriptor column names can differ between Evidently versions)
```
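To sanity-check the TextLength column, you can compute the expected values independently in plain Python and compare them with the descriptor column. This assumes TextLength counts characters (symbols) per row, spaces and punctuation included; the snippet does not call Evidently.

```python
reviews = ["Great product!", "Terrible, do not buy.", "It was okay."]

# Character count per row, including spaces and punctuation.
expected_lengths = [len(text) for text in reviews]
print(expected_lengths)  # [14, 21, 12]
```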
LLM Evaluation Descriptors
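```python
import pandas as pd

from evidently import Dataset, DataDefinition
from evidently.descriptors import NegativityLLMEval, DeclineLLMEval, Sentiment

# LLM judges call the provider's API; this assumes an OpenAI API key is
# available (e.g., in the OPENAI_API_KEY environment variable).
df = pd.DataFrame({
    "response": ["Sure, here is how to reset your password.", "I can't help with that."]
})

dataset = Dataset.from_pandas(
    df,
    data_definition=DataDefinition(text_columns=["response"]),
    descriptors=[
        Sentiment("response"),
        NegativityLLMEval("response", provider="openai", model="gpt-4o-mini"),
        DeclineLLMEval("response", provider="openai", model="gpt-4o-mini"),
    ],
)
```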