Implementation:Mlflow Mlflow Convert To Eval Set
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, LLM_Evaluation |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
An internal MLflow utility that converts heterogeneous evaluation data into a standardised pandas DataFrame.
Description
The _convert_to_eval_set function is the internal entry point that accepts every data format supported by mlflow.genai.evaluate() and returns a normalised pd.DataFrame. It delegates the initial coercion to _convert_eval_set_to_df, which handles pd.DataFrame, list[dict], list[Trace], EvaluationDataset entities, and pyspark.sql.DataFrame inputs. After coercion, the result passes through three pipeline stages: trace column deserialisation, request/response extraction from traces, and expectation extraction from traces. The function also validates that the dataset is non-empty and contains at least one of the inputs or trace columns.
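The validation and three-stage flow described above can be sketched in plain Python. This is a hypothetical illustration, not MLflow's code: coercion is omitted (the input is assumed to already be a list of rows), the dataset is modelled as a list of row dicts instead of a pd.DataFrame, and a serialised trace is modelled as a JSON string with assumed request, response and expectations fields. Validation runs before the stages here; the sketch does not claim to match MLflow's exact ordering.

```python
import json
from typing import Any, Callable

# Hypothetical row model: one dict per evaluation row (MLflow itself uses a pd.DataFrame).
Row = dict[str, Any]


def deserialize_traces(rows: list[Row]) -> list[Row]:
    # Stage 1: turn JSON-serialised traces back into objects (plain dicts in this sketch).
    return [
        {**row, "trace": json.loads(row["trace"])} if isinstance(row.get("trace"), str) else row
        for row in rows
    ]


def _fill_from_trace(row: Row, column: str, trace_field: str) -> Row:
    # Copy an (assumed) trace field into a column only when that column is missing or None.
    trace = row.get("trace")
    if trace is not None and row.get(column) is None and trace_field in trace:
        return {**row, column: trace[trace_field]}
    return row


def extract_request_response(rows: list[Row]) -> list[Row]:
    # Stage 2: derive inputs/outputs from the trace's request/response.
    return [
        _fill_from_trace(_fill_from_trace(row, "inputs", "request"), "outputs", "response")
        for row in rows
    ]


def extract_expectations(rows: list[Row]) -> list[Row]:
    # Stage 3: derive expectations recorded on the trace.
    return [_fill_from_trace(row, "expectations", "expectations") for row in rows]


STAGES: list[Callable[[list[Row]], list[Row]]] = [
    deserialize_traces,
    extract_request_response,
    extract_expectations,
]


def convert_to_eval_set_sketch(rows: list[Row]) -> list[Row]:
    # Validate up front: the dataset must be non-empty and have an inputs or trace column.
    if not rows:
        raise ValueError("The evaluation dataset is empty.")
    columns = set().union(*(row.keys() for row in rows))
    if not columns & {"inputs", "trace"}:
        raise ValueError("The dataset must contain an 'inputs' or 'trace' column.")
    # Run the three stages in order, each returning a new list of rows.
    for stage in STAGES:
        rows = stage(rows)
    return rows
```

Because each stage in this sketch fills a column only when it is missing, values supplied explicitly in the dataset take precedence over what the trace records.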
Users do not call this function directly. Instead, they prepare data as a pd.DataFrame or list[dict] following the required schema, and mlflow.genai.evaluate() invokes the converter internally.
Usage
This converter runs implicitly whenever mlflow.genai.evaluate() is called. Knowing the accepted types and required columns is essential for constructing valid evaluation datasets, and the converter also serves as a reference for the canonical evaluation schema when building tooling around MLflow evaluation.
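For tooling that builds evaluation rows, the canonical schema can be mirrored as a typed record. The EvalRow TypedDict and make_row helper below are hypothetical, not part of MLflow; trace is typed as Any to avoid importing MLflow, and the question / expected_response keys are taken from the usage examples on this page.

```python
from typing import Any, Optional, TypedDict


class EvalRow(TypedDict, total=False):
    # Every key is optional at the type level; the converter itself still
    # requires "inputs" unless a "trace" is supplied.
    inputs: dict[str, Any]
    outputs: Any
    expectations: dict[str, Any]
    trace: Any  # an MLflow Trace in practice; typed as Any to avoid the import
    tags: dict[str, Any]


def make_row(question: str, answer: Optional[str] = None, expected: Optional[str] = None) -> EvalRow:
    # Hypothetical helper: build one row, adding optional columns only when given.
    row: EvalRow = {"inputs": {"question": question}}
    if answer is not None:
        row["outputs"] = answer
    if expected is not None:
        row["expectations"] = {"expected_response": expected}
    return row
```

Since a list of EvalRow values is just a list[dict], it could be passed as data= to mlflow.genai.evaluate() without further conversion.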
Code Reference
Source Location
- Repository: mlflow
- File: mlflow/genai/evaluation/utils.py
- Lines: L151-162 (main converter), L25-52 (type alias), L100-148 (coercion helper)
Signature
```python
def _convert_to_eval_set(data: "EvaluationDatasetTypes") -> "pd.DataFrame":
    """
    Takes in a dataset in the multiple format that mlflow.genai.evaluate() expects
    and converts it into a standardized Pandas DataFrame.
    """
```
Import
```python
# User-facing (prepare data manually):
import pandas as pd

# Internal (used by mlflow.genai.evaluate automatically):
from mlflow.genai.evaluation.utils import _convert_to_eval_set
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | pd.DataFrame | Yes (one of) | Pandas DataFrame with evaluation columns. |
| data | list[dict] | Yes (one of) | List of dictionaries, each representing one evaluation row. |
| data | list[Trace] | Yes (one of) | List of MLflow Trace objects from which inputs, outputs, and expectations are extracted. |
| data | EvaluationDataset | Yes (one of) | Managed or entity evaluation dataset object with a to_df() method. |
| data | pyspark.sql.DataFrame | Yes (one of) | Spark DataFrame, converted to pandas internally. |
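One way to picture the per-type coercion in the table is a dispatch on the shape of data. The sketch below is hypothetical (coerce_to_rows is not an MLflow function) and dependency-free: it returns a list of row dicts rather than a pd.DataFrame and recognises inputs by duck typing. toPandas() is PySpark's DataFrame method, to_df() is the EvaluationDataset method named in the table, and to_dict(orient="records") is the pandas DataFrame method; any non-dict list element is assumed to be a Trace.

```python
from typing import Any

Row = dict[str, Any]


def coerce_to_rows(data: Any) -> list[Row]:
    """Hypothetical coercion sketch: normalise any supported input to a list of row dicts."""
    # Spark DataFrame -> pandas DataFrame via pyspark's DataFrame.toPandas().
    if hasattr(data, "toPandas"):
        data = data.toPandas()
    # EvaluationDataset-like object -> pandas DataFrame via its to_df() method.
    if hasattr(data, "to_df"):
        data = data.to_df()
    # pandas DataFrame -> list of row dicts.
    if hasattr(data, "to_dict"):
        return data.to_dict(orient="records")
    if isinstance(data, list):
        if all(isinstance(item, dict) for item in data):
            return [dict(item) for item in data]  # list[dict]: shallow-copy each row
        if not any(isinstance(item, dict) for item in data):
            return [{"trace": item} for item in data]  # list[Trace]: one trace per row
        raise TypeError("A list must contain only dicts or only traces.")
    raise TypeError(f"Unsupported evaluation data type: {type(data).__name__}")
```

Chaining the checks lets a Spark or EvaluationDataset input fall through to the pandas branch, so each type needs only one conversion rule.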
Columns recognised in the resulting DataFrame:
| Column | Type | Required | Description |
|---|---|---|---|
| inputs | dict | Yes (unless trace present) | Dictionary of input key-value pairs passed to the model. |
| outputs | Any | No | Model output for the row. May be omitted when predict_fn generates the outputs. |
| expectations | dict | No | Dictionary of ground-truth values for scorer comparison. |
| trace | Trace | No (unless inputs absent) | MLflow Trace object for the prediction. If present, inputs and outputs can be derived from it. |
| tags | dict | No | Dictionary of metadata tags to attach to traces. |
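The column rules above can be checked per row before calling evaluate(). check_row is a hypothetical helper, not an MLflow API; it enforces only what the table states: inputs must be a dict unless a trace is present, and expectations and tags, when given, must be dicts.

```python
from typing import Any


def check_row(row: dict[str, Any]) -> list[str]:
    """Hypothetical pre-flight check: return the problems found in one row (empty if none)."""
    problems = []
    has_trace = row.get("trace") is not None
    inputs = row.get("inputs")
    # inputs is required unless a trace can supply it.
    if inputs is None and not has_trace:
        problems.append("row needs 'inputs' or 'trace'")
    elif inputs is not None and not isinstance(inputs, dict):
        problems.append("'inputs' must be a dict")
    # Optional dict-typed columns are only checked when present.
    for column in ("expectations", "tags"):
        value = row.get(column)
        if value is not None and not isinstance(value, dict):
            problems.append(f"'{column}' must be a dict")
    return problems
```

Returning a list of messages instead of raising on the first error lets tooling report every problem in a dataset in one pass.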
Outputs
| Name | Type | Description |
|---|---|---|
| result | pd.DataFrame | Standardised DataFrame with normalised columns. Trace columns are deserialised, and inputs/outputs/expectations are extracted from traces where necessary. |
Usage Examples
Basic Usage
```python
import mlflow.genai
from mlflow.genai.scorers import scorer


# A simple custom scorer: exact match against the expected response
@scorer
def exact_match(outputs: str, expectations: dict) -> bool:
    return outputs == expectations["expected_response"]


# Prepare data as a list of dictionaries
data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open-source ML platform.",
        "expectations": {"expected_response": "MLflow is an ML platform."},
    },
    {
        "inputs": {"question": "What is Spark?"},
        "outputs": "Spark is a distributed engine.",
        "expectations": {"expected_response": "Spark is a data processing engine."},
    },
]

# The conversion happens internally when evaluate() is called
result = mlflow.genai.evaluate(
    data=data,
    scorers=[exact_match],
)
```
Using a Pandas DataFrame
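```python
import pandas as pd
import mlflow.genai
from mlflow.genai.scorers import scorer


# This dataset has no expectations, so the scorer only inspects the output
@scorer
def mentions_mlflow(outputs: str) -> bool:
    return "MLflow" in outputs


df = pd.DataFrame([
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an ML platform.",
    },
])
result = mlflow.genai.evaluate(data=df, scorers=[mentions_mlflow])
```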
Using Trace Objects
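```python
import mlflow.genai
from mlflow.genai.scorers import scorer

@scorer
def has_output(outputs) -> bool:
    return bool(outputs)

# Retrieve previously recorded traces ("m-abc123" is a placeholder model ID);
# search_traces returns a pandas DataFrame by default
traces_df = mlflow.search_traces(model_id="m-abc123")
result = mlflow.genai.evaluate(data=traces_df, scorers=[has_output])
```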