Implementation:Mlflow Mlflow Convert To Eval Set

Knowledge Sources	MLflow MLflow GenAI API
Domains	ML_Ops, LLM_Evaluation
Last Updated	2026-02-13 20:00 GMT

Overview

Concrete tool, provided by the MLflow library, for converting heterogeneous evaluation data into a standardised pandas DataFrame.

Description

The _convert_to_eval_set function is the internal entry point that accepts every data format supported by mlflow.genai.evaluate() and returns a normalised pd.DataFrame. It delegates initial coercion to _convert_eval_set_to_df, which handles pd.DataFrame, list[dict], list[Trace], EvaluationDataset entities, and pyspark.sql.DataFrame inputs. After coercion, the result passes through three pipeline stages: trace column deserialisation, request/response extraction from traces, and expectation extraction from traces. The function also validates that the dataset is non-empty and contains an inputs column, a trace column, or both.
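The three stages and the validation step can be illustrated with a small pandas sketch. This is not MLflow's code. Plain dicts stand in for `Trace` objects, and the trace keys `request`, `response` and `expectations` are assumptions chosen for illustration. Every function name below is hypothetical.

```python
import json
import math

import pandas as pd


def _missing(value) -> bool:
    # pandas fills keys absent from list[dict] input with NaN, so treat NaN like None.
    return value is None or (isinstance(value, float) and math.isnan(value))


def _deserialise_traces(df: pd.DataFrame) -> pd.DataFrame:
    # Stage 1: decode JSON-encoded traces (plain dicts stand in for Trace objects).
    if "trace" in df.columns:
        df["trace"] = [json.loads(t) if isinstance(t, str) else t for t in df["trace"]]
    return df


def _fill_from_trace(df: pd.DataFrame, column: str, trace_key: str) -> pd.DataFrame:
    # Stages 2 and 3: fill a missing value from the row's trace, keeping explicit values.
    if "trace" not in df.columns:
        return df
    if column not in df.columns:
        df[column] = None
    df[column] = [
        trace.get(trace_key) if _missing(value) and isinstance(trace, dict) else value
        for value, trace in zip(df[column], df["trace"])
    ]
    return df


def convert_to_eval_set_sketch(rows: list[dict]) -> pd.DataFrame:
    df = pd.DataFrame(rows)  # coercion; this sketch only accepts list[dict]
    # Validation: the dataset must be non-empty and have an inputs column, a trace column, or both.
    if df.empty:
        raise ValueError("The evaluation dataset must not be empty.")
    if "inputs" not in df.columns and "trace" not in df.columns:
        raise ValueError("The dataset needs an 'inputs' or a 'trace' column.")
    df = _deserialise_traces(df)
    df = _fill_from_trace(df, "inputs", "request")
    df = _fill_from_trace(df, "outputs", "response")
    df = _fill_from_trace(df, "expectations", "expectations")
    return df


rows = [
    {"inputs": {"question": "What is MLflow?"}, "outputs": "An ML platform."},
    {"trace": json.dumps({
        "request": {"question": "What is Spark?"},
        "response": "A distributed engine.",
        "expectations": {"expected_response": "A data processing engine."},
    })},
]
df = convert_to_eval_set_sketch(rows)
```

In this sketch, explicit column values take precedence over values derived from the trace, and rows without a trace are left untouched.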

Users do not call this function directly. Instead, they prepare data as a pd.DataFrame or list[dict] following the required schema, and mlflow.genai.evaluate() invokes the converter internally.
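Preparing data usually means reshaping raw records into that schema. The helper `build_eval_rows` below is hypothetical. It pairs questions, answers and optional reference answers into `list[dict]` rows, using the `question` and `expected_response` keys from the examples on this page.

```python
import pandas as pd


def build_eval_rows(questions, answers, references=None):
    """Hypothetical helper: pair raw values into evaluation rows (list[dict])."""
    references = references if references is not None else [None] * len(questions)
    if not (len(questions) == len(answers) == len(references)):
        raise ValueError("questions, answers and references must have the same length")
    rows = []
    for question, answer, reference in zip(questions, answers, references):
        row = {"inputs": {"question": question}, "outputs": answer}
        if reference is not None:
            # Only attach expectations when a ground-truth value exists.
            row["expectations"] = {"expected_response": reference}
        rows.append(row)
    return rows


rows = build_eval_rows(
    ["What is MLflow?", "What is Spark?"],
    ["MLflow is an open-source ML platform.", "Spark is a distributed engine."],
    ["MLflow is an ML platform.", "Spark is a data processing engine."],
)
# Either form can be passed as data=; the DataFrame keeps the nested dicts as objects.
df = pd.DataFrame(rows)
```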

Usage

This converter runs implicitly whenever mlflow.genai.evaluate() is called. Understanding the accepted types and required columns is essential for constructing valid evaluation datasets. The converter is also useful as a reference for the canonical evaluation schema when building tooling around MLflow evaluation.

Code Reference

Source Location

Repository: mlflow
File: mlflow/genai/evaluation/utils.py
Lines: L151-162 (main converter), L25-52 (type alias), L100-148 (coercion helper)

Signature

def _convert_to_eval_set(data: "EvaluationDatasetTypes") -> "pd.DataFrame":
    """
    Takes in a dataset in one of the multiple formats that mlflow.genai.evaluate() expects
    and converts it into a standardized Pandas DataFrame.
    """

Import

# User-facing (prepare data manually):
import pandas as pd

# Internal (used by mlflow.genai.evaluate automatically):
from mlflow.genai.evaluation.utils import _convert_to_eval_set

I/O Contract

Inputs

Name	Type	Required	Description
data	`pd.DataFrame`	Yes (one of)	Pandas DataFrame with evaluation columns.
data	`list[dict]`	Yes (one of)	List of dictionaries, each representing one evaluation row.
data	`list[Trace]`	Yes (one of)	List of MLflow Trace objects from which inputs, outputs, and expectations are extracted.
data	`EvaluationDataset`	Yes (one of)	Managed or entity evaluation dataset object with a `to_df()` method.
data	`pyspark.sql.DataFrame`	Yes (one of)	Spark DataFrame, converted to pandas internally.
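The type dispatch performed during coercion can be approximated with duck typing. This sketch is illustrative and does not reproduce the logic of `_convert_eval_set_to_df`. It relies only on `to_df()`, the method the table attributes to `EvaluationDataset`, and on PySpark's `DataFrame.toPandas()`. It also assumes that a list of non-dict items is a list of traces, stored in a `trace` column; that assumption is marked in the code. The function `coerce_to_df` and the class `FakeDataset` are hypothetical.

```python
import pandas as pd


def coerce_to_df(data) -> pd.DataFrame:
    """Illustrative coercion: normalise supported inputs to a pandas DataFrame."""
    if isinstance(data, pd.DataFrame):
        return data.copy()  # later stages can add columns without touching the caller's frame
    if isinstance(data, list):
        if all(isinstance(row, dict) for row in data):
            return pd.DataFrame(data)
        if not any(isinstance(row, dict) for row in data):
            # Assumption: a list of non-dict items is a list of traces, one per row.
            return pd.DataFrame({"trace": data})
        raise TypeError("Cannot mix dict rows and trace objects in one list.")
    if hasattr(data, "to_df"):  # e.g. an EvaluationDataset
        return data.to_df()
    if hasattr(data, "toPandas"):  # e.g. a pyspark.sql.DataFrame
        return data.toPandas()
    raise TypeError(f"Unsupported evaluation data type: {type(data).__name__}")


class FakeDataset:
    """Stand-in for an object exposing to_df(), used only for this demo."""

    def to_df(self):
        return pd.DataFrame([{"inputs": {"question": "What is MLflow?"}}])
```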

Columns recognised in the resulting DataFrame:

Column	Type	Required	Description
inputs	`dict`	Yes (unless trace is present)	Dictionary of input key-value pairs passed to the model.
outputs	Any	No	Model output for the row (optional if `predict_fn` will generate outputs).
expectations	`dict`	No	Dictionary of ground-truth values for scorer comparison.
trace	`Trace`	No (unless inputs is absent)	MLflow Trace object for the prediction. If present, inputs, outputs, and expectations can be derived from it.
tags	`dict`	No	Dictionary of metadata tags to attach to traces.
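The column table can double as a pre-flight checklist. The hypothetical function `find_schema_problems` checks `list[dict]` rows against the constraints listed above. It works row by row, which is stricter than the column-level check described for the converter. It does not inspect `outputs`, whose type is unconstrained.

```python
def find_schema_problems(rows: list[dict]) -> list[str]:
    """Hypothetical pre-flight check of evaluation rows against the column table."""
    problems = []
    if not rows:
        problems.append("dataset is empty")
    for i, row in enumerate(rows):
        # Each row needs inputs, a trace, or both.
        if "inputs" not in row and "trace" not in row:
            problems.append(f"row {i}: needs 'inputs' or 'trace'")
        if "inputs" in row and not isinstance(row["inputs"], dict):
            problems.append(f"row {i}: 'inputs' must be a dict")
        # Optional columns must still be dicts when present.
        for key in ("expectations", "tags"):
            if key in row and not isinstance(row[key], dict):
                problems.append(f"row {i}: '{key}' must be a dict")
    return problems


rows = [
    {"inputs": {"question": "What is MLflow?"}, "outputs": "An ML platform."},
    {"outputs": "Orphan answer"},
    {"inputs": "What is Spark?", "tags": ["demo"]},
]
problems = find_schema_problems(rows)
```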

Outputs

Name	Type	Description
result	`pd.DataFrame`	Standardised DataFrame with normalised columns. Trace columns are deserialised, and inputs/outputs/expectations are extracted from traces where necessary.

Usage Examples

Basic Usage

import mlflow.genai
from mlflow.genai.scorers import scorer


# A minimal custom scorer; built-in LLM-judge scorers can be used instead
@scorer
def exact_match(outputs, expectations):
    return outputs == expectations["expected_response"]


# Prepare data as a list of dictionaries
data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open-source ML platform.",
        "expectations": {"expected_response": "MLflow is an ML platform."},
    },
    {
        "inputs": {"question": "What is Spark?"},
        "outputs": "Spark is a distributed engine.",
        "expectations": {"expected_response": "Spark is a data processing engine."},
    },
]

# The conversion happens internally when evaluate() is called
result = mlflow.genai.evaluate(
    data=data,
    scorers=[exact_match],
)

Using a Pandas DataFrame

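import pandas as pd
import mlflow.genai
from mlflow.genai.scorers import scorer


# Illustrative scorer that only needs the outputs column
@scorer
def mentions_mlflow(outputs):
    return "MLflow" in outputs


df = pd.DataFrame([
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an ML platform.",
    },
])

result = mlflow.genai.evaluate(data=df, scorers=[mentions_mlflow])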

Using Trace Objects

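import mlflow
import mlflow.genai
from mlflow.genai.scorers import scorer


@scorer
def has_output(outputs):
    return outputs is not None


# search_traces returns a pandas DataFrame with a `trace` column;
# "m-abc123" is a placeholder model ID
traces_df = mlflow.search_traces(model_id="m-abc123")

result = mlflow.genai.evaluate(data=traces_df, scorers=[has_output])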

Related Pages

Implements Principle

Principle:Mlflow_Mlflow_Evaluation_Dataset_Preparation
