Implementation:Mlflow Mlflow Convert To Eval Set

From Leeroopedia
Knowledge Sources
Domains: ML_Ops, LLM_Evaluation
Last Updated: 2026-02-13 20:00 GMT

Overview

Concrete tool, provided by the MLflow library, for converting heterogeneous evaluation data into a standardised pandas DataFrame.

Description

The _convert_to_eval_set function is the internal entry point that accepts every data format supported by mlflow.genai.evaluate() and returns a normalised pd.DataFrame. It delegates initial coercion to _convert_eval_set_to_df, which handles pd.DataFrame, list[dict], list[Trace], EvaluationDataset entities, and pyspark.sql.DataFrame inputs. After coercion the result passes through three pipeline stages: trace column deserialisation, request/response extraction from traces, and expectation extraction from traces. The function validates that the dataset is non-empty and contains at least one of the inputs or trace columns.
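The flow described above (coercion, then three stages, plus validation) can be pictured with a toy model. The sketch below is not MLflow's code: it uses plain dict rows in place of a DataFrame, a plain dict in place of a Trace, and the field names request, response, and expectations inside the toy trace are assumptions made for illustration only. The function names and the point at which validation runs are likewise hypothetical.

```python
import json


def deserialise_trace(row):
    """Stage 1 (toy): turn a JSON-string trace into a dict."""
    trace = row.get("trace")
    if isinstance(trace, str):
        return {**row, "trace": json.loads(trace)}
    return row


def fill_request_response(row):
    """Stage 2 (toy): derive inputs/outputs from the trace if the row lacks them."""
    trace = row.get("trace")
    if trace is None:
        return row
    filled = dict(row)
    if filled.get("inputs") is None:
        filled["inputs"] = trace.get("request")  # hypothetical field name
    if filled.get("outputs") is None:
        filled["outputs"] = trace.get("response")  # hypothetical field name
    return filled


def fill_expectations(row):
    """Stage 3 (toy): derive expectations from the trace if the row lacks them."""
    trace = row.get("trace")
    if trace is None or row.get("expectations") is not None:
        return row
    return {**row, "expectations": trace.get("expectations")}


def toy_convert(rows):
    """Validate, then run the three stages in order (ordering is an assumption)."""
    if not rows:
        raise ValueError("evaluation dataset is empty")
    if not any("inputs" in r or "trace" in r for r in rows):
        raise ValueError("dataset needs an 'inputs' or 'trace' column")
    for stage in (deserialise_trace, fill_request_response, fill_expectations):
        rows = [stage(r) for r in rows]
    return rows


# A single row that carries only a serialised toy trace.
rows = [
    {
        "trace": json.dumps(
            {
                "request": {"question": "What is MLflow?"},
                "response": "An ML platform.",
                "expectations": {"expected_response": "MLflow is an ML platform."},
            }
        )
    }
]
converted = toy_convert(rows)
```

After the call, the row carries inputs, outputs, and expectations alongside the deserialised trace, which mirrors the idea that a trace alone is enough to populate the other columns.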

Users do not call this function directly. Instead, they prepare data as a pd.DataFrame or list[dict] following the required schema, and mlflow.genai.evaluate() invokes the converter internally.

Usage

This converter runs implicitly on every call to mlflow.genai.evaluate(), so knowing the accepted types and required columns is essential for constructing valid evaluation datasets. It also serves as a reference for the canonical evaluation schema when building tooling around MLflow evaluation.

Code Reference

Source Location

  • Repository: mlflow
  • File: mlflow/genai/evaluation/utils.py
  • Lines: L151-162 (main converter), L25-52 (type alias), L100-148 (coercion helper)

Signature

def _convert_to_eval_set(data: "EvaluationDatasetTypes") -> "pd.DataFrame":
    """
    Takes in a dataset in the multiple format that mlflow.genai.evaluate() expects
    and converts it into a standardized Pandas DataFrame.
    """

Import

# User-facing (prepare data manually):
import pandas as pd

# Internal (used by mlflow.genai.evaluate automatically):
from mlflow.genai.evaluation.utils import _convert_to_eval_set

I/O Contract

Inputs

Name | Type | Required | Description
data | pd.DataFrame | Yes (one of) | Pandas DataFrame with evaluation columns.
data | list[dict] | Yes (one of) | List of dictionaries, each representing one evaluation row.
data | list[Trace] | Yes (one of) | List of MLflow Trace objects from which inputs, outputs, and expectations are extracted.
data | EvaluationDataset | Yes (one of) | Managed or entity evaluation dataset object with a to_df() method.
data | pyspark.sql.DataFrame | Yes (one of) | Spark DataFrame, converted to pandas internally.

Columns recognised in the resulting DataFrame (at least one of inputs or trace must be present):

Column | Type | Required | Description
inputs | dict | Yes (unless trace present) | Dictionary of input key-value pairs passed to the model.
outputs | Any | No | Model output for the row (optional if predict_fn will generate outputs).
expectations | dict | No | Dictionary of ground-truth values for scorer comparison.
trace | Trace | No (unless inputs absent) | MLflow Trace object for the prediction. If present, inputs and outputs can be derived from it.
tags | dict | No | Dictionary of metadata tags to attach to traces.
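Because the converter only runs inside mlflow.genai.evaluate(), it can help to check list[dict] data against the column rules above before calling it. The following is a minimal sketch of such a pre-flight check. check_eval_rows is a hypothetical helper, not part of MLflow, and it only encodes the rules stated in the table (inputs or trace present; inputs, expectations, and tags are dicts when given). MLflow's own validation may differ.

```python
def check_eval_rows(rows):
    """Return a list of problems found; an empty list means the rows look valid."""
    if not rows:
        return ["dataset is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Each row needs something to evaluate: explicit inputs or a trace.
        if row.get("inputs") is None and row.get("trace") is None:
            problems.append(f"row {i}: needs 'inputs' or 'trace'")
        # These columns are documented as dictionaries when present.
        for col in ("inputs", "expectations", "tags"):
            value = row.get(col)
            if value is not None and not isinstance(value, dict):
                problems.append(
                    f"row {i}: '{col}' should be a dict, got {type(value).__name__}"
                )
    return problems


good = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open-source ML platform.",
        "expectations": {"expected_response": "MLflow is an ML platform."},
    }
]
bad = [
    {"outputs": "x"},                 # neither inputs nor trace
    {"inputs": "What is MLflow?"},    # inputs is a bare string, not a dict
]

good_problems = check_eval_rows(good)
bad_problems = check_eval_rows(bad)
```

Returning a list of messages rather than raising on the first failure lets a caller report every malformed row in one pass.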

Outputs

Name | Type | Description
result | pd.DataFrame | Standardised DataFrame with normalised columns. Trace columns are deserialised, and inputs/outputs/expectations are extracted from traces where necessary.

Usage Examples

Basic Usage

import mlflow.genai

# Prepare data as a list of dictionaries
data = [
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an open-source ML platform.",
        "expectations": {"expected_response": "MLflow is an ML platform."},
    },
    {
        "inputs": {"question": "What is Spark?"},
        "outputs": "Spark is a distributed engine.",
        "expectations": {"expected_response": "Spark is a data processing engine."},
    },
]

# The conversion happens internally when evaluate() is called
result = mlflow.genai.evaluate(
    data=data,
    scorers=[...],  # placeholder: supply your scorer instances here
)

Using a Pandas DataFrame

import pandas as pd
import mlflow.genai

df = pd.DataFrame([
    {
        "inputs": {"question": "What is MLflow?"},
        "outputs": "MLflow is an ML platform.",
    },
])

result = mlflow.genai.evaluate(data=df, scorers=[...])

Using Trace Objects

import mlflow
import mlflow.genai

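# Retrieve previously recorded traces; by default search_traces returns a
# pandas DataFrame with one row per trace, which evaluate() accepts directly
traces_df = mlflow.search_traces(model_id="m-abc123")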

result = mlflow.genai.evaluate(data=traces_df, scorers=[...])
