Implementation:Mlflow Mlflow Log Assessment

Knowledge Sources	MLflow MLflow Tracing API
Domains	ML_Ops, LLM_Observability
Last Updated	2026-02-13 20:00 GMT

Overview

Concrete tool for attaching human or automated quality assessments and ground truth labels to MLflow traces provided by the MLflow library.

Description

MLflow provides three complementary APIs for logging assessments to traces: mlflow.log_assessment, mlflow.log_feedback, and mlflow.log_expectation.

The mlflow.log_assessment function is the general-purpose entry point. It accepts a trace ID and an Assessment object, which can be either a Feedback (quality evaluation) or an Expectation (ground truth label). The assessment is persisted to the tracking store and associated with the specified trace.

The mlflow.log_feedback function is a convenience wrapper for logging feedback evaluations. It accepts keyword arguments for the feedback name, value, source, rationale, error information, and metadata, and constructs a Feedback assessment internally. Feedback values can be floats, ints, strings, booleans, lists, or dicts. The function also supports logging errors that occurred during evaluation (e.g., LLM judge timeouts) via the error parameter, in which case the value can be omitted. When no source is provided, the default source type is CODE.

The mlflow.log_expectation function is a convenience wrapper for logging ground truth labels. It accepts keyword arguments for the expectation name, value, source, and metadata. The value can be any JSON-serializable object, including structured data like full LLM message dictionaries with tool calls. When no source is provided, the default source type is HUMAN.

All three functions accept an optional span_id parameter (via the assessment object or as a keyword argument) to associate the assessment with a specific span within the trace rather than the trace as a whole.

Usage

Use mlflow.log_feedback for adding quality scores from human reviewers, heuristic functions, or LLM judges. Use mlflow.log_expectation for attaching ground truth labels from annotation campaigns or curated datasets. Use mlflow.log_assessment directly when working with pre-constructed Assessment objects or when building generic evaluation pipelines that handle both types.

Code Reference

Source Location

Repository: mlflow
File: mlflow/tracing/assessment.py
Lines (log_assessment): L30-126
Lines (log_expectation): L129-190
Lines (log_feedback): L251-328

Signature

# General-purpose assessment logging
mlflow.log_assessment(
    trace_id: str,
    assessment: Assessment,
) -> Assessment

# Convenience API for feedback
mlflow.log_feedback(
    *,
    trace_id: str,
    name: str = "feedback",
    value: FeedbackValueType | None = None,
    source: AssessmentSource | None = None,
    error: Exception | AssessmentError | None = None,
    rationale: str | None = None,
    metadata: dict | None = None,
    span_id: str | None = None,
) -> Assessment

# Convenience API for ground truth
mlflow.log_expectation(
    *,
    trace_id: str,
    name: str,
    value: Any,
    source: AssessmentSource | None = None,
    metadata: dict | None = None,
    span_id: str | None = None,
) -> Assessment

Import

import mlflow
from mlflow.entities import (
    Assessment,
    AssessmentSource,
    AssessmentSourceType,
    Feedback,
    Expectation,
    AssessmentError,
)

I/O Contract

Inputs

Name	Type	Required	Description
trace_id	str	Yes	The ID of the trace to attach the assessment to.
assessment	Assessment	Yes (log_assessment)	A Feedback or Expectation assessment object.
name	str	Yes (log_feedback, log_expectation)	The name of the assessment, e.g., "faithfulness" or "expected_answer".
value	FeedbackValueType or Any	Conditional	The assessment value. For feedback: float, int, str, bool, list, or dict. For expectations: any JSON-serializable value. Either value or error required for feedback.
source	AssessmentSource	No	Identifies the assessment origin (HUMAN, LLM_JUDGE, CODE). Defaults to CODE for feedback, HUMAN for expectations.
rationale	str	No	(Feedback only) Justification text explaining the feedback score.
error	Exception or AssessmentError	No	(Feedback only) Error encountered during evaluation, e.g., judge timeout.
metadata	dict	No	Additional key-value metadata for the assessment.
span_id	str	No	Associates the assessment with a specific span within the trace.

Outputs

Name	Type	Description
assessment	Assessment	The created Assessment entity as persisted in the tracking store.

Usage Examples

Basic Usage

import mlflow
from mlflow.entities import Feedback

# Log a feedback score from an LLM judge
feedback = Feedback(
    name="faithfulness",
    value=0.9,
    rationale="The model is faithful to the input.",
    metadata={"model": "gpt-4o-mini"},
)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)

Logging Feedback with Convenience API

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

mlflow.log_feedback(
    trace_id="tr-abc123",
    name="relevance",
    value=0.95,
    source=AssessmentSource(
        source_type=AssessmentSourceType.LLM_JUDGE,
        source_id="gpt-4",
    ),
    rationale="Response directly addresses the user's question",
)

Logging Ground Truth Expectation

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

mlflow.log_expectation(
    trace_id="tr-abc123",
    name="expected_answer",
    value="The capital of France is Paris.",
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="annotator@company.com",
    ),
    metadata={"difficulty": "easy"},
)

Logging an Evaluation Error

import mlflow
from mlflow.entities import AssessmentError, Feedback

error = AssessmentError(
    error_code="RATE_LIMIT_EXCEEDED",
    error_message="Rate limit for the judge exceeded.",
)
feedback = Feedback(name="faithfulness", error=error)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)

Related Pages

Implements Principle

Principle:Mlflow_Mlflow_Trace_Assessment

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment