Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Mlflow Mlflow Log Assessment

From Leeroopedia
Knowledge Sources
Domains ML_Ops, LLM_Observability
Last Updated 2026-02-13 20:00 GMT

Overview

Concrete tool for attaching human or automated quality assessments and ground truth labels to MLflow traces provided by the MLflow library.

Description

MLflow provides three complementary APIs for logging assessments to traces: mlflow.log_assessment, mlflow.log_feedback, and mlflow.log_expectation.

The mlflow.log_assessment function is the general-purpose entry point. It accepts a trace ID and an Assessment object, which can be either a Feedback (quality evaluation) or an Expectation (ground truth label). The assessment is persisted to the tracking store and associated with the specified trace.

The mlflow.log_feedback function is a convenience wrapper for logging feedback evaluations. It accepts keyword arguments for the feedback name, value, source, rationale, error information, and metadata, and constructs a Feedback assessment internally. Feedback values can be floats, ints, strings, booleans, lists, or dicts. The function also supports logging errors that occurred during evaluation (e.g., LLM judge timeouts) via the error parameter, in which case the value can be omitted. When no source is provided, the default source type is CODE.

The mlflow.log_expectation function is a convenience wrapper for logging ground truth labels. It accepts keyword arguments for the expectation name, value, source, and metadata. The value can be any JSON-serializable object, including structured data like full LLM message dictionaries with tool calls. When no source is provided, the default source type is HUMAN.

All three functions accept an optional span_id parameter (via the assessment object or as a keyword argument) to associate the assessment with a specific span within the trace rather than the trace as a whole.

Usage

Use mlflow.log_feedback for adding quality scores from human reviewers, heuristic functions, or LLM judges. Use mlflow.log_expectation for attaching ground truth labels from annotation campaigns or curated datasets. Use mlflow.log_assessment directly when working with pre-constructed Assessment objects or when building generic evaluation pipelines that handle both types.

Code Reference

Source Location

  • Repository: mlflow
  • File: mlflow/tracing/assessment.py
  • Lines (log_assessment): L30-126
  • Lines (log_expectation): L129-190
  • Lines (log_feedback): L251-328

Signature

# General-purpose assessment logging
mlflow.log_assessment(
    trace_id: str,
    assessment: Assessment,
) -> Assessment

# Convenience API for feedback
mlflow.log_feedback(
    *,
    trace_id: str,
    name: str = "feedback",
    value: FeedbackValueType | None = None,
    source: AssessmentSource | None = None,
    error: Exception | AssessmentError | None = None,
    rationale: str | None = None,
    metadata: dict | None = None,
    span_id: str | None = None,
) -> Assessment

# Convenience API for ground truth
mlflow.log_expectation(
    *,
    trace_id: str,
    name: str,
    value: Any,
    source: AssessmentSource | None = None,
    metadata: dict | None = None,
    span_id: str | None = None,
) -> Assessment

Import

import mlflow
from mlflow.entities import (
    Assessment,
    AssessmentSource,
    AssessmentSourceType,
    Feedback,
    Expectation,
    AssessmentError,
)

I/O Contract

Inputs

Name Type Required Description
trace_id str Yes The ID of the trace to attach the assessment to.
assessment Assessment Yes (log_assessment) A Feedback or Expectation assessment object.
name str Yes (log_feedback, log_expectation) The name of the assessment, e.g., "faithfulness" or "expected_answer".
value FeedbackValueType or Any Conditional The assessment value. For feedback: float, int, str, bool, list, or dict. For expectations: any JSON-serializable value. Either value or error required for feedback.
source AssessmentSource No Identifies the assessment origin (HUMAN, LLM_JUDGE, CODE). Defaults to CODE for feedback, HUMAN for expectations.
rationale str No (Feedback only) Justification text explaining the feedback score.
error Exception or AssessmentError No (Feedback only) Error encountered during evaluation, e.g., judge timeout.
metadata dict No Additional key-value metadata for the assessment.
span_id str No Associates the assessment with a specific span within the trace.

Outputs

Name Type Description
assessment Assessment The created Assessment entity as persisted in the tracking store.

Usage Examples

Basic Usage

import mlflow
from mlflow.entities import Feedback

# Log a feedback score from an LLM judge
feedback = Feedback(
    name="faithfulness",
    value=0.9,
    rationale="The model is faithful to the input.",
    metadata={"model": "gpt-4o-mini"},
)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)

Logging Feedback with Convenience API

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

mlflow.log_feedback(
    trace_id="tr-abc123",
    name="relevance",
    value=0.95,
    source=AssessmentSource(
        source_type=AssessmentSourceType.LLM_JUDGE,
        source_id="gpt-4",
    ),
    rationale="Response directly addresses the user's question",
)

Logging Ground Truth Expectation

import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

mlflow.log_expectation(
    trace_id="tr-abc123",
    name="expected_answer",
    value="The capital of France is Paris.",
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="annotator@company.com",
    ),
    metadata={"difficulty": "easy"},
)

Logging an Evaluation Error

import mlflow
from mlflow.entities import AssessmentError, Feedback

error = AssessmentError(
    error_code="RATE_LIMIT_EXCEEDED",
    error_message="Rate limit for the judge exceeded.",
)
feedback = Feedback(name="faithfulness", error=error)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)

Related Pages

Implements Principle

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment