Implementation:Mlflow Mlflow Log Assessment
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, LLM_Observability |
| Last Updated | 2026-02-13 20:00 GMT |
Overview
Concrete tool for attaching human or automated quality assessments and ground truth labels to MLflow traces provided by the MLflow library.
Description
MLflow provides three complementary APIs for logging assessments to traces: mlflow.log_assessment, mlflow.log_feedback, and mlflow.log_expectation.
The mlflow.log_assessment function is the general-purpose entry point. It accepts a trace ID and an Assessment object, which can be either a Feedback (quality evaluation) or an Expectation (ground truth label). The assessment is persisted to the tracking store and associated with the specified trace.
The mlflow.log_feedback function is a convenience wrapper for logging feedback evaluations. It accepts keyword arguments for the feedback name, value, source, rationale, error information, and metadata, and constructs a Feedback assessment internally. Feedback values can be floats, ints, strings, booleans, lists, or dicts. The function also supports logging errors that occurred during evaluation (e.g., LLM judge timeouts) via the error parameter, in which case the value can be omitted. When no source is provided, the default source type is CODE.
The mlflow.log_expectation function is a convenience wrapper for logging ground truth labels. It accepts keyword arguments for the expectation name, value, source, and metadata. The value can be any JSON-serializable object, including structured data like full LLM message dictionaries with tool calls. When no source is provided, the default source type is HUMAN.
All three functions accept an optional span_id parameter (via the assessment object or as a keyword argument) to associate the assessment with a specific span within the trace rather than the trace as a whole.
Usage
Use mlflow.log_feedback for adding quality scores from human reviewers, heuristic functions, or LLM judges. Use mlflow.log_expectation for attaching ground truth labels from annotation campaigns or curated datasets. Use mlflow.log_assessment directly when working with pre-constructed Assessment objects or when building generic evaluation pipelines that handle both types.
Code Reference
Source Location
- Repository: mlflow
- File:
mlflow/tracing/assessment.py - Lines (log_assessment): L30-126
- Lines (log_expectation): L129-190
- Lines (log_feedback): L251-328
Signature
# General-purpose assessment logging
mlflow.log_assessment(
trace_id: str,
assessment: Assessment,
) -> Assessment
# Convenience API for feedback
mlflow.log_feedback(
*,
trace_id: str,
name: str = "feedback",
value: FeedbackValueType | None = None,
source: AssessmentSource | None = None,
error: Exception | AssessmentError | None = None,
rationale: str | None = None,
metadata: dict | None = None,
span_id: str | None = None,
) -> Assessment
# Convenience API for ground truth
mlflow.log_expectation(
*,
trace_id: str,
name: str,
value: Any,
source: AssessmentSource | None = None,
metadata: dict | None = None,
span_id: str | None = None,
) -> Assessment
Import
import mlflow
from mlflow.entities import (
Assessment,
AssessmentSource,
AssessmentSourceType,
Feedback,
Expectation,
AssessmentError,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| trace_id | str | Yes | The ID of the trace to attach the assessment to. |
| assessment | Assessment | Yes (log_assessment) | A Feedback or Expectation assessment object. |
| name | str | Yes (log_feedback, log_expectation) | The name of the assessment, e.g., "faithfulness" or "expected_answer". |
| value | FeedbackValueType or Any | Conditional | The assessment value. For feedback: float, int, str, bool, list, or dict. For expectations: any JSON-serializable value. Either value or error required for feedback. |
| source | AssessmentSource | No | Identifies the assessment origin (HUMAN, LLM_JUDGE, CODE). Defaults to CODE for feedback, HUMAN for expectations. |
| rationale | str | No | (Feedback only) Justification text explaining the feedback score. |
| error | Exception or AssessmentError | No | (Feedback only) Error encountered during evaluation, e.g., judge timeout. |
| metadata | dict | No | Additional key-value metadata for the assessment. |
| span_id | str | No | Associates the assessment with a specific span within the trace. |
Outputs
| Name | Type | Description |
|---|---|---|
| assessment | Assessment | The created Assessment entity as persisted in the tracking store. |
Usage Examples
Basic Usage
import mlflow
from mlflow.entities import Feedback
# Log a feedback score from an LLM judge
feedback = Feedback(
name="faithfulness",
value=0.9,
rationale="The model is faithful to the input.",
metadata={"model": "gpt-4o-mini"},
)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)
Logging Feedback with Convenience API
import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType
mlflow.log_feedback(
trace_id="tr-abc123",
name="relevance",
value=0.95,
source=AssessmentSource(
source_type=AssessmentSourceType.LLM_JUDGE,
source_id="gpt-4",
),
rationale="Response directly addresses the user's question",
)
Logging Ground Truth Expectation
import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType
mlflow.log_expectation(
trace_id="tr-abc123",
name="expected_answer",
value="The capital of France is Paris.",
source=AssessmentSource(
source_type=AssessmentSourceType.HUMAN,
source_id="annotator@company.com",
),
metadata={"difficulty": "easy"},
)
Logging an Evaluation Error
import mlflow
from mlflow.entities import AssessmentError, Feedback
error = AssessmentError(
error_code="RATE_LIMIT_EXCEEDED",
error_message="Rate limit for the judge exceeded.",
)
feedback = Feedback(name="faithfulness", error=error)
mlflow.log_assessment(trace_id="tr-abc123", assessment=feedback)