Workflow: Evidently ML Model Quality Report
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Evaluation, Classification, Regression |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
End-to-end process for generating comprehensive ML model quality reports using Evidently presets and metrics, comparing model performance between current and reference datasets, with optional pass/fail test conditions.
Description
This workflow outlines the standard procedure for evaluating ML model quality using Evidently's Report and preset system. It supports both classification models (accuracy, precision, recall, F1, ROC AUC) and regression models (MAE, RMSE, R2, error analysis) through pre-configured presets or individual metrics. The process compares model predictions on current data against a reference baseline, generates interactive HTML reports, and optionally applies automated test conditions for CI/CD integration.
Goal: An interactive report containing model performance metrics, distribution comparisons, and optional pass/fail test results that can be viewed in a notebook, exported as HTML/JSON, or stored in a monitoring workspace.
Scope: From prepared prediction datasets through metric computation to rendered reports with test conditions.
Strategy: Uses Evidently's preset system (ClassificationQuality, RegressionQuality) for comprehensive coverage or individual metrics for targeted evaluation, with automatic test condition generation from reference data.
Usage
Execute this workflow when you need to evaluate a classification or regression model's performance, compare it against a baseline, or validate that model quality meets minimum thresholds before deployment. This applies to experiment tracking, model validation in CI/CD pipelines, and periodic production model audits.
Execution Steps
Step 1: Prepare Prediction Data
Load the dataset containing model predictions alongside ground truth labels. Prepare both a "current" dataset (the data to evaluate) and optionally a "reference" dataset (the baseline to compare against).
Key considerations:
- Both datasets must include target and prediction columns
- For classification, include prediction labels and optionally prediction probabilities
- For regression, include numeric target and prediction columns
- Reference data enables relative comparison and automatic test threshold generation
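A minimal sketch of this step using pandas. The column names (`target`, `prediction`, `prediction_proba`) and the values are illustrative placeholders, not names required by Evidently:

```python
import pandas as pd

# Hypothetical current data: predictions on the data to evaluate.
current_df = pd.DataFrame({
    "target": [1, 0, 1, 1, 0],                       # ground truth labels
    "prediction": [1, 0, 0, 1, 0],                   # predicted labels
    "prediction_proba": [0.9, 0.2, 0.4, 0.8, 0.1],   # optional probabilities
})

# Hypothetical reference data: the baseline (e.g. a validation set).
reference_df = pd.DataFrame({
    "target": [1, 0, 0, 1, 1],
    "prediction": [1, 0, 0, 1, 1],
    "prediction_proba": [0.95, 0.1, 0.3, 0.85, 0.7],
})

# Both frames expose the same target/prediction columns, which the
# DataDefinition in the next step will map onto the task configuration.
assert list(current_df.columns) == list(reference_df.columns)
```

Keeping the two frames structurally identical is what makes the later current-vs-reference comparison and automatic test-threshold generation possible.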
Step 2: Define Data Schema
Create a DataDefinition specifying column types and task configuration. For classification, define a BinaryClassification or MulticlassClassification with target, prediction, and probability columns. For regression, define a Regression with target and prediction columns.
Key considerations:
- Column type inference is automatic, but explicit definitions prevent errors
- Task configurations (classification, regression) determine which metrics are applicable
- Multiple tasks can be defined simultaneously if the dataset supports them
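Pseudocode for both task types (class and argument names follow recent Evidently releases, but should be checked against the installed version's documentation):

```
from evidently import DataDefinition, BinaryClassification, Regression

# Classification: map target, predicted labels, and optional probabilities
schema = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_labels="prediction",
        prediction_probas="prediction_proba",
    )]
)

# Regression alternative: numeric target and prediction columns
schema = DataDefinition(
    regression=[Regression(target="target", prediction="prediction")]
)
```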
Step 3: Create Evidently Datasets
Wrap both current and reference dataframes as Evidently Dataset objects using the defined schema. This enables typed column access and correct metric computation.
Pseudocode:
current_dataset = Dataset.from_pandas(current_df, data_definition=schema)
reference_dataset = Dataset.from_pandas(reference_df, data_definition=schema)
Step 4: Configure Quality Report
Assemble a Report using quality presets (ClassificationQuality, RegressionQuality) for comprehensive metric coverage, or select individual metrics for targeted evaluation. Optionally enable automatic test conditions with the include_tests parameter.
Key considerations:
- Presets include all standard metrics for the task type
- ClassificationQuality covers accuracy, precision, recall, F1, ROC AUC, confusion matrix
- RegressionQuality covers MAE, MAPE, RMSE, R2, error distribution, dummy baseline comparison
- Setting include_tests=True auto-generates pass/fail conditions derived from reference data
- Custom test conditions can be specified using gt(), lt(), eq() and other comparison functions
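Pseudocode for both configuration styles (module paths and helper names reflect recent Evidently releases and may differ by version):

```
from evidently import Report
from evidently.presets import ClassificationQuality
from evidently.metrics import Accuracy
from evidently.tests import gt

# Comprehensive preset with auto-generated pass/fail conditions
report = Report([ClassificationQuality()], include_tests=True)

# Targeted alternative: a single metric with a custom threshold
report = Report([Accuracy(tests=[gt(0.8)])])
```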
Step 5: Run Report
Execute the report by providing the current dataset and optionally the reference dataset. The Report.run() method returns a Snapshot (alias Run) containing all computed metric results.
What happens:
- All configured metrics are calculated against the current data
- If reference data is provided, metrics are computed for both and compared
- Test conditions are evaluated and results are included in the snapshot
- The snapshot is a fully serializable result object
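Pseudocode, continuing the objects from the earlier steps:

```
# Reference data is optional; omit it for a standalone evaluation
snapshot = report.run(current_dataset, reference_dataset)
```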
Step 6: View and Export Results
Access the report results through multiple output formats. View interactively in Jupyter notebooks, export as HTML for sharing, or extract as JSON/dictionary for programmatic processing.
Key considerations:
- Calling the snapshot in a notebook cell renders an interactive HTML widget
- save_html() produces a standalone HTML file for sharing
- json() and dict() provide structured data for downstream processing
- Test results include pass/fail status for each configured condition
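Pseudocode for the main output paths (method names per recent Evidently releases):

```
snapshot                                  # in a notebook cell: renders the interactive widget
snapshot.save_html("model_quality.html")  # standalone shareable report
as_dict = snapshot.dict()                 # structured results for downstream code
as_json = snapshot.json()                 # same results, serialized as a JSON string
```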