Workflow: Evidently ML Model Quality Report
| Knowledge Sources | |
|---|---|
| Domains | ML_Ops, Model_Evaluation, Classification, Regression |
| Last Updated | 2026-02-14 10:00 GMT |
Overview
End-to-end process for generating comprehensive ML model quality reports using Evidently presets and metrics, comparing model performance between current and reference datasets, with optional pass/fail test conditions.
Description
This workflow outlines the standard procedure for evaluating ML model quality using Evidently's Report and preset system. It supports both classification models (accuracy, precision, recall, F1, ROC AUC) and regression models (MAE, RMSE, R2, error analysis) through pre-configured presets or individual metrics. The process compares model predictions on current data against a reference baseline, generates interactive HTML reports, and optionally applies automated test conditions for CI/CD integration.
Goal: An interactive report containing model performance metrics, distribution comparisons, and optional pass/fail test results that can be viewed in a notebook, exported as HTML/JSON, or stored in a monitoring workspace.
Scope: From prepared prediction datasets through metric computation to rendered reports with test conditions.
Strategy: Uses Evidently's preset system (ClassificationQuality, RegressionQuality) for comprehensive coverage or individual metrics for targeted evaluation, with automatic test condition generation from reference data.
Usage
Execute this workflow when you need to evaluate a classification or regression model's performance, compare it against a baseline, or validate that model quality meets minimum thresholds before deployment. This applies to experiment tracking, model validation in CI/CD pipelines, and periodic production model audits.
Execution Steps
Step 1: Prepare Prediction Data
Load the dataset containing model predictions alongside ground truth labels. Prepare both a "current" dataset (the data to evaluate) and optionally a "reference" dataset (the baseline to compare against).
Key considerations:
- Both datasets must include target and prediction columns
- For classification, include prediction labels and optionally prediction probabilities
- For regression, include numeric target and prediction columns
- Reference data enables relative comparison and automatic test threshold generation
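A minimal sketch of this step using pandas. The column names (`target`, `prediction`, `prediction_proba`) and the values are illustrative placeholders, not names required by Evidently:

```python
import pandas as pd

# Hypothetical current data: predictions on the data to evaluate.
current_df = pd.DataFrame({
    "target": [1, 0, 1, 1, 0],                       # ground truth labels
    "prediction": [1, 0, 0, 1, 0],                   # predicted labels
    "prediction_proba": [0.9, 0.2, 0.4, 0.8, 0.1],   # optional probabilities
})

# Hypothetical reference data: the baseline (e.g. a validation set).
reference_df = pd.DataFrame({
    "target": [1, 0, 0, 1, 1],
    "prediction": [1, 0, 0, 1, 1],
    "prediction_proba": [0.95, 0.1, 0.3, 0.85, 0.7],
})

# Both frames expose the same target/prediction columns, which the
# DataDefinition in the next step will map onto the task configuration.
assert list(current_df.columns) == list(reference_df.columns)
```

Keeping the two frames structurally identical is what makes the later current-vs-reference comparison and automatic test-threshold generation possible.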
Step 2: Define Data Schema
Create a DataDefinition specifying column types and task configuration. For classification, define a BinaryClassification or MulticlassClassification with target, prediction, and probability columns. For regression, define a Regression with target and prediction columns.
Key considerations:
- Column type inference is automatic, but explicit definitions prevent errors
- Task configurations (classification, regression) determine which metrics are applicable
- Multiple tasks can be defined simultaneously if the dataset supports them
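Pseudocode for both task types (class and argument names follow recent Evidently releases, but should be checked against the installed version's documentation):

```
from evidently import DataDefinition, BinaryClassification, Regression

# Classification: map target, predicted labels, and optional probabilities
schema = DataDefinition(
    classification=[BinaryClassification(
        target="target",
        prediction_labels="prediction",
        prediction_probas="prediction_proba",
    )]
)

# Regression alternative: numeric target and prediction columns
schema = DataDefinition(
    regression=[Regression(target="target", prediction="prediction")]
)
```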
Step 3: Create Evidently Datasets
Wrap both current and reference dataframes as Evidently Dataset objects using the defined schema. This enables typed column access and correct metric computation.
Pseudocode:
current_dataset = Dataset.from_pandas(current_df, data_definition=schema)
reference_dataset = Dataset.from_pandas(reference_df, data_definition=schema)
Step 4: Configure Quality Report
Assemble a Report using quality presets (ClassificationQuality, RegressionQuality) for comprehensive metric coverage, or select individual metrics for targeted evaluation. Optionally enable automatic test conditions with the include_tests parameter.
Key considerations:
- Presets include all standard metrics for the task type
- ClassificationQuality covers accuracy, precision, recall, F1, ROC AUC, confusion matrix
- RegressionQuality covers MAE, MAPE, RMSE, R2, error distribution, dummy baseline comparison
- Setting include_tests=True auto-generates pass/fail conditions derived from reference data
- Custom test conditions can be specified using gt(), lt(), eq() and other comparison functions
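Pseudocode for both configuration styles (module paths and helper names reflect recent Evidently releases and may differ by version):

```
from evidently import Report
from evidently.presets import ClassificationQuality
from evidently.metrics import Accuracy
from evidently.tests import gt

# Comprehensive preset with auto-generated pass/fail conditions
report = Report([ClassificationQuality()], include_tests=True)

# Targeted alternative: a single metric with a custom threshold
report = Report([Accuracy(tests=[gt(0.8)])])
```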
Step 5: Run Report
Execute the report by providing the current dataset and optionally the reference dataset. The Report.run() method returns a Snapshot (alias Run) containing all computed metric results.
What happens:
- All configured metrics are calculated against the current data
- If reference data is provided, metrics are computed for both and compared
- Test conditions are evaluated and results are included in the snapshot
- The snapshot is a fully serializable result object
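Pseudocode, continuing the objects from the earlier steps:

```
# Reference data is optional; omit it for a standalone evaluation
snapshot = report.run(current_dataset, reference_dataset)
```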
Step 6: View and Export Results
Access the report results through multiple output formats. View interactively in Jupyter notebooks, export as HTML for sharing, or extract as JSON/dictionary for programmatic processing.
Key considerations:
- Calling the snapshot in a notebook cell renders an interactive HTML widget
- save_html() produces a standalone HTML file for sharing
- json() and dict() provide structured data for downstream processing
- Test results include pass/fail status for each configured condition
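Pseudocode for the main output paths (method names per recent Evidently releases):

```
snapshot                                  # in a notebook cell: renders the interactive widget
snapshot.save_html("model_quality.html")  # standalone shareable report
as_dict = snapshot.dict()                 # structured results for downstream code
as_json = snapshot.json()                 # same results, serialized as a JSON string
```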