Principle:FlowiseAI Flowise Evaluation Version Comparison

Property	Value
Principle Name	Evaluation_Version_Comparison
Overview	Technique for comparing evaluation results across multiple versions to track quality improvements over time
Domain	AI Evaluation, Version Management, Trend Analysis
Source	FlowiseAI/Flowise repository: packages/ui/src/api/evaluations.js, packages/ui/src/views/evaluations/EvaluationResultVersionsSideDrawer.jsx
Last Updated	2026-02-12 14:00 GMT

Description

Each evaluation re-run creates a new version. The version comparison system displays a timeline of all versions with their run dates, allowing users to navigate between versions and compare metrics. This enables longitudinal tracking of chatflow quality improvements.

The version comparison workflow:

1. The user opens the results of an evaluation run.
2. The user opens the versions side drawer, which displays a chronological timeline of all versions.
3. Each version entry shows:
   - Version number: Sequential identifier (v1, v2, v3, etc.).
   - Run date: The timestamp when the version was executed, formatted as DD-MMM-YYYY, hh:mm:ss A.
4. The user selects a version from the timeline to load its results.
5. The user can switch between versions to compare metrics, pass/fail rates, and per-row results.

The timeline visualization uses a vertical timeline layout with connected dots, making it easy to trace the sequence of evaluation runs and navigate to any specific version.
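The timeline entries described above can be sketched as follows. This is a minimal illustration, not the code in EvaluationResultVersionsSideDrawer.jsx: the `EvaluationVersion` and `TimelineEntry` shapes, the `buildTimeline` and `formatRunDate` helpers, and the rule that version labels follow chronological order are all assumptions made for the example. The formatter is hand-written and uses UTC so that its output is deterministic; a real UI would more likely show local time.

```typescript
// Hypothetical shape of one evaluation version; not Flowise's actual schema.
interface EvaluationVersion {
    id: string
    runDate: string // ISO 8601 timestamp
}

// Hypothetical shape of one entry in the versions timeline.
interface TimelineEntry {
    id: string
    label: string // "v1", "v2", ...
    runDate: string // formatted as DD-MMM-YYYY, hh:mm:ss A
}

const MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

// Left-pads a number to two digits.
const pad2 = (n: number): string => (n < 10 ? '0' + n : '' + n)

// Formats a date as DD-MMM-YYYY, hh:mm:ss A on a 12-hour clock.
// Assumption: UTC is used here only to keep the example deterministic.
function formatRunDate(date: Date): string {
    const hours24 = date.getUTCHours()
    const hours12 = hours24 % 12 === 0 ? 12 : hours24 % 12
    const meridiem = hours24 < 12 ? 'AM' : 'PM'
    const day = pad2(date.getUTCDate())
    const month = MONTHS[date.getUTCMonth()]
    const time = `${pad2(hours12)}:${pad2(date.getUTCMinutes())}:${pad2(date.getUTCSeconds())}`
    return `${day}-${month}-${date.getUTCFullYear()}, ${time} ${meridiem}`
}

// Sorts versions oldest-first and assigns sequential labels (assumed rule).
function buildTimeline(versions: EvaluationVersion[]): TimelineEntry[] {
    return versions
        .slice() // copy so the caller's array is not reordered
        .sort((a, b) => new Date(a.runDate).getTime() - new Date(b.runDate).getTime())
        .map((version, index) => ({
            id: version.id,
            label: `v${index + 1}`,
            runDate: formatRunDate(new Date(version.runDate))
        }))
}
```

Copying the array before sorting keeps the input untouched, which fits the idea that stored versions are read-only history.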

Usage

Use evaluation version comparison when comparing chatflow quality across multiple evaluation runs to track improvement trends. This is especially valuable:

- After multiple rounds of chatflow tuning to identify which changes had the greatest impact
- When reviewing the quality trajectory of a chatflow over time
- When deciding whether to deploy a chatflow version based on its evaluation history
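One way to find which round of tuning had the greatest impact is to compare pass rates between adjacent versions. The sketch below assumes a per-version summary with pass and total counts. The `VersionSummary` and `VersionDelta` shapes and the helper names are hypothetical and are not taken from the Flowise API.

```typescript
// Hypothetical per-version summary; field names are illustrative only.
interface VersionSummary {
    version: number
    passCount: number
    totalCount: number
}

interface VersionDelta {
    from: number
    to: number
    passRateChange: number // in percentage points
}

// Pass rate as a percentage; an empty run counts as 0.
function passRate(summary: VersionSummary): number {
    return summary.totalCount === 0 ? 0 : (summary.passCount / summary.totalCount) * 100
}

// Pass-rate change between each pair of adjacent versions, in version order.
function adjacentDeltas(summaries: VersionSummary[]): VersionDelta[] {
    const ordered = summaries.slice().sort((a, b) => a.version - b.version)
    const deltas: VersionDelta[] = []
    for (let i = 1; i < ordered.length; i++) {
        deltas.push({
            from: ordered[i - 1].version,
            to: ordered[i].version,
            passRateChange: passRate(ordered[i]) - passRate(ordered[i - 1])
        })
    }
    return deltas
}

// The transition with the largest absolute change, or undefined when
// there are fewer than two versions to compare.
function largestImpact(summaries: VersionSummary[]): VersionDelta | undefined {
    const deltas = adjacentDeltas(summaries)
    if (deltas.length === 0) return undefined
    return deltas.reduce((best, d) => (Math.abs(d.passRateChange) > Math.abs(best.passRateChange) ? d : best))
}
```

Using the absolute change means a large regression is reported as readily as a large improvement, which is usually what a reviewer wants to see first.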

Theoretical Basis

This principle follows the version-based comparison pattern. Each version is an immutable snapshot of evaluation results at a point in time. The timeline visualization enables trend analysis across versions.

Key aspects of version-based comparison:

- Immutable snapshots: Each version captures the complete evaluation state (inputs, outputs, evaluator results, metrics) at a specific point in time. Past versions are never modified, ensuring reliable historical data.
- Temporal ordering: Versions are displayed chronologically, making it intuitive to trace the sequence of modifications and their impact on quality metrics.
- Selective navigation: Users can jump to any version in the timeline rather than being limited to sequential comparison. This supports both adjacent comparison (v3 vs v4) and distant comparison (v1 vs v10).
- Trend analysis: The version history as a whole reveals quality trends. Consistently improving pass rates across versions indicate effective tuning, while oscillating results suggest that recent modifications are not producing stable gains.
- Decision support: Version comparison provides the empirical basis for deployment decisions. By comparing metrics across versions, teams can identify which chatflow configuration scored best on the evaluated dataset.
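The snapshot, trend, and decision-support ideas above can be illustrated with a small sketch. Everything in it is an assumption made for the example: the `EvaluationSnapshot` shape, the `Trend` labels, and the classification rule, which looks only at the direction of each change between consecutive versions. Under this simple rule a single dip in an otherwise rising history is labelled oscillating, so a real analysis may want a tolerance or a longer window.

```typescript
// Hypothetical snapshot shape; readonly fields plus Object.freeze model
// the rule that past versions are never modified.
interface EvaluationSnapshot {
    readonly version: number
    readonly passRate: number // 0..100
}

type Trend = 'improving' | 'declining' | 'oscillating' | 'flat' | 'insufficient-data'

// Creates a frozen snapshot so later code cannot alter recorded results.
function freezeSnapshot(version: number, passRate: number): EvaluationSnapshot {
    return Object.freeze({ version, passRate })
}

// Classifies the history by the direction of change between consecutive versions.
function classifyTrend(history: ReadonlyArray<EvaluationSnapshot>): Trend {
    if (history.length < 2) return 'insufficient-data'
    const ordered = history.slice().sort((a, b) => a.version - b.version)
    let ups = 0
    let downs = 0
    for (let i = 1; i < ordered.length; i++) {
        const change = ordered[i].passRate - ordered[i - 1].passRate
        if (change > 0) ups++
        else if (change < 0) downs++
    }
    if (ups > 0 && downs > 0) return 'oscillating'
    if (ups > 0) return 'improving'
    if (downs > 0) return 'declining'
    return 'flat'
}

// Returns the snapshot with the highest pass rate (the earliest one on ties),
// or undefined for an empty history.
function bestVersion(history: ReadonlyArray<EvaluationSnapshot>): EvaluationSnapshot | undefined {
    if (history.length === 0) return undefined
    return history.reduce((best, s) => (s.passRate > best.passRate ? s : best))
}
```

Pass rate is only one possible selection criterion. A deployment decision would normally weigh it alongside the other metrics recorded for each version.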
