Principle: FlowiseAI Flowise Evaluation Rerun
| Property | Value |
|---|---|
| Principle Name | Evaluation_Rerun |
| Overview | Technique for re-executing a previous evaluation run to measure improvement after chatflow modifications |
| Domain | AI Evaluation, Iterative Improvement, Regression Testing |
| Source | FlowiseAI/Flowise repository: packages/ui/src/api/evaluations.js |
| Last Updated | 2026-02-12 14:00 GMT |
Description
After modifying a chatflow (changing prompts, swapping models, adjusting parameters), users re-run the same evaluation to compare results. The re-run creates a new version of the evaluation using the same dataset and evaluators but tests against the updated chatflow. This enables iterative improvement tracking.
The re-run process:
- The user identifies an existing evaluation run that was previously executed.
- The user modifies the target chatflow (e.g., refines system prompt, changes model, adjusts temperature).
- The user triggers a re-run of the evaluation, which:
  - Uses the same dataset (identical input/expected-output pairs).
  - Uses the same evaluators (identical scoring criteria).
  - Tests against the updated chatflow (reflecting recent modifications).
  - Creates a new version of the evaluation results.
- The new version's results can be compared against previous versions to measure improvement or detect regressions.
This cycle can be repeated as many times as needed, building a version history that tracks the chatflow's quality trajectory over time.
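The re-run steps above can be sketched in plain JavaScript. This is a minimal in-memory model, not Flowise's actual API: the `rerunEvaluation` helper, the `evaluationVersions` store, and field names such as `dataset`, `evaluators`, and `rows` are illustrative assumptions.

```javascript
// In-memory stand-in for the evaluation store; Flowise persists runs server-side.
const evaluationVersions = [];

// Hypothetical helper: re-execute an existing evaluation against the current chatflow.
// The dataset and evaluators are carried over unchanged from the previous run,
// so the chatflow is the only variable that differs between versions.
function rerunEvaluation(previousRun, runChatflow) {
  const version = previousRun.version + 1;
  const rows = previousRun.dataset.map(({ input, expectedOutput }) => {
    const actualOutput = runChatflow(input); // the updated chatflow under test
    const scores = previousRun.evaluators.map((evaluate) =>
      evaluate({ input, expectedOutput, actualOutput })
    );
    return { input, expectedOutput, actualOutput, scores };
  });
  const newRun = { ...previousRun, version, rows, executedAt: new Date().toISOString() };
  evaluationVersions.push(newRun); // append-only: previous versions are never overwritten
  return newRun;
}
```

Appending rather than updating in place is what builds the version history the comparison step relies on.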
Usage
Use an evaluation re-run to re-test a chatflow after modifications and measure the resulting quality change. This is a critical step in the iterative development workflow:
- After adjusting chatflow parameters (prompts, models, tools)
- After fixing issues identified in previous evaluation results
- As part of a continuous improvement process for chatflow quality
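In a continuous-improvement setting, the latest version's aggregate score can act as a quality gate. The sketch below is a hypothetical helper (the `0.8` threshold, the `rows`/`scores` shape, and higher-is-better scoring are all assumptions for illustration):

```javascript
// Hypothetical quality gate: pass only if the newest evaluation version
// meets a minimum average evaluator score (higher is assumed better).
function passesQualityGate(versionHistory, minAverageScore = 0.8) {
  const latest = versionHistory[versionHistory.length - 1];
  const scores = latest.rows.flatMap((row) => row.scores);
  const average = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  return average >= minAverageScore;
}
```

Such a gate could run after each re-run to block deployment of a chatflow whose latest evaluation version fell below the bar.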
Theoretical Basis
This principle follows the iterative improvement cycle pattern. Each re-run creates a versioned snapshot of evaluation results, enabling before/after comparison. Reusing the same inputs ensures consistent testing conditions across versions.
Key aspects of the iterative improvement model:
- Controlled variables: By reusing the same dataset and evaluators, the only variable that changes between versions is the chatflow itself. This isolates the impact of chatflow modifications on quality metrics.
- Version immutability: Each evaluation version is an immutable snapshot. Previous results are never overwritten, ensuring a complete audit trail of quality changes.
- Regression detection: Comparing new results against previous versions reveals not only improvements but also regressions, where changes intended to fix one issue inadvertently degrade another dimension.
- Convergence tracking: Over multiple re-runs, the version history reveals whether iterative modifications are converging toward better quality or oscillating without clear improvement.
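The regression-detection aspect above can be sketched as a per-metric comparison between two versions. The function name and the flat `{ metric: averageScore }` shape are assumptions; scores are assumed to be higher-is-better.

```javascript
// Compare per-metric average scores between two evaluation versions.
// Reports which metrics improved, regressed, or stayed flat, so a change
// that fixes one dimension but degrades another is surfaced explicitly.
function compareVersions(oldMetrics, newMetrics, tolerance = 1e-9) {
  const report = { improved: [], regressed: [], unchanged: [] };
  for (const metric of Object.keys(oldMetrics)) {
    const delta = newMetrics[metric] - oldMetrics[metric];
    if (delta > tolerance) report.improved.push(metric);
    else if (delta < -tolerance) report.regressed.push(metric);
    else report.unchanged.push(metric);
  }
  return report;
}
```

Running this across consecutive version pairs gives a simple view of whether the chatflow's quality trajectory is converging or oscillating.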