Principle: FlowiseAI Flowise Evaluator Definition
| Property | Value |
|---|---|
| Principle Name | Evaluator_Definition |
| Overview | Technique for defining evaluation criteria that automatically assess chatflow response quality across multiple dimensions |
| Domain | AI Evaluation, Quality Metrics, Automated Grading |
| Source | FlowiseAI/Flowise repository: packages/ui/src/api/evaluators.js, packages/ui/src/views/evaluators/evaluatorConstant.js |
| Last Updated | 2026-02-12 14:00 GMT |
Description
Evaluators are configurable scoring rules applied to chatflow outputs. Four types exist:
- Text-based evaluators check string properties of the response (contains, starts with, and their negations).
- JSON evaluators validate the structural correctness of the response.
- Numeric evaluators check quantitative metrics (token count, latency, response length).
- LLM evaluators use AI models to grade responses against custom prompts with structured output schemas.
Each evaluator produces a pass/fail result when applied to a chatflow response. Multiple evaluators can be combined in a single evaluation run to provide comprehensive quality assessment across different dimensions.
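A combined run can be pictured as mapping each evaluator over the same response. This is an illustrative sketch, not Flowise's internal API: the evaluator objects, their `run` functions, and the response shape here are all assumptions.

```javascript
// Sketch: applying several evaluators of different types to one chatflow
// response. Each evaluator yields an independent pass/fail result.
const response = { text: '{"answer": "Paris"}', totalTokens: 42, apiLatency: 350 };

const evaluators = [
  // text-style check (hypothetical inline form)
  { name: 'ContainsAny', run: (r) => ['Paris', 'London'].some((v) => r.text.includes(v)) },
  // JSON validity check
  { name: 'IsValidJSON', run: (r) => { try { JSON.parse(r.text); return true; } catch { return false; } } },
  // numeric threshold check
  { name: 'totalTokens < 100', run: (r) => r.totalTokens < 100 },
];

const results = evaluators.map((e) => ({ name: e.name, pass: e.run(response) }));
console.log(results);
```

Each entry in `results` is one quality dimension; a run passes overall only when every dimension it cares about passes.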
Text Evaluators
Text evaluators perform string-based comparisons against the chatflow response:
- ContainsAny: Returns true if any of the specified comma-separated values are present in the response.
- ContainsAll: Returns true if ALL of the specified comma-separated values are present in the response.
- DoesNotContainAny: Returns true if none of the specified comma-separated values are present.
- DoesNotContainAll: Returns true if not all of the specified values are present (i.e., at least one is missing).
- StartsWith: Returns true if the response starts with the specified value.
- NotStartsWith: Returns true if the response does not start with the specified value.
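The six comparisons above reduce to simple string predicates. The following dispatch table is a sketch of the assumed semantics, not the actual Flowise implementation; note how the comma-separated input is split into individual values before comparison.

```javascript
// Assumed semantics of the six text evaluators; names mirror the doc.
const checks = {
  ContainsAny: (text, values) => values.some((v) => text.includes(v)),
  ContainsAll: (text, values) => values.every((v) => text.includes(v)),
  DoesNotContainAny: (text, values) => !values.some((v) => text.includes(v)),
  DoesNotContainAll: (text, values) => !values.every((v) => text.includes(v)),
  StartsWith: (text, value) => text.startsWith(value),
  NotStartsWith: (text, value) => !text.startsWith(value),
};

// Comma-separated configuration values are split before comparison.
const values = 'hello,world'.split(',');
console.log(checks.ContainsAny('hello there', values)); // true  (has "hello")
console.log(checks.ContainsAll('hello there', values)); // false (missing "world")
```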
JSON Evaluators
JSON evaluators validate the response format:
- IsValidJSON: Returns true if the response is valid JSON.
- IsNotValidJSON: Returns true if the response is not valid JSON.
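A straightforward way to implement both checks is to attempt a parse and catch failure; this sketch assumes that approach rather than reflecting Flowise's actual code.

```javascript
// Sketch: JSON validity via JSON.parse. IsNotValidJSON is the negation.
const isValidJSON = (text) => {
  try { JSON.parse(text); return true; } catch { return false; }
};

console.log(isValidJSON('{"answer": "Paris"}')); // true
console.log(isValidJSON('not json'));            // false
```

Note that bare JSON scalars such as `"42"` parse successfully, so IsValidJSON under this approach accepts more than just objects and arrays.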
Numeric Evaluators
Numeric evaluators measure quantitative metrics with comparison operators (equals, notEquals, greaterThan, lessThan, greaterThanOrEquals, lessThanOrEquals):
- totalTokens: Sum of prompt tokens and completion tokens.
- promptTokens: Number of tokens in the prompt.
- completionTokens: Number of tokens generated in the response.
- apiLatency: Total time for the Flowise Prediction API call (milliseconds).
- llm: Actual LLM invocation time (milliseconds).
- chain: Actual time spent executing the chatflow (milliseconds).
- responseLength: Number of characters in the response.
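A numeric evaluator pairs one of the metrics above with one of the six operators and a threshold. The dispatch table below is an assumed sketch; the operator names follow the doc, but the `evaluate` helper and metric object are illustrative only.

```javascript
// Assumed mapping of operator names to comparisons.
const operators = {
  equals: (a, b) => a === b,
  notEquals: (a, b) => a !== b,
  greaterThan: (a, b) => a > b,
  lessThan: (a, b) => a < b,
  greaterThanOrEquals: (a, b) => a >= b,
  lessThanOrEquals: (a, b) => a <= b,
};

// Hypothetical measured metrics for one chatflow response.
const metrics = { totalTokens: 42, apiLatency: 350, responseLength: 120 };

const evaluate = (metric, operator, threshold) =>
  operators[operator](metrics[metric], threshold);

console.log(evaluate('apiLatency', 'lessThan', 1000));      // true
console.log(evaluate('totalTokens', 'greaterThan', 100));   // false
```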
LLM Evaluators
LLM evaluators use a separate AI model to grade the chatflow response against a custom prompt. They support structured output schemas, enabling the grading model to return specific fields (e.g., score, reasoning, pass/fail).
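The structured-output idea can be sketched as a grading prompt plus a schema the grading model must fill in. Everything here is illustrative: the schema fields, the prompt wording, and the mocked model reply are assumptions, not Flowise's actual grading flow.

```javascript
// Sketch: an LLM evaluator with a structured output schema.
// Field descriptions tell the grading model what to return.
const outputSchema = {
  score: 'number, 0-10 rating of answer quality',
  reasoning: 'string, one-sentence justification',
  pass: 'boolean, true if score >= 7',
};

const gradingPrompt = (question, answer) =>
  `Grade the answer to the question below.\n` +
  `Question: ${question}\nAnswer: ${answer}\n` +
  `Respond as JSON matching: ${JSON.stringify(outputSchema)}`;

// A mocked reply stands in for a real LLM call.
const mockedReply = '{"score": 8, "reasoning": "Accurate and concise.", "pass": true}';
const verdict = JSON.parse(mockedReply);
console.log(verdict.pass); // true
```

Forcing a schema keeps the grade machine-readable, so the pass/fail result can feed the same aggregation as the deterministic evaluator types.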
Usage
Use evaluator definition when defining automated quality criteria for chatflow evaluation runs. Evaluators should be created before configuring an evaluation run, as the run configuration references evaluator IDs.
Theoretical Basis
This principle follows a multi-dimensional evaluation framework. Each evaluator type addresses a different quality dimension:
- Correctness (text matching): Verifies that responses contain expected content or follow expected patterns.
- Structure (JSON validation): Ensures responses conform to expected format requirements.
- Efficiency (numeric metrics): Measures resource consumption and performance characteristics.
- Relevance (LLM grading): Uses AI judgment to assess response quality against subjective criteria.
Combining evaluator types provides comprehensive quality assessment that goes beyond any single metric. This multi-dimensional approach mirrors established software testing practices where different test types (unit, integration, performance) each reveal different categories of defects.