Principle: FlowiseAI Flowise Evaluator Definition
| Property | Value |
|---|---|
| Principle Name | Evaluator_Definition |
| Overview | Technique for defining evaluation criteria that automatically assess chatflow response quality across multiple dimensions |
| Domain | AI Evaluation, Quality Metrics, Automated Grading |
| Source | FlowiseAI/Flowise repository: packages/ui/src/api/evaluators.js, packages/ui/src/views/evaluators/evaluatorConstant.js |
| Last Updated | 2026-02-12 14:00 GMT |
Description
Evaluators are configurable scoring rules applied to chatflow outputs. Four types exist:
- Text-based evaluators check string properties of the response (contains, starts with, and their negations).
- JSON evaluators validate the structural correctness of the response.
- Numeric evaluators check quantitative metrics (token count, latency, response length).
- LLM evaluators use AI models to grade responses against custom prompts with structured output schemas.
Each evaluator produces a pass/fail result when applied to a chatflow response. Multiple evaluators can be combined in a single evaluation run to provide comprehensive quality assessment across different dimensions.
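A combined run can be pictured as mapping each evaluator over the same response. This is an illustrative sketch, not Flowise's internal API: the evaluator objects, their `run` functions, and the response shape here are all assumptions.

```javascript
// Sketch: applying several evaluators of different types to one chatflow
// response. Each evaluator yields an independent pass/fail result.
const response = { text: '{"answer": "Paris"}', totalTokens: 42, apiLatency: 350 };

const evaluators = [
  // text-style check (hypothetical inline form)
  { name: 'ContainsAny', run: (r) => ['Paris', 'London'].some((v) => r.text.includes(v)) },
  // JSON validity check
  { name: 'IsValidJSON', run: (r) => { try { JSON.parse(r.text); return true; } catch { return false; } } },
  // numeric threshold check
  { name: 'totalTokens < 100', run: (r) => r.totalTokens < 100 },
];

const results = evaluators.map((e) => ({ name: e.name, pass: e.run(response) }));
console.log(results);
```

Each entry in `results` is one quality dimension; a run passes overall only when every dimension it cares about passes.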
Text Evaluators
Text evaluators perform string-based comparisons against the chatflow response:
- ContainsAny: Returns true if any of the specified comma-separated values are present in the response.
- ContainsAll: Returns true if ALL of the specified comma-separated values are present in the response.
- DoesNotContainAny: Returns true if none of the specified comma-separated values are present.
- DoesNotContainAll: Returns true if not all of the specified values are present (i.e., at least one is missing).
- StartsWith: Returns true if the response starts with the specified value.
- NotStartsWith: Returns true if the response does not start with the specified value.
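The six comparisons above reduce to simple string predicates. The following dispatch table is a sketch of the assumed semantics, not the actual Flowise implementation; note how the comma-separated input is split into individual values before comparison.

```javascript
// Assumed semantics of the six text evaluators; names mirror the doc.
const checks = {
  ContainsAny: (text, values) => values.some((v) => text.includes(v)),
  ContainsAll: (text, values) => values.every((v) => text.includes(v)),
  DoesNotContainAny: (text, values) => !values.some((v) => text.includes(v)),
  DoesNotContainAll: (text, values) => !values.every((v) => text.includes(v)),
  StartsWith: (text, value) => text.startsWith(value),
  NotStartsWith: (text, value) => !text.startsWith(value),
};

// Comma-separated configuration values are split before comparison.
const values = 'hello,world'.split(',');
console.log(checks.ContainsAny('hello there', values)); // true  (has "hello")
console.log(checks.ContainsAll('hello there', values)); // false (missing "world")
```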
JSON Evaluators
JSON evaluators validate the response format:
- IsValidJSON: Returns true if the response is valid JSON.
- IsNotValidJSON: Returns true if the response is not valid JSON.
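A straightforward way to implement both checks is to attempt a parse and catch failure; this sketch assumes that approach rather than reflecting Flowise's actual code.

```javascript
// Sketch: JSON validity via JSON.parse. IsNotValidJSON is the negation.
const isValidJSON = (text) => {
  try { JSON.parse(text); return true; } catch { return false; }
};

console.log(isValidJSON('{"answer": "Paris"}')); // true
console.log(isValidJSON('not json'));            // false
```

Note that bare JSON scalars such as `"42"` parse successfully, so IsValidJSON under this approach accepts more than just objects and arrays.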
Numeric Evaluators
Numeric evaluators measure quantitative metrics with comparison operators (equals, notEquals, greaterThan, lessThan, greaterThanOrEquals, lessThanOrEquals):
- totalTokens: Sum of prompt tokens and completion tokens.
- promptTokens: Number of tokens in the prompt.
- completionTokens: Number of tokens generated in the response.
- apiLatency: Total time for the Flowise Prediction API call (milliseconds).
- llm: Actual LLM invocation time (milliseconds).
- chain: Actual time spent executing the chatflow (milliseconds).
- responseLength: Number of characters in the response.
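A numeric evaluator pairs one of the metrics above with one of the six operators and a threshold. The dispatch table below is an assumed sketch; the operator names follow the doc, but the `evaluate` helper and metric object are illustrative only.

```javascript
// Assumed mapping of operator names to comparisons.
const operators = {
  equals: (a, b) => a === b,
  notEquals: (a, b) => a !== b,
  greaterThan: (a, b) => a > b,
  lessThan: (a, b) => a < b,
  greaterThanOrEquals: (a, b) => a >= b,
  lessThanOrEquals: (a, b) => a <= b,
};

// Hypothetical measured metrics for one chatflow response.
const metrics = { totalTokens: 42, apiLatency: 350, responseLength: 120 };

const evaluate = (metric, operator, threshold) =>
  operators[operator](metrics[metric], threshold);

console.log(evaluate('apiLatency', 'lessThan', 1000));      // true
console.log(evaluate('totalTokens', 'greaterThan', 100));   // false
```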
LLM Evaluators
LLM evaluators use a separate AI model to grade the chatflow response against a custom prompt. They support structured output schemas, enabling the grading model to return specific fields (e.g., score, reasoning, pass/fail).
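The structured-output idea can be sketched as a grading prompt plus a schema the grading model must fill in. Everything here is illustrative: the schema fields, the prompt wording, and the mocked model reply are assumptions, not Flowise's actual grading flow.

```javascript
// Sketch: an LLM evaluator with a structured output schema.
// Field descriptions tell the grading model what to return.
const outputSchema = {
  score: 'number, 0-10 rating of answer quality',
  reasoning: 'string, one-sentence justification',
  pass: 'boolean, true if score >= 7',
};

const gradingPrompt = (question, answer) =>
  `Grade the answer to the question below.\n` +
  `Question: ${question}\nAnswer: ${answer}\n` +
  `Respond as JSON matching: ${JSON.stringify(outputSchema)}`;

// A mocked reply stands in for a real LLM call.
const mockedReply = '{"score": 8, "reasoning": "Accurate and concise.", "pass": true}';
const verdict = JSON.parse(mockedReply);
console.log(verdict.pass); // true
```

Forcing a schema keeps the grade machine-readable, so the pass/fail result can feed the same aggregation as the deterministic evaluator types.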
Usage
Use evaluator definition when defining automated quality criteria for chatflow evaluation runs. Evaluators should be created before configuring an evaluation run, as the run configuration references evaluator IDs.
Theoretical Basis
This principle follows a multi-dimensional evaluation framework. Each evaluator type addresses a different quality dimension:
- Correctness (text matching): Verifies that responses contain expected content or follow expected patterns.
- Structure (JSON validation): Ensures responses conform to expected format requirements.
- Efficiency (numeric metrics): Measures resource consumption and performance characteristics.
- Relevance (LLM grading): Uses AI judgment to assess response quality against subjective criteria.
Combining evaluator types provides comprehensive quality assessment that goes beyond any single metric. This multi-dimensional approach mirrors established software testing practices where different test types (unit, integration, performance) each reveal different categories of defects.