
Implementation:FlowiseAI Flowise CreateEvaluation

From Leeroopedia
  • Implementation Name: CreateEvaluation
  • Implements: Principle:FlowiseAI_Flowise_Evaluation_Run_Creation
  • Source: packages/ui/src/api/evaluations.js
  • Repository: FlowiseAI/Flowise
  • Domain: API Client, Evaluation Orchestration
  • Last Updated: 2026-02-12 14:00 GMT

Code Reference

Source Location

The evaluation creation API function is defined in packages/ui/src/api/evaluations.js at line 7.

Signature

// packages/ui/src/api/evaluations.js:L7
const createEvaluation = (body) => client.post(`/evaluations`, body)

The API client is configured in packages/ui/src/api/client.js with a base URL of ${baseURL}/api/v1, so the full endpoint is:

  • POST /api/v1/evaluations
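As a rough sketch of how the endpoint URL is composed (the contents of client.js are not shown on this page; the values below are assumptions, not the actual configuration):

```javascript
// Hypothetical sketch only -- the real packages/ui/src/api/client.js
// may differ. Assumes an axios-style client and a local dev server.
const baseURL = 'http://localhost:3000' // assumption: dev-server default
const apiBase = `${baseURL}/api/v1`

// client.post('/evaluations', body) would then target:
const endpoint = `${apiBase}/evaluations`
console.log(endpoint) // http://localhost:3000/api/v1/evaluations
```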

Import

import evaluationApi from '@/api/evaluations'

I/O Contract

createEvaluation

Inputs:

  • body (Object):
    • name (string, required): Display name for the evaluation run
    • evaluationType (string, required): One of 'llm' or 'benchmarking'
    • datasetId (string, required): ID of the dataset to evaluate against
    • datasetName (string, required): Display name of the dataset
    • chatflowId (string, required): JSON-encoded array of chatflow IDs to evaluate
    • chatflowName (string, required): JSON-encoded array of chatflow display names
    • chatflowType (string, required): JSON-encoded array of chatflow types
    • selectedSimpleEvaluators (string[], required): Array of simple evaluator IDs (text, JSON, numeric)
    • selectedLLMEvaluators (string[], required): Array of LLM evaluator IDs
    • model (string, optional): Model identifier for LLM-based evaluation
    • llm (string, optional): LLM provider node name
    • credentialId (string, optional): Credential ID for the LLM provider
    • datasetAsOneConversation (boolean, required): Whether to send all rows as a single conversation
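Note that the three chatflow* fields must be JSON-encoded string arrays while the evaluator fields stay plain arrays. A small helper can keep that distinction in one place; the function below is a hypothetical illustration, not part of the Flowise codebase:

```javascript
// Hypothetical helper (not in the Flowise codebase): builds a request body
// for createEvaluation from plain arrays, JSON-encoding the chatflow fields
// as the I/O contract requires.
function buildEvaluationBody({
    name,
    evaluationType,
    dataset,
    chatflows,
    simpleEvaluators = [],
    llmEvaluators = [],
    llmConfig = {},
    datasetAsOneConversation = false
}) {
    return {
        name,
        evaluationType,
        datasetId: dataset.id,
        datasetName: dataset.name,
        // The chatflow fields are strings containing JSON arrays
        chatflowId: JSON.stringify(chatflows.map((c) => c.id)),
        chatflowName: JSON.stringify(chatflows.map((c) => c.name)),
        chatflowType: JSON.stringify(chatflows.map((c) => c.type)),
        selectedSimpleEvaluators: simpleEvaluators,
        selectedLLMEvaluators: llmEvaluators,
        datasetAsOneConversation,
        ...llmConfig // optional: { model, llm, credentialId }
    }
}

const body = buildEvaluationBody({
    name: 'Quality Check',
    evaluationType: 'benchmarking',
    dataset: { id: 'dataset-abc-123', name: 'Customer Support QA' },
    chatflows: [{ id: 'chatflow-001', name: 'Support Bot v2', type: 'chatflow' }],
    simpleEvaluators: ['evaluator-text-001']
})
console.log(body.chatflowId) // '["chatflow-001"]'
```

The result can be passed directly to evaluationApi.createEvaluation(body).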

Outputs:

  • Promise<{data: EvaluationRun[]}>: Resolves with a response whose data property is an array of evaluation run objects, each containing:
    • id (string): Unique identifier for the evaluation run
    • name (string): Display name of the evaluation
    • version (number): Version number of the run
    • status (string): Current status of the evaluation run
    • runDate (string): ISO timestamp of when the run was executed
    • average_metrics (Object): Aggregated metrics across all rows

Usage Examples

Creating a Benchmarking Evaluation

import evaluationApi from '@/api/evaluations'

// Run a benchmarking evaluation with simple evaluators
const response = await evaluationApi.createEvaluation({
    name: 'Support Bot Quality Check v1',
    evaluationType: 'benchmarking',
    datasetId: 'dataset-abc-123',
    datasetName: 'Customer Support QA',
    chatflowId: JSON.stringify(['chatflow-001']),
    chatflowName: JSON.stringify(['Support Bot v2']),
    chatflowType: JSON.stringify(['chatflow']),
    selectedSimpleEvaluators: ['evaluator-text-001', 'evaluator-numeric-002'],
    selectedLLMEvaluators: [],
    datasetAsOneConversation: false
})
const evaluationRuns = response.data

Creating an LLM-Graded Evaluation with Multiple Chatflows

import evaluationApi from '@/api/evaluations'

// Compare two chatflows using LLM-based grading
const response = await evaluationApi.createEvaluation({
    name: 'A/B Test - GPT-4 vs Claude',
    evaluationType: 'llm',
    datasetId: 'dataset-xyz-456',
    datasetName: 'Product FAQ Tests',
    chatflowId: JSON.stringify(['chatflow-gpt4', 'chatflow-claude']),
    chatflowName: JSON.stringify(['GPT-4 Bot', 'Claude Bot']),
    chatflowType: JSON.stringify(['chatflow', 'chatflow']),
    selectedSimpleEvaluators: ['evaluator-latency-001'],
    selectedLLMEvaluators: ['evaluator-llm-relevance'],
    model: 'gpt-4',
    llm: 'chatOpenAI',
    credentialId: 'cred-openai-001',
    datasetAsOneConversation: false
})
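Once the promise resolves, each run in response.data carries the fields listed in the I/O contract. The helper below is a hypothetical illustration of reading those fields; the mock data (including the 'pending' status value) is made up for the example:

```javascript
// Hypothetical helper (illustration only): summarize evaluation runs
// using the documented output fields (name, version, status).
function summarizeRuns(runs) {
    return runs.map((run) => `${run.name} v${run.version}: ${run.status}`)
}

// Example with a mocked response shape (field values are invented):
const mockRuns = [
    {
        id: 'run-1',
        name: 'A/B Test - GPT-4 vs Claude',
        version: 1,
        status: 'pending',
        runDate: '2026-02-12T14:00:00Z',
        average_metrics: {}
    }
]
console.log(summarizeRuns(mockRuns)) // [ 'A/B Test - GPT-4 vs Claude v1: pending' ]
```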
