
Implementation:FlowiseAI Flowise CreateEvaluation

From Leeroopedia
  • Implementation Name: CreateEvaluation
  • Implements: Principle:FlowiseAI_Flowise_Evaluation_Run_Creation
  • Source: packages/ui/src/api/evaluations.js
  • Repository: FlowiseAI/Flowise
  • Domain: API Client, Evaluation Orchestration
  • Last Updated: 2026-02-12 14:00 GMT

Code Reference

Source Location

The evaluation creation API function is defined in packages/ui/src/api/evaluations.js at line 7.

Signature

// packages/ui/src/api/evaluations.js:L7
const createEvaluation = (body) => client.post(`/evaluations`, body)

The API client is configured in packages/ui/src/api/client.js with a base URL of ${baseURL}/api/v1, so the full endpoint is:

  • POST /api/v1/evaluations
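As a rough sketch of how the endpoint URL is composed (the contents of client.js are not shown on this page; the values below are assumptions, not the actual configuration):

```javascript
// Hypothetical sketch only -- the real packages/ui/src/api/client.js
// may differ. Assumes an axios-style client and a local dev server.
const baseURL = 'http://localhost:3000' // assumption: dev-server default
const apiBase = `${baseURL}/api/v1`

// client.post('/evaluations', body) would then target:
const endpoint = `${apiBase}/evaluations`
console.log(endpoint) // http://localhost:3000/api/v1/evaluations
```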

Import

import evaluationApi from '@/api/evaluations'

I/O Contract

createEvaluation

Inputs:

  • body (Object):
    • name (string, required): Display name for the evaluation run
    • evaluationType (string, required): One of 'llm' or 'benchmarking'
    • datasetId (string, required): ID of the dataset to evaluate against
    • datasetName (string, required): Display name of the dataset
    • chatflowId (string, required): JSON-encoded array of chatflow IDs to evaluate
    • chatflowName (string, required): JSON-encoded array of chatflow display names
    • chatflowType (string, required): JSON-encoded array of chatflow types
    • selectedSimpleEvaluators (string[], required): Array of simple evaluator IDs (text, JSON, numeric)
    • selectedLLMEvaluators (string[], required): Array of LLM evaluator IDs
    • model (string, optional): Model identifier for LLM-based evaluation
    • llm (string, optional): LLM provider node name
    • credentialId (string, optional): Credential ID for the LLM provider
    • datasetAsOneConversation (boolean, required): Whether to send all rows as a single conversation
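Note that the three chatflow* fields must be JSON-encoded string arrays while the evaluator fields stay plain arrays. A small helper can keep that distinction in one place; the function below is a hypothetical illustration, not part of the Flowise codebase:

```javascript
// Hypothetical helper (not in the Flowise codebase): builds a request body
// for createEvaluation from plain arrays, JSON-encoding the chatflow fields
// as the I/O contract requires.
function buildEvaluationBody({
    name,
    evaluationType,
    dataset,
    chatflows,
    simpleEvaluators = [],
    llmEvaluators = [],
    llmConfig = {},
    datasetAsOneConversation = false
}) {
    return {
        name,
        evaluationType,
        datasetId: dataset.id,
        datasetName: dataset.name,
        // The chatflow fields are strings containing JSON arrays
        chatflowId: JSON.stringify(chatflows.map((c) => c.id)),
        chatflowName: JSON.stringify(chatflows.map((c) => c.name)),
        chatflowType: JSON.stringify(chatflows.map((c) => c.type)),
        selectedSimpleEvaluators: simpleEvaluators,
        selectedLLMEvaluators: llmEvaluators,
        datasetAsOneConversation,
        ...llmConfig // optional: { model, llm, credentialId }
    }
}

const body = buildEvaluationBody({
    name: 'Quality Check',
    evaluationType: 'benchmarking',
    dataset: { id: 'dataset-abc-123', name: 'Customer Support QA' },
    chatflows: [{ id: 'chatflow-001', name: 'Support Bot v2', type: 'chatflow' }],
    simpleEvaluators: ['evaluator-text-001']
})
console.log(body.chatflowId) // '["chatflow-001"]'
```

The result can be passed directly to evaluationApi.createEvaluation(body).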

Outputs:

  • Promise<{data: EvaluationRun[]}>: Resolves with a response whose data property is an array of evaluation run objects, each containing:
    • id (string): Unique identifier for the evaluation run
    • name (string): Display name of the evaluation
    • version (number): Version number of the run
    • status (string): Current status of the evaluation run
    • runDate (string): ISO timestamp of when the run was executed
    • average_metrics (Object): Aggregated metrics across all rows

Usage Examples

Creating a Benchmarking Evaluation

import evaluationApi from '@/api/evaluations'

// Run a benchmarking evaluation with simple evaluators
const response = await evaluationApi.createEvaluation({
    name: 'Support Bot Quality Check v1',
    evaluationType: 'benchmarking',
    datasetId: 'dataset-abc-123',
    datasetName: 'Customer Support QA',
    chatflowId: JSON.stringify(['chatflow-001']),
    chatflowName: JSON.stringify(['Support Bot v2']),
    chatflowType: JSON.stringify(['chatflow']),
    selectedSimpleEvaluators: ['evaluator-text-001', 'evaluator-numeric-002'],
    selectedLLMEvaluators: [],
    datasetAsOneConversation: false
})
const evaluationRuns = response.data

Creating an LLM-Graded Evaluation with Multiple Chatflows

import evaluationApi from '@/api/evaluations'

// Compare two chatflows using LLM-based grading
const response = await evaluationApi.createEvaluation({
    name: 'A/B Test - GPT-4 vs Claude',
    evaluationType: 'llm',
    datasetId: 'dataset-xyz-456',
    datasetName: 'Product FAQ Tests',
    chatflowId: JSON.stringify(['chatflow-gpt4', 'chatflow-claude']),
    chatflowName: JSON.stringify(['GPT-4 Bot', 'Claude Bot']),
    chatflowType: JSON.stringify(['chatflow', 'chatflow']),
    selectedSimpleEvaluators: ['evaluator-latency-001'],
    selectedLLMEvaluators: ['evaluator-llm-relevance'],
    model: 'gpt-4',
    llm: 'chatOpenAI',
    credentialId: 'cred-openai-001',
    datasetAsOneConversation: false
})
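Once the promise resolves, each run in response.data carries the fields listed in the I/O contract. The helper below is a hypothetical illustration of reading those fields; the mock data (including the 'pending' status value) is made up for the example:

```javascript
// Hypothetical helper (illustration only): summarize evaluation runs
// using the documented output fields (name, version, status).
function summarizeRuns(runs) {
    return runs.map((run) => `${run.name} v${run.version}: ${run.status}`)
}

// Example with a mocked response shape (field values are invented):
const mockRuns = [
    {
        id: 'run-1',
        name: 'A/B Test - GPT-4 vs Claude',
        version: 1,
        status: 'pending',
        runDate: '2026-02-12T14:00:00Z',
        average_metrics: {}
    }
]
console.log(summarizeRuns(mockRuns)) // [ 'A/B Test - GPT-4 vs Claude v1: pending' ]
```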
