Implementation:Langfuse Langfuse Seeder Orchestrator
| Knowledge Sources | |
|---|---|
| Domains | Database Seeding, ClickHouse, Test Data Generation |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
The SeederOrchestrator class coordinates all ClickHouse seeding operations: it generates and inserts dataset experiments, evaluation data, synthetic traces, support chat sessions, framework traces, and media test traces.
Description
The SeederOrchestrator is the top-level coordinator for populating ClickHouse with development seed data. It brings together the DataGenerator (for creating record objects), the ClickHouseQueryBuilder (for executing inserts), and the FrameworkTraceLoader (for loading framework-specific trace files).
Upon construction, the orchestrator:
- Instantiates the DataGenerator singleton.
- Creates a ClickHouseQueryBuilder instance.
- Loads external file content from nested_json.json, markdown.txt, and chat_ml_json.json, truncating large content (first 3 products, first 4 messages) to keep seed data sizes reasonable.
- Passes the loaded file content to the DataGenerator.
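The truncation step can be illustrated with a minimal sketch. The NestedJson and ChatMlJson shapes and the truncateFileContent helper are hypothetical names introduced here, and it is an assumption that the products live in nested_json.json and the messages in chat_ml_json.json; the real FileContent type may differ.

```typescript
// Hypothetical shapes for the loaded files; not taken from the seeder source.
interface NestedJson {
  products: unknown[];
  [key: string]: unknown;
}

interface ChatMlJson {
  messages: unknown[];
  [key: string]: unknown;
}

// Keep only the first `maxProducts` products and `maxMessages` messages,
// leaving every other key of the loaded objects untouched.
function truncateFileContent(
  nested: NestedJson,
  chat: ChatMlJson,
  maxProducts = 3,
  maxMessages = 4,
): { nested: NestedJson; chat: ChatMlJson } {
  return {
    nested: { ...nested, products: nested.products.slice(0, maxProducts) },
    chat: { ...chat, messages: chat.messages.slice(0, maxMessages) },
  };
}

const truncated = truncateFileContent(
  { products: [1, 2, 3, 4, 5], currency: "USD" },
  { messages: ["a", "b", "c", "d", "e", "f"] },
);
console.log(truncated.nested.products.length, truncated.chat.messages.length); // 3 4
```

Because slice returns a copy, the objects read from disk are not mutated, so the same loaded content could be reused with different limits.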
The orchestrator exposes seven public methods: six that each create one type of seed data, and executeFullSeed, which runs them all:
createDatasetExperimentData(projectIds, opts):
- Iterates over projects and dataset runs (configurable via opts.numberOfRuns).
- For each dataset with shouldRunExperiment: true, generates traces, observations, dataset run items, and scores.
- Inserts all records into ClickHouse in sequence (traces -> observations -> dataset run items -> scores).
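The insertion sequence described above might be sketched as follows. The table names follow the Outputs table on this page; InsertFn and insertInSequence are hypothetical stand-ins, not the actual ClickHouseQueryBuilder API.

```typescript
type TableName = "traces" | "observations" | "dataset_run_items" | "scores";

// Hypothetical insert callback; the real query builder may look different.
type InsertFn = (table: TableName, rows: object[]) => Promise<void>;

// Order taken from the description: traces, observations,
// dataset run items, then scores.
const INSERT_ORDER: readonly TableName[] = [
  "traces",
  "observations",
  "dataset_run_items",
  "scores",
];

async function insertInSequence(
  insert: InsertFn,
  batch: Record<TableName, object[]>,
): Promise<TableName[]> {
  const written: TableName[] = [];
  for (const table of INSERT_ORDER) {
    if (batch[table].length === 0) continue; // nothing to write for this table
    await insert(table, batch[table]); // finish this table before the next one
    written.push(table);
  }
  return written;
}
```

Awaiting each insert inside the loop, rather than using Promise.all, is what keeps the tables written strictly one after another.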
createEvaluationData(projectIds):
- Generates 100 evaluation traces per project with 10 observations each.
- Creates one score per evaluation trace (skipping every 10th to simulate failures).
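As a rough illustration of the counts involved, the sketch below assumes that "every 10th" means the 10th, 20th, 30th, and so on trace in generation order. The shouldCreateScore helper is hypothetical, and the real skip condition may use a different offset.

```typescript
const TRACES_PER_PROJECT = 100;
const OBSERVATIONS_PER_TRACE = 10;

// Assumption: 0-based indices 9, 19, 29, ... are the skipped traces.
function shouldCreateScore(traceIndex: number): boolean {
  return (traceIndex + 1) % 10 !== 0;
}

const scoredTraceIndices: number[] = [];
for (let i = 0; i < TRACES_PER_PROJECT; i++) {
  if (shouldCreateScore(i)) scoredTraceIndices.push(i);
}

const observationsPerProject = TRACES_PER_PROJECT * OBSERVATIONS_PER_TRACE;

console.log(scoredTraceIndices.length); // 90 scores per project
console.log(observationsPerProject); // 1000 observations per project
```

Whichever offset the implementation uses, skipping one trace in ten leaves 90 scored traces out of 100 per project.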
createSyntheticData(projectIds, opts):
- Supports two modes: "bulk" (using raw SQL generation for high-volume inserts) and standard (using record-by-record generation).
- Generates traces with 15 observations each and 10 scores per trace.
- The total observation count is determined by the opts.mode setting.
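The per-trace fan-out makes row volumes easy to estimate once a trace count is known. In this sketch, estimateSyntheticRows is a hypothetical helper and traceCount is an input chosen by the caller; how each mode actually arrives at its trace count is not stated here.

```typescript
const OBSERVATIONS_PER_TRACE = 15;
const SCORES_PER_TRACE = 10;

interface RowEstimate {
  traces: number;
  observations: number;
  scores: number;
}

// Estimate the rows written per table for a given number of synthetic traces.
function estimateSyntheticRows(traceCount: number): RowEstimate {
  return {
    traces: traceCount,
    observations: traceCount * OBSERVATIONS_PER_TRACE,
    scores: traceCount * SCORES_PER_TRACE,
  };
}

const estimate = estimateSyntheticRows(1000);
console.log(estimate); // { traces: 1000, observations: 15000, scores: 10000 }
```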
createSupportChatSessionTraces(projectIds):
- Creates realistic multi-turn support chat conversation data.
createFrameworkTraces(projectIds):
- Loads real trace data from the 9 framework JSON files (Agno, BeeAI, Google ADK, Koog, LangGraph, Microsoft Agent, OpenAI Agents, OpenAI Assistants, Pydantic AI).
- Inserts the loaded traces, observations, and scores into ClickHouse.
createMediaTestTraces(projectIds):
- Creates 3 test traces for media attachment testing: image-only, all media types, and all types with ChatML format.
executeFullSeed(projectIds, opts):
- Runs all seeding operations in sequence: dataset experiments, evaluation data, synthetic data, support chat sessions, framework traces, and media test traces.
- Logs per-project statistics after completion.
The orchestrator also includes a logStatistics() method that queries ClickHouse for per-project counts across traces, scores, and observations tables and displays bar chart representations.
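A text bar chart of this kind could be rendered roughly as follows. The renderBar and formatStatistics helpers, the bar width, and the "#" glyph are illustrative assumptions rather than the actual logStatistics() output format.

```typescript
// Scale `count` against the largest count and draw a bar of "#" characters.
function renderBar(count: number, maxCount: number, width = 30): string {
  if (maxCount <= 0) return "";
  const filled = Math.round((count / maxCount) * width);
  return "#".repeat(filled);
}

// Produce one aligned line per table: name, row count, bar.
function formatStatistics(counts: Record<string, number>): string[] {
  const entries = Object.entries(counts);
  const max = Math.max(0, ...entries.map(([, count]) => count));
  return entries.map(
    ([table, count]) =>
      `${table.padEnd(14)} ${String(count).padStart(8)} ${renderBar(count, max)}`,
  );
}

const statLines = formatStatistics({
  traces: 1000,
  observations: 15000,
  scores: 10000,
});
statLines.forEach((line) => console.log(line));
```

Scaling against the largest count keeps the longest bar at a fixed width, so the output stays readable regardless of how many rows were seeded.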
Usage
Use this class when:
- Running the full ClickHouse seeding process for development.
- Creating specific types of seed data in isolation (datasets only, evaluation only, etc.).
- Extending the seeding pipeline with new data types or sources.
Code Reference
Source Location
- Repository: Langfuse
- File: packages/shared/scripts/seeder/utils/seeder-orchestrator.ts
- Lines: 1-558
Signature
export class SeederOrchestrator {
  private dataGenerator: DataGenerator;
  private queryBuilder: ClickHouseQueryBuilder;
  private fileContent: FileContent | null;
  constructor();
  // Dataset experiment data (langfuse-prompt-experiment environment)
  async createDatasetExperimentData(projectIds: string[], opts: SeederOptions): Promise<void>;
  // Evaluation data (langfuse-evaluation environment)
  async createEvaluationData(projectIds: string[]): Promise<void>;
  // Large-scale synthetic data (default environment)
  async createSyntheticData(projectIds: string[], opts: SeederOptions): Promise<void>;
  // Realistic support chat session traces
  async createSupportChatSessionTraces(projectIds: string[]): Promise<void>;
  // Framework-specific trace examples from JSON files
  async createFrameworkTraces(projectIds: string[]): Promise<void>;
  // Media attachment test traces
  async createMediaTestTraces(projectIds: string[]): Promise<void>;
  // Full seed: all data types together
  async executeFullSeed(projectIds: string[], opts: SeederOptions): Promise<void>;
}
Import
import { SeederOrchestrator } from "./utils/seeder-orchestrator";
const orchestrator = new SeederOrchestrator();
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| projectIds | string[] | Yes | Array of project IDs to seed data for |
| opts | SeederOptions | Yes (for some methods) | Configuration including mode (standard/bulk), numberOfRuns, and numberOfDays |
Outputs
| Name | Type | Description |
|---|---|---|
| ClickHouse traces | Inserted rows | Trace records across multiple environments (default, langfuse-prompt-experiment, langfuse-evaluation) |
| ClickHouse observations | Inserted rows | Observation records of various types (SPAN, GENERATION, AGENT, TOOL, etc.) |
| ClickHouse scores | Inserted rows | Score records (NUMERIC, BOOLEAN, CATEGORICAL, CORRECTION, EVAL) |
| ClickHouse dataset_run_items | Inserted rows | Dataset run item records linking items to experiment traces |
| Console statistics | stdout | Per-project row counts with bar chart visualization |
Usage Examples
import { SeederOrchestrator } from "./utils/seeder-orchestrator";
const orchestrator = new SeederOrchestrator();
const projectIds = [
  "7a88fb47-b4e2-43b8-a06c-a5ce950dc53a",
  "239ad00f-562f-411d-af14-831c75ddd875",
];
// Full seed with all data types
await orchestrator.executeFullSeed(projectIds, {
  mode: "standard",
  numberOfRuns: 3,
  numberOfDays: 30,
});
// Or seed specific data types independently
await orchestrator.createDatasetExperimentData(projectIds, { mode: "standard", numberOfRuns: 3 });
await orchestrator.createEvaluationData(projectIds);
await orchestrator.createSyntheticData(projectIds, { mode: "bulk", numberOfDays: 90 });
await orchestrator.createFrameworkTraces(projectIds);