Implementation:Langfuse Langfuse Seeder Orchestrator

From Leeroopedia
Knowledge Sources
Domains Database Seeding, ClickHouse, Test Data Generation
Last Updated 2026-02-14 00:00 GMT

Overview

The SeederOrchestrator class coordinates all ClickHouse seeding operations, orchestrating the generation and insertion of dataset experiments, evaluation data, synthetic traces, support chat sessions, framework traces, and media test traces.

Description

The SeederOrchestrator is the top-level coordinator for populating ClickHouse with development seed data. It brings together the DataGenerator (for creating record objects), the ClickHouseQueryBuilder (for executing inserts), and the FrameworkTraceLoader (for loading framework-specific trace files).

Upon construction, the orchestrator:

  1. Instantiates the DataGenerator singleton.
  2. Creates a ClickHouseQueryBuilder instance.
  3. Loads external file content from nested_json.json, markdown.txt, and chat_ml_json.json, truncating large content (first 3 products, first 4 messages) for reasonable seed data sizes.
  4. Passes the loaded file content to the DataGenerator.
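The truncation in step 3 could look roughly like the sketch below. The field names on FileContent (nestedJson, heavyMarkdown, chatMlJson) and the products/messages keys are assumptions made for illustration; the source only states which files are loaded and how many products and messages are kept.

```typescript
// Hypothetical shapes: the real layouts of nested_json.json and
// chat_ml_json.json are not shown on this page.
interface NestedJson {
  products: unknown[];
  [key: string]: unknown;
}

interface ChatMlJson {
  messages: unknown[];
  [key: string]: unknown;
}

// Assumed field names; only the type name FileContent appears in the source.
interface FileContent {
  nestedJson: NestedJson;
  heavyMarkdown: string;
  chatMlJson: ChatMlJson;
}

const MAX_PRODUCTS = 3; // "first 3 products"
const MAX_MESSAGES = 4; // "first 4 messages"

// Returns a copy with the large arrays cut down; the input is not mutated.
function truncateFileContent(raw: FileContent): FileContent {
  return {
    nestedJson: {
      ...raw.nestedJson,
      products: raw.nestedJson.products.slice(0, MAX_PRODUCTS),
    },
    heavyMarkdown: raw.heavyMarkdown,
    chatMlJson: {
      ...raw.chatMlJson,
      messages: raw.chatMlJson.messages.slice(0, MAX_MESSAGES),
    },
  };
}
```

Copying rather than mutating keeps the loaded file content reusable if the same object is later passed to other generators.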

The orchestrator exposes seven public methods for creating seed data: six that each create one type of data, and executeFullSeed, which runs them all in sequence:

createDatasetExperimentData(projectIds, opts):

  • Iterates over projects and dataset runs (configurable via opts.numberOfRuns).
  • For each dataset with shouldRunExperiment: true, generates traces, observations, dataset run items, and scores.
  • Inserts all records into ClickHouse in sequence (traces -> observations -> dataset run items -> scores).
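The insert ordering described above can be illustrated with a small planning helper. This is a sketch only: planInserts and RecordBatch are hypothetical names, and the real ClickHouseQueryBuilder API is not shown on this page.

```typescript
// Table names follow the outputs listed on this page.
type TableName = "traces" | "observations" | "dataset_run_items" | "scores";

interface RecordBatch {
  table: TableName;
  rows: Record<string, unknown>[];
}

// Order described above: traces -> observations -> dataset run items -> scores.
const INSERT_ORDER: TableName[] = [
  "traces",
  "observations",
  "dataset_run_items",
  "scores",
];

// Hypothetical helper: merges batches per table and returns them in insert
// order, dropping tables that have no rows.
function planInserts(batches: RecordBatch[]): RecordBatch[] {
  const plan: RecordBatch[] = [];
  for (const table of INSERT_ORDER) {
    const rows: Record<string, unknown>[] = [];
    for (const batch of batches) {
      if (batch.table === table) rows.push(...batch.rows);
    }
    if (rows.length > 0) plan.push({ table, rows });
  }
  return plan;
}
```

A caller would then iterate over the returned plan and await one insert per entry, so that traces exist before the observations and scores that reference them.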

createEvaluationData(projectIds):

  • Generates 100 evaluation traces per project with 10 observations each.
  • Creates one score per evaluation trace (skipping every 10th to simulate failures).
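As a worked example of these counts, the sketch below computes how many rows the evaluation step would produce. Which trace indices count as "every 10th" is an assumption here (indices 9, 19, 29, ...); the source does not state the offset, and planEvaluationCounts is a hypothetical helper.

```typescript
const EVAL_TRACES_PER_PROJECT = 100;
const OBSERVATIONS_PER_EVAL_TRACE = 10;

// Assumption: the 10th, 20th, ... trace (zero-based index 9, 19, ...) gets no score.
function shouldSkipScore(traceIndex: number): boolean {
  return (traceIndex + 1) % 10 === 0;
}

interface EvaluationCounts {
  traces: number;
  observations: number;
  scores: number;
}

// Hypothetical helper: expected row counts for a given number of projects.
function planEvaluationCounts(projectCount: number): EvaluationCounts {
  let scoresPerProject = 0;
  for (let i = 0; i < EVAL_TRACES_PER_PROJECT; i++) {
    if (!shouldSkipScore(i)) scoresPerProject++;
  }
  const traces = projectCount * EVAL_TRACES_PER_PROJECT;
  return {
    traces,
    observations: traces * OBSERVATIONS_PER_EVAL_TRACE,
    scores: projectCount * scoresPerProject,
  };
}
```

Under these assumptions, two projects yield 200 traces, 2,000 observations, and 180 scores, since 10 of every 100 traces are left unscored.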

createSyntheticData(projectIds, opts):

  • Supports two modes: "bulk" (using raw SQL generation for high-volume inserts) and standard (using record-by-record generation).
  • Generates traces with 15 observations each and 10 scores per trace.
  • The total observation count is determined by the opts.mode setting.
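The relationship between mode, strategy, and row counts might be sketched as follows. The traceCount parameter and the strategy labels are hypothetical; the page does not say how many traces each mode generates, only the per-trace ratios.

```typescript
type SeedMode = "standard" | "bulk";

const OBSERVATIONS_PER_TRACE = 15;
const SCORES_PER_TRACE = 10;

interface SyntheticPlan {
  strategy: "raw-sql" | "record-by-record";
  traces: number;
  observations: number;
  scores: number;
}

// traceCount is a hypothetical input; the real value per mode is not given in the source.
function planSyntheticData(mode: SeedMode, traceCount: number): SyntheticPlan {
  return {
    // "bulk" uses raw SQL generation; standard generates records one by one.
    strategy: mode === "bulk" ? "raw-sql" : "record-by-record",
    traces: traceCount,
    observations: traceCount * OBSERVATIONS_PER_TRACE,
    scores: traceCount * SCORES_PER_TRACE,
  };
}
```

For instance, 1,000 traces in bulk mode would imply 15,000 observations and 10,000 scores under the stated ratios.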

createSupportChatSessionTraces(projectIds):

  • Creates realistic multi-turn support chat conversation data.

createFrameworkTraces(projectIds):

  • Loads real trace data from the 9 framework JSON files (Agno, BeeAI, Google ADK, Koog, LangGraph, Microsoft Agent, OpenAI Agents, OpenAI Assistants, Pydantic AI).
  • Inserts the loaded traces, observations, and scores into ClickHouse.

createMediaTestTraces(projectIds):

  • Creates 3 test traces for media attachment testing: image-only, all media types, and all types with ChatML format.

executeFullSeed(projectIds, opts):

  • Runs all seeding operations in sequence: dataset experiments, evaluation data, synthetic data, support chat sessions, framework traces, and media test traces.
  • Logs per-project statistics after completion.

The orchestrator also includes a logStatistics() method that queries ClickHouse for per-project counts across traces, scores, and observations tables and displays bar chart representations.
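A text bar chart of this kind can be produced with very little code. The sketch below is an illustration, not the actual logStatistics() implementation: TableCount, renderBar, and formatStatistics are hypothetical names, and the counts are passed in rather than queried from ClickHouse.

```typescript
interface TableCount {
  projectId: string;
  table: "traces" | "observations" | "scores";
  count: number;
}

// Scales count against maxCount and renders a bar of at most `width` characters.
function renderBar(count: number, maxCount: number, width: number = 20): string {
  if (maxCount <= 0 || count <= 0) return "";
  const filled = Math.max(1, Math.round((count / maxCount) * width));
  return "#".repeat(filled);
}

// Produces one line per (project, table) pair, scaled to the largest count.
function formatStatistics(rows: TableCount[]): string[] {
  const maxCount = rows.reduce((max, row) => Math.max(max, row.count), 0);
  return rows.map(
    (row) =>
      `${row.projectId} ${row.table}: ${row.count} ${renderBar(row.count, maxCount)}`,
  );
}
```

Scaling every bar against the single largest count keeps the rows visually comparable across projects and tables.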

Usage

Use this class when:

  • Running the full ClickHouse seeding process for development.
  • Creating specific types of seed data in isolation (datasets only, evaluation only, etc.).
  • Extending the seeding pipeline with new data types or sources.

Code Reference

Source Location

Signature

// Declaration-only summary of the class's public surface (no method bodies).
export declare class SeederOrchestrator {
  private dataGenerator: DataGenerator;
  private queryBuilder: ClickHouseQueryBuilder;
  private fileContent: FileContent | null;

  constructor();

  // Dataset experiment data (langfuse-prompt-experiment environment)
  createDatasetExperimentData(projectIds: string[], opts: SeederOptions): Promise<void>;

  // Evaluation data (langfuse-evaluation environment)
  createEvaluationData(projectIds: string[]): Promise<void>;

  // Large-scale synthetic data (default environment)
  createSyntheticData(projectIds: string[], opts: SeederOptions): Promise<void>;

  // Realistic support chat session traces
  createSupportChatSessionTraces(projectIds: string[]): Promise<void>;

  // Framework-specific trace examples from JSON files
  createFrameworkTraces(projectIds: string[]): Promise<void>;

  // Media attachment test traces
  createMediaTestTraces(projectIds: string[]): Promise<void>;

  // Full seed: all data types together
  executeFullSeed(projectIds: string[], opts: SeederOptions): Promise<void>;
}

Import

import { SeederOrchestrator } from "./utils/seeder-orchestrator";

const orchestrator = new SeederOrchestrator();

I/O Contract

Inputs

Name | Type | Required | Description
projectIds | string[] | Yes | Array of project IDs to seed data for
opts | SeederOptions | Yes (for some methods) | Configuration including mode (standard/bulk), numberOfRuns, and numberOfDays
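The shape of SeederOptions is not shown on this page. The sketch below infers it from the inputs listed above and from the usage examples, where numberOfRuns and numberOfDays are each omitted in some calls; the optionality and the validation rules are therefore assumptions, and validateSeederOptions is a hypothetical helper.

```typescript
// Field names come from the inputs above; optionality is inferred, not confirmed.
interface SeederOptions {
  mode: "standard" | "bulk";
  numberOfRuns?: number;
  numberOfDays?: number;
}

// Hypothetical pre-flight check: returns a list of problems, empty if none.
function validateSeederOptions(opts: SeederOptions): string[] {
  const problems: string[] = [];
  if (opts.mode !== "standard" && opts.mode !== "bulk") {
    problems.push(`unknown mode: ${String(opts.mode)}`);
  }
  if (
    opts.numberOfRuns !== undefined &&
    (!Number.isInteger(opts.numberOfRuns) || opts.numberOfRuns < 1)
  ) {
    problems.push("numberOfRuns must be a positive integer");
  }
  if (
    opts.numberOfDays !== undefined &&
    (!Number.isInteger(opts.numberOfDays) || opts.numberOfDays < 1)
  ) {
    problems.push("numberOfDays must be a positive integer");
  }
  return problems;
}
```

Checking options before seeding starts avoids discovering a bad value only after some tables have already been populated.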

Outputs

Name | Type | Description
ClickHouse traces | Inserted rows | Trace records across multiple environments (default, langfuse-prompt-experiment, langfuse-evaluation)
ClickHouse observations | Inserted rows | Observation records of various types (SPAN, GENERATION, AGENT, TOOL, etc.)
ClickHouse scores | Inserted rows | Score records (NUMERIC, BOOLEAN, CATEGORICAL, CORRECTION, EVAL)
ClickHouse dataset_run_items | Inserted rows | Dataset run item records linking items to experiment traces
Console statistics | stdout | Per-project row counts with bar chart visualization

Usage Examples

import { SeederOrchestrator } from "./utils/seeder-orchestrator";

const orchestrator = new SeederOrchestrator();

const projectIds = [
  "7a88fb47-b4e2-43b8-a06c-a5ce950dc53a",
  "239ad00f-562f-411d-af14-831c75ddd875",
];

// Full seed with all data types
await orchestrator.executeFullSeed(projectIds, {
  mode: "standard",
  numberOfRuns: 3,
  numberOfDays: 30,
});

// Or seed specific data types independently
await orchestrator.createDatasetExperimentData(projectIds, { mode: "standard", numberOfRuns: 3 });
await orchestrator.createEvaluationData(projectIds);
await orchestrator.createSyntheticData(projectIds, { mode: "bulk", numberOfDays: 90 });
await orchestrator.createFrameworkTraces(projectIds);
