
Principle:Langfuse Evaluation Job Creation

From Leeroopedia
Domains LLM Evaluation, Workflow Orchestration
Last Updated 2026-02-14 00:00 GMT

Overview

Evaluation Job Creation is the principle of matching incoming trace or dataset events against active evaluation configurations, applying filters, deduplication, and sampling, then creating individual job execution records and enqueuing them for downstream LLM evaluation.

Description

Once an event (trace upsert, dataset run item upsert, or UI-triggered batch action) arrives from the triggering layer, the system must determine which evaluation configurations apply, verify that the target data exists, and create concrete execution records. Evaluation Job Creation encapsulates this decision logic.
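The three triggering events can be sketched as a discriminated union. The field names below are illustrative assumptions for exposition, not Langfuse's actual payload shapes:

```typescript
// Hypothetical event shapes for the three triggering sources.
// Field names are illustrative, not Langfuse's actual payloads.
type EvalEvent =
  | { source: "trace-upsert"; projectId: string; traceId: string }
  | {
      source: "dataset-run-item-upsert";
      projectId: string;
      traceId: string;
      datasetItemId: string;
      observationId?: string; // present when linked at the observation level
    }
  | { source: "batch-action"; projectId: string; traceId: string; configId: string };

// Narrowing on `source` gives type-safe access to source-specific fields.
function describeEvent(event: EvalEvent): string {
  switch (event.source) {
    case "trace-upsert":
      return `live trace ${event.traceId}`;
    case "dataset-run-item-upsert":
      return `dataset item ${event.datasetItemId} for trace ${event.traceId}`;
    case "batch-action":
      return `batch action for config ${event.configId}`;
  }
}
```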

The process involves multiple stages of filtering and validation:

  1. Configuration Retrieval -- All active job configurations for the event's project are fetched from the database. Configurations are filtered by type (EVAL), status (ACTIVE), and target object (TRACE or DATASET). If an enforced time scope is provided (e.g., "NEW" for live trace events), only configurations with a matching time scope are included.
  2. Infinite Loop Prevention -- Traces with environments prefixed by "langfuse-" are identified as internal evaluation traces and are excluded from job creation when the event source is trace-upsert. This prevents the infinite cycle: user trace produces eval trace, which triggers another eval, and so on.
  3. Trace Existence and Filter Validation -- The system verifies the trace exists in ClickHouse and applies the evaluation's filter conditions. An in-memory filter optimization is used when the trace data has already been fetched for other configurations, avoiding redundant ClickHouse queries.
  4. Dataset Item Resolution -- For dataset-targeted evaluations, the system resolves the dataset item linked to the trace. This involves looking up dataset items by trace ID with optional version matching and applying any dataset-level filter conditions.
  5. Observation Existence Check -- When the event references a specific observation (common in dataset run items linked at the observation level), the system verifies the observation exists. If not found, an ObservationNotFoundError is thrown to trigger a retry, accommodating data replication delays.
  6. Deduplication -- Existing job executions for the same configuration and trace are checked in a single batched query. If an execution already exists, the new event is skipped to prevent duplicate evaluations.
  7. Sampling -- The configuration's sampling rate (a float between 0 and 1) is applied probabilistically. A random number is generated, and if it exceeds the sampling rate, the job is skipped.
  8. Execution Creation and Enqueuing -- A new job execution record is created with PENDING status, and a message is enqueued to the EvalExecutionQueue with an optional delay (in milliseconds) to allow trace data to settle.
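The filter, deduplication, and sampling stages above combine into a single per-config decision. A minimal sketch, with hypothetical names (`JobConfig`, `shouldCreateJob`) and an injectable random source for testability:

```typescript
// Hypothetical decision function for one configuration; names are
// illustrative, not Langfuse's API.
interface JobConfig {
  id: string;
  samplingRate: number; // float in [0, 1]
}

function shouldCreateJob(
  config: JobConfig,
  traceMatchesFilter: boolean,
  alreadyHasExecution: boolean,
  rand: () => number = Math.random,
): boolean {
  if (!traceMatchesFilter) return false; // filter validation (stage 3)
  if (alreadyHasExecution) return false; // deduplication (stage 6)
  if (rand() > config.samplingRate) return false; // sampled out (stage 7)
  return true; // proceed to execution creation (stage 8)
}
```

Injecting `rand` makes the probabilistic branch deterministic under test; a sampling rate of 1 always passes because no draw can exceed it.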

Usage

Use Evaluation Job Creation when:

  • You need to understand how the system decides which evaluations to run for a given trace
  • You want to trace why an evaluation was or was not created for a specific trace
  • You are debugging evaluation deduplication or sampling behavior
  • You need to understand the data flow between the triggering layer and the execution layer

Theoretical Basis

The Evaluation Job Creation principle implements a filter-sample-deduplicate-enqueue pipeline:

Step 1 - Configuration Matching:

configs = QUERY job_configurations
  WHERE job_type = "EVAL"
  AND project_id = event.projectId
  AND status = "ACTIVE"
  AND target_object IN ("trace", "dataset")
  AND (configId IS NULL OR id = event.configId)
  AND (enforcedTimeScope IS NULL OR time_scope CONTAINS enforcedTimeScope)

Step 2 - Caching Optimization:

IF configs.length > 1:
  cachedTrace = FETCH trace from ClickHouse (excluding input/output for performance)
  cachedDatasetItemIds = FETCH dataset item IDs from ClickHouse

allExistingJobs = BATCH QUERY job_executions
  WHERE project_id = event.projectId
  AND trace_id = event.traceId
  AND config_id IN configs.map(id)
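The batched deduplication query above is typically followed by an in-memory index, so the per-config loop can check for existing executions without further round trips. A hypothetical sketch:

```typescript
// Hypothetical: index existing executions by configId so the per-config
// loop checks duplicates in memory. Names are illustrative.
interface ExistingJob {
  id: string;
  configId: string;
}

function indexByConfig(jobs: ExistingJob[]): Map<string, ExistingJob[]> {
  const byConfig = new Map<string, ExistingJob[]>();
  for (const job of jobs) {
    const list = byConfig.get(job.configId) ?? [];
    list.push(job);
    byConfig.set(job.configId, list);
  }
  return byConfig;
}
```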

Step 3 - Per-Config Processing Loop:

FOR EACH config IN configs:
  // Skip inactive configs
  IF config.status == INACTIVE: CONTINUE

  // Check trace existence with filter
  IF cachedTrace AND filter is in-memory evaluable:
    traceExists = IN_MEMORY_FILTER(cachedTrace, config.filter)
  ELSE:
    traceExists = DATABASE_LOOKUP(traceId, filter)

  // Resolve dataset item if applicable
  IF config.target == "dataset":
    datasetItem = RESOLVE_DATASET_ITEM(event, config.filter)

  // Skip observation-level dataset evals from trace-upsert source
  IF source == "trace-upsert" AND datasetItem.observationId EXISTS:
    CONTINUE

  // Check observation existence if referenced
  IF event.observationId:
    IF NOT observationExists(observationId):
      THROW ObservationNotFoundError (triggers retry)

  // Deduplication check
  existingJob = FIND_MATCHING_JOB(config.id, datasetItemId, observationId)

  IF traceExists AND (NOT datasetConfig OR datasetItem EXISTS):
    IF existingJob: CONTINUE  // Already evaluated

    // Sampling
    IF config.sampling != 1:
      IF random() > config.sampling: CONTINUE  // Sampled out

    // Create execution and enqueue
    CREATE jobExecution(status: PENDING)
    ENQUEUE to EvalExecutionQueue(delay: config.delay)
  ELSE:
    // Cancel stale execution if trace no longer matches
    IF existingJob AND existingJob.status != COMPLETED:
      UPDATE existingJob SET status = CANCELLED

Cancellation Semantics:

An important aspect of this pipeline is that trace updates can deselect a trace from an evaluation. If a trace previously matched an evaluation's filter but no longer matches after an update, any pending (non-completed) execution for that trace is cancelled. This ensures evaluations reflect the most current state of trace data.
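The cancellation rule can be sketched as a small pure function. Names and the status model below are assumptions for illustration; Langfuse's actual schema may differ:

```typescript
// Hypothetical reconciliation: a non-completed execution is cancelled
// when its trace no longer matches the config's filter after an update.
type ExecutionStatus = "PENDING" | "COMPLETED" | "CANCELLED";

interface JobExecution {
  status: ExecutionStatus;
}

function reconcileExecution(
  execution: JobExecution,
  traceStillMatches: boolean,
): JobExecution {
  if (!traceStillMatches && execution.status !== "COMPLETED") {
    return { ...execution, status: "CANCELLED" }; // deselect stale execution
  }
  return execution; // completed results are never retracted
}
```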

Event Loop Yielding:

The per-config processing loop yields to the Node.js event loop between iterations using setImmediate(). This prevents long-running evaluation job creation from blocking other tasks in the worker process.
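A minimal sketch of this yielding pattern, assuming a Node.js runtime; `processConfigs` and `handleConfig` are illustrative names:

```typescript
// Hypothetical sketch: yield to the event loop between iterations so a
// long batch of configs does not starve other tasks in the worker.
// setImmediate defers the next iteration until pending I/O callbacks run.
const yieldToEventLoop = (): Promise<void> =>
  new Promise<void>((resolve) => setImmediate(resolve));

async function processConfigs<T, R>(
  configs: T[],
  handleConfig: (config: T) => R,
): Promise<R[]> {
  const results: R[] = [];
  for (const config of configs) {
    results.push(handleConfig(config));
    await yieldToEventLoop(); // let other worker tasks interleave
  }
  return results;
}
```

Using `setImmediate` rather than `await Promise.resolve()` matters here: a resolved promise only yields to the microtask queue, while `setImmediate` lets pending I/O callbacks run between iterations.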

Related Pages

Implemented By
