Principle:Langfuse Langfuse Evaluation Job Triggering

Knowledge Sources	Langfuse
Domains	Event-Driven Architecture, Job Queue Management
Last Updated	2026-02-14 00:00 GMT

Overview

Evaluation Job Triggering is the principle of using a dedicated message queue as the entry point that decouples upstream event sources (trace ingestion, dataset mutations, UI batch actions) from the downstream evaluation job creation logic.

Description

In a production LLM evaluation pipeline, evaluations must be triggered reliably across multiple event sources without tightly coupling the producers of events to the consumers that create evaluation jobs. Evaluation Job Triggering solves this by introducing a BullMQ-based queue that acts as a reliable intermediary between event producers and the job creation logic.

Three distinct event sources feed into the evaluation triggering layer:

Trace Upsert Events -- When a new trace is ingested or an existing trace is updated via the Langfuse SDK, a message is published to the TraceUpsertQueue. The evaluation triggering layer listens for these events and forwards them for eval job creation with an enforced time scope of "NEW", ensuring that only evaluators configured to run on new data are triggered.

Dataset Run Item Upsert Events -- When a dataset run item is created or updated (e.g., during experiment execution), a message is published to the DatasetRunItemUpsertQueue. This enables evaluation of traces linked through dataset items, which is critical for experiment-based evaluation workflows.

UI-Triggered Batch Events -- When a user creates a new evaluator with a time scope that includes "EXISTING", or manually triggers a batch evaluation from the UI, jobs are enqueued to the CreateEvalQueue. These events carry a configId to target a specific job configuration and do not enforce a time scope restriction.

The queue provides durability, retry semantics, and backpressure that would be difficult to achieve with synchronous function calls.
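
The three event sources above can be sketched as a small normalization step that turns each event into arguments for job creation. This is a minimal illustration, not the Langfuse implementation: the `TriggerEvent` and `CreateEvalJobsArgs` types, their field names, and `toCreateEvalJobsArgs` are hypothetical. Only the "NEW" time scope on the trace-upsert path and the `configId` on the UI path come from the description above.

```typescript
// Hypothetical types for illustration; not the actual Langfuse definitions.
type TimeScope = "NEW" | "EXISTING";

type TriggerEvent =
  | { source: "trace-upsert"; projectId: string; traceId: string }
  | { source: "dataset-run-item-upsert"; projectId: string; traceId: string; datasetItemId: string }
  | { source: "ui-batch"; projectId: string; traceId: string; configId: string };

interface CreateEvalJobsArgs {
  projectId: string;
  traceId: string;
  datasetItemId?: string;
  configId?: string; // set only for UI-triggered events
  enforcedTimeScope?: TimeScope; // set only where the source restricts it
}

// Map each event source onto the arguments passed to job creation.
function toCreateEvalJobsArgs(event: TriggerEvent): CreateEvalJobsArgs {
  switch (event.source) {
    case "trace-upsert":
      // Trace upserts only trigger evaluators configured for new data.
      return { projectId: event.projectId, traceId: event.traceId, enforcedTimeScope: "NEW" };
    case "dataset-run-item-upsert":
      // The description states no time scope for this path, so none is set (assumption).
      return { projectId: event.projectId, traceId: event.traceId, datasetItemId: event.datasetItemId };
    case "ui-batch":
      // UI events target one job configuration and carry no time scope restriction.
      return { projectId: event.projectId, traceId: event.traceId, configId: event.configId };
  }
}
```

Modelling the sources as a discriminated union lets the compiler check that every source is handled when a new one is added.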

Usage

Use Evaluation Job Triggering when:

You need to ensure that every trace upsert is considered for evaluation, even under high ingestion load
You want to decouple the trace ingestion pipeline from the evaluation pipeline so failures in evaluation do not block ingestion
You need automatic retry with exponential backoff for transient failures in evaluation job creation
You need to support batch evaluation of historical traces without blocking the web UI

Theoretical Basis

The Evaluation Job Triggering principle is built on the competing consumers and reliable delivery patterns from message-oriented architectures:

Queue Topology:

Event Sources                  Queue Layer                    Consumer
--------------                 -----------                    --------
TraceUpsertQueue          -->  evalJobTraceCreatorProcessor   -->  createEvalJobs()
DatasetRunItemUpsertQueue -->  evalJobDatasetCreatorProcessor -->  createEvalJobs()
CreateEvalQueue (UI)      -->  evalJobCreatorProcessor        -->  createEvalJobs()

Retry Configuration:

defaultJobOptions = {
  removeOnComplete: 100,    // Keep last 100 completed jobs for debugging
  removeOnFail: 100000,     // Keep last 100k failed jobs for investigation
  attempts: 5,              // Maximum retry attempts
  backoff: {
    type: "exponential",    // Exponential backoff strategy
    delay: 5000             // Base delay of 5 seconds
  }
}

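Retry schedule (each delay is counted from the failure of the previous attempt):
  Attempt 1: immediate
  Attempt 2: 5 seconds after attempt 1 fails
  Attempt 3: 10 seconds after attempt 2 fails
  Attempt 4: 20 seconds after attempt 3 fails
  Attempt 5: 40 seconds after attempt 4 fails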
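
The schedule above follows from doubling the base delay on each retry. The sketch below reproduces that arithmetic for the documented options; `RetryOptions`, `exponentialDelayMs`, and `retrySchedule` are hypothetical helper names, and the formula assumes plain exponential backoff with no jitter.

```typescript
// Hypothetical option shapes mirroring the attempts/backoff fields shown above.
interface BackoffOptions {
  type: "exponential";
  delay: number; // base delay in milliseconds
}

interface RetryOptions {
  attempts: number; // total attempts, including the first one
  backoff: BackoffOptions;
}

// Delay before retry number `retry` (1-based: retry 1 precedes attempt 2).
function exponentialDelayMs(baseDelayMs: number, retry: number): number {
  return Math.round(baseDelayMs * Math.pow(2, retry - 1));
}

// All delays between consecutive attempts, in order.
function retrySchedule(options: RetryOptions): number[] {
  const delays: number[] = [];
  for (let retry = 1; retry < options.attempts; retry++) {
    delays.push(exponentialDelayMs(options.backoff.delay, retry));
  }
  return delays;
}

const schedule = retrySchedule({ attempts: 5, backoff: { type: "exponential", delay: 5000 } });
// schedule is [5000, 10000, 20000, 40000]
const totalBackoffMs = schedule.reduce((sum, delayMs) => sum + delayMs, 0);
// totalBackoffMs is 75000
```

With these settings a job that keeps failing spends 75 seconds in backoff, plus processing time, before it is marked as failed.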

Singleton Queue Pattern:

The queue uses a singleton pattern so that each process holds at most one queue instance. This prevents resource leaks from multiple Redis connections and keeps queue configuration consistent:

getInstance():
  IF instance exists:
    RETURN existing instance
  ELSE:
    CREATE Redis connection with offline queue disabled
    IF Redis connection is unavailable:
      RETURN null
    CREATE BullMQ Queue with retry options
    ATTACH error handler for logging
    STORE as singleton instance
    RETURN instance
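
The pseudocode above can be sketched in TypeScript as follows. To stay self-contained, the example swaps the Redis connection and BullMQ queue for a hypothetical `QueueLike` stand-in and takes the connection step as an injected `connect` factory; the class name `EvalTriggerQueue` is also an assumption.

```typescript
// Hypothetical stand-in for a queue client; a real setup would wrap a BullMQ Queue.
interface QueueLike {
  name: string;
}

class EvalTriggerQueue {
  private static instance: QueueLike | null = null;

  // `connect` is a hypothetical factory that returns null when Redis is unavailable.
  static getInstance(connect: () => QueueLike | null): QueueLike | null {
    if (EvalTriggerQueue.instance !== null) {
      return EvalTriggerQueue.instance; // reuse the existing instance
    }
    // Either stores the new queue, or stays null so a later call can try again.
    EvalTriggerQueue.instance = connect();
    return EvalTriggerQueue.instance;
  }
}
```

Because a failed connection leaves the stored instance as null, the next call attempts to connect again instead of caching the failure.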

Graceful Degradation:

If Redis is unavailable, the singleton returns null instead of throwing an error. Callers must handle the null case, which means evaluations are silently skipped when Redis is down rather than crashing the application. This design prioritizes system availability over evaluation completeness.
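
A caller-side sketch of this null handling is shown below. The `JobQueue` interface, `enqueueIfAvailable`, and the `onSkip` callback are hypothetical names, and `add` is synchronous here for brevity, whereas a real queue client would typically return a promise.

```typescript
// Hypothetical minimal queue interface; synchronous for brevity.
interface JobQueue<T> {
  add(name: string, payload: T): void;
}

type EnqueueResult = "enqueued" | "skipped";

// Enqueue when a queue exists; otherwise skip without throwing.
function enqueueIfAvailable<T>(
  queue: JobQueue<T> | null,
  name: string,
  payload: T,
  onSkip: (reason: string) => void = () => {},
): EnqueueResult {
  if (queue === null) {
    // The caller keeps running; the evaluation for this event is not created.
    onSkip("queue unavailable, skipping " + name);
    return "skipped";
  }
  queue.add(name, payload);
  return "enqueued";
}
```

Passing a logger or metrics counter as `onSkip` is one way to make skipped evaluations visible, since they would otherwise leave no trace.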

Infinite Loop Prevention:

The triggering layer includes a critical safeguard: traces with environments starting with "langfuse-" (internal evaluation traces) are excluded from evaluation triggering when they arrive via the trace-upsert path. This prevents an infinite cycle where: user trace triggers eval, eval creates its own trace, that trace triggers another eval, and so on.
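
The safeguard amounts to a prefix check applied only on the trace-upsert path. A minimal sketch, assuming hypothetical names `TriggerSource` and `shouldTriggerEvaluation` and treating a missing environment as a regular user trace:

```typescript
// Hypothetical source labels for the three triggering paths.
type TriggerSource = "trace-upsert" | "dataset-run-item-upsert" | "ui-batch";

// Prefix that marks internal evaluation traces, as described above.
const INTERNAL_ENVIRONMENT_PREFIX = "langfuse-";

function shouldTriggerEvaluation(source: TriggerSource, environment: string | undefined): boolean {
  // The exclusion applies only to events arriving via the trace-upsert path.
  if (source !== "trace-upsert") {
    return true;
  }
  // Assumption: a missing environment is treated as a regular user trace.
  return !(environment ?? "").startsWith(INTERNAL_ENVIRONMENT_PREFIX);
}
```

For example, a trace-upsert event with environment "production" passes the check, while one whose environment begins with "langfuse-" is dropped before any evaluation job is created.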
