
Principle:Langfuse Langfuse Trace and Observation Event Generation

From Leeroopedia
Knowledge Sources
Domains Observability, Ingestion, OpenTelemetry, Deduplication
Last Updated 2026-02-14 00:00 GMT

Overview

Trace and Observation Event Generation is the principle of converting OpenTelemetry ResourceSpans into Langfuse IngestionEvent records by splitting each span into a trace-create event and an observation event, while deduplicating redundant shallow traces across batches.

Description

In the Langfuse data model, each OTel span produces up to two events:

  1. A trace-create event that ensures a parent trace exists for the span.
  2. An observation event (generation-create, span-create, tool-create, etc.) that captures the actual span data.

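However, not every span generates a full trace-create event, and a span whose trace ID has already been seen and that carries no trace-level updates generates no trace event at all. The system distinguishes three categories of trace events: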

  • Root span trace: When a span has no parent (or is explicitly marked as root via langfuse.internal.as_root), a full trace event is generated containing name, metadata, userId, sessionId, tags, input/output, and all other trace-level properties.
  • Trace-update span: When a non-root span carries trace-level attributes (e.g., langfuse.trace.name, user.id, session.id, langfuse.trace.tags, or any langfuse.trace.metadata.* key), a full trace event is generated to propagate these updates.
  • Shallow trace: When a non-root span without trace-level attributes references a trace ID that has not yet been seen, either earlier in the current batch or (according to Redis) in a recent batch, a minimal trace event is generated containing only { id, timestamp, environment }. This ensures the trace exists but avoids overwriting existing trace data.
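
The three categories, plus the case where no trace event is emitted, can be sketched as a small decision function. This is a minimal illustration, not the Langfuse implementation: the names TraceEventKind, SpanFacts, and decideTraceEvent are hypothetical, and the span is assumed to have been reduced to a few pre-computed facts.

```typescript
// Hypothetical types for illustration; not part of the Langfuse codebase.
type TraceEventKind = "full-root" | "full-update" | "shallow" | "none";

interface SpanFacts {
  traceId: string;
  parentSpanId?: string;    // absent or empty for a root span
  markedAsRoot: boolean;    // langfuse.internal.as_root == "true"
  hasTraceUpdates: boolean; // any trace-level attribute present on the span
}

// Decide which kind of trace-create event (if any) a span should emit.
// seenTraces is mutated: a trace ID is recorded whenever an event is emitted.
function decideTraceEvent(span: SpanFacts, seenTraces: Set<string>): TraceEventKind {
  const isRootSpan = !span.parentSpanId || span.markedAsRoot;

  let kind: TraceEventKind;
  if (isRootSpan) {
    kind = "full-root"; // full trace event with all trace-level properties
  } else if (span.hasTraceUpdates) {
    kind = "full-update"; // full trace event propagating the span's updates
  } else if (!seenTraces.has(span.traceId)) {
    kind = "shallow"; // only { id, timestamp, environment }
  } else {
    kind = "none"; // trace already seen and nothing to update
  }

  if (kind !== "none") {
    seenTraces.add(span.traceId);
  }
  return kind;
}
```

Because the seen set is updated after every emitted event, a second plain child span of the same trace in the same batch yields "none" rather than a second shallow event.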

Deduplication is critical because multiple spans within the same trace produce trace-create events, and in high-throughput scenarios the same trace ID may appear across multiple ingestion batches within a short time window. The system uses two deduplication mechanisms:

  1. Redis-based cross-batch deduplication: On initialization (performed lazily on the first processing call), the processor issues a Redis SET NX with a 600-second TTL for each unique trace ID, using the key langfuse:project:{projectId}:trace:{traceId}:seen. If the key already existed (the trace was seen in a recent batch), the trace ID is added to the "seen" set, which suppresses shallow trace creation for that ID.
  2. Intra-batch shallow trace filtering: After all events are generated, the filterRedundantShallowTraces() method performs an O(n) pass to identify trace IDs for which both shallow and full trace events exist. Shallow events for those trace IDs are removed, keeping only the full trace events.
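
The cross-batch step can be sketched as follows. To stay self-contained, the example replaces Redis with a hypothetical in-memory SeenStore whose setIfAbsent method imitates SET NX with an expiry; the interface, the class, and the loadSeenTraces function are assumptions made for illustration, and the calls are synchronous here although a real Redis client would be asynchronous.

```typescript
// Hypothetical stand-in for a Redis client; not a real library API.
interface SeenStore {
  // Imitates SET key "1" NX EX ttl: true if the key was newly set,
  // false if an unexpired key already existed.
  setIfAbsent(key: string, ttlSeconds: number): boolean;
}

class InMemorySeenStore implements SeenStore {
  private expiries = new Map<string, number>();
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now; // injectable clock so expiry can be exercised in tests
  }

  setIfAbsent(key: string, ttlSeconds: number): boolean {
    const expiry = this.expiries.get(key);
    if (expiry !== undefined && expiry > this.now()) {
      return false; // key still alive: the trace was seen recently
    }
    this.expiries.set(key, this.now() + ttlSeconds * 1000);
    return true;
  }
}

const SEEN_TTL_SECONDS = 600; // TTL stated in the description above

// Returns the trace IDs that some earlier batch already marked as seen.
function loadSeenTraces(store: SeenStore, projectId: string, traceIds: string[]): Set<string> {
  const seenTraces = new Set<string>();
  for (const traceId of Array.from(new Set(traceIds))) {
    const key = `langfuse:project:${projectId}:trace:${traceId}:seen`;
    const newlySet = store.setIfAbsent(key, SEEN_TTL_SECONDS);
    if (!newlySet) {
      seenTraces.add(traceId);
    }
  }
  return seenTraces;
}
```

Using a single set-if-absent operation both checks and marks the key, so two batches racing on the same trace ID cannot both conclude that the trace is new.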

Usage

Apply this principle when:

  • You are converting tree-structured OTel trace data into a flat event stream for a trace store.
  • Each span in the tree must ensure its parent trace exists without overwriting richer trace data from other spans.
  • Ingestion is batched, so the same trace ID can appear in multiple processing runs.
  • You want to minimize unnecessary database writes while keeping traces consistent.

Theoretical Basis

PROCESS RESOURCE SPANS
    |
    v
INITIALIZE (lazy, on first call):
    For each unique traceId in all resourceSpans:
        Redis SET NX "langfuse:project:{projectId}:trace:{traceId}:seen" "1" EX 600
        If key already existed -> add traceId to seenTraces set
    |
    v
FOR EACH ResourceSpan:
  FOR EACH ScopeSpan:
    Detect isLangfuseSDKSpans (scope.name starts with "langfuse-sdk")
    FOR EACH Span:
        |
        v
      EXTRACT: traceId, parentSpanId, attributes, startTime, endTime
        |
        v
      DETERMINE: isRootSpan = (no parentSpanId OR langfuse.internal.as_root == "true")
      DETERMINE: hasTraceUpdates = (any trace-level attributes present)
        |
        v
      TRACE EVENT DECISION:
        IF isRootSpan OR hasTraceUpdates OR traceId NOT in seenTraces:
            |
            |--- isRootSpan -> FULL TRACE EVENT
            |    (name, metadata, userId, sessionId, tags, input, output,
            |     version, release, public, environment)
            |    Count as "rootSpanClosed"
            |
            |--- hasTraceUpdates AND NOT isRootSpan -> FULL TRACE EVENT
            |    (with trace-level attributes from span)
            |    Count as "traceUpdated"
            |
            |--- ELSE (not seen before) -> SHALLOW TRACE EVENT
            |    ({ id, timestamp, environment } only)
            |    Count as "shallow"
            |
            Add traceId to seenTraces
        ELSE:
            No trace event (already seen, no updates)
        |
        v
      OBSERVATION EVENT:
        Always generated for every span.
        Type determined by ObservationTypeMapperRegistry.
        Contains: id (spanId), traceId, parentObservationId, name,
                  startTime, endTime, input, output, model, modelParameters,
                  usageDetails, costDetails, metadata, level, statusMessage,
                  version, promptName, promptVersion, environment
        |
        v
      Add trace event (if any) + observation event to result list
    |
    v
POST-FILTER: filterRedundantShallowTraces()
    Single O(n) pass:
    1. Categorize all trace-create events by traceId as shallow or full
    2. For traceIds with BOTH shallow and full events, exclude shallow events
    3. Return filtered event list
    |
    v
RETURN: IngestionEventType[]

Trace-level attribute detection checks for these attributes (any present triggers a "trace update"):

  • langfuse.trace.name, langfuse.trace.input, langfuse.trace.output, langfuse.trace.metadata
  • user.id, session.id, langfuse.trace.public, langfuse.trace.tags
  • langfuse.user.id, langfuse.session.id (compat keys)
  • langfuse.observation.metadata.langfuse_user_id (OpenAI/Langchain integration pattern)
  • ai.telemetry.metadata.sessionId, ai.telemetry.metadata.userId, ai.telemetry.metadata.tags (Vercel AI SDK)
  • tag.tags (LlamaIndex)
  • Any key starting with langfuse.trace.metadata.
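
The attribute check above can be sketched as a set lookup plus a prefix test. The constant and function names are hypothetical, attributes are assumed to arrive as a flat key-value record, and the sketch assumes that the mere presence of a key counts as a trace update regardless of its value.

```typescript
// Exact attribute keys listed above that mark a span as carrying trace updates.
const TRACE_LEVEL_KEYS: ReadonlySet<string> = new Set([
  "langfuse.trace.name",
  "langfuse.trace.input",
  "langfuse.trace.output",
  "langfuse.trace.metadata",
  "user.id",
  "session.id",
  "langfuse.trace.public",
  "langfuse.trace.tags",
  "langfuse.user.id", // compat key
  "langfuse.session.id", // compat key
  "langfuse.observation.metadata.langfuse_user_id", // OpenAI/Langchain pattern
  "ai.telemetry.metadata.sessionId", // Vercel AI SDK
  "ai.telemetry.metadata.userId", // Vercel AI SDK
  "ai.telemetry.metadata.tags", // Vercel AI SDK
  "tag.tags", // LlamaIndex
]);

const TRACE_METADATA_PREFIX = "langfuse.trace.metadata.";

// Assumption: key presence alone triggers a trace update; values are not inspected.
function hasTraceUpdates(attributes: Record<string, unknown>): boolean {
  return Object.keys(attributes).some(
    (key) => TRACE_LEVEL_KEYS.has(key) || key.startsWith(TRACE_METADATA_PREFIX),
  );
}
```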

Shallow trace detection: A trace event body is considered "shallow" if it contains no meaningful value for any of: name, metadata, userId, sessionId, public, tags, version, release, input, output. A value is meaningful if it is not null, not an empty string, not an empty array, and not an empty object.
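
A minimal sketch of shallow detection and the post-filter follows. The IngestionEvent shape is a simplified assumption, isMeaningful and isShallowTrace are hypothetical helper names, and this version of filterRedundantShallowTraces uses two linear scans (still O(n) overall) rather than reproducing the actual method. Following the definition literally, a boolean such as public: false is treated as meaningful.

```typescript
// Simplified, assumed event shape; the real IngestionEventType is richer.
interface IngestionEvent {
  type: string;
  body: { id: string; [field: string]: unknown };
}

const TRACE_CONTENT_FIELDS = [
  "name", "metadata", "userId", "sessionId", "public",
  "tags", "version", "release", "input", "output",
];

// Non-null, non-empty-string, non-empty-array, non-empty-object.
function isMeaningful(value: unknown): boolean {
  if (value === null || value === undefined) return false;
  if (typeof value === "string") return value.length > 0;
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === "object") return Object.keys(value as object).length > 0;
  return true; // numbers and booleans, including false
}

function isShallowTrace(body: Record<string, unknown>): boolean {
  return TRACE_CONTENT_FIELDS.every((field) => !isMeaningful(body[field]));
}

// Drop shallow trace-create events for trace IDs that also have a full one.
function filterRedundantShallowTraces(events: IngestionEvent[]): IngestionEvent[] {
  const fullTraceIds = new Set<string>();
  for (const event of events) {
    if (event.type === "trace-create" && !isShallowTrace(event.body)) {
      fullTraceIds.add(event.body.id);
    }
  }
  return events.filter(
    (event) =>
      !(
        event.type === "trace-create" &&
        isShallowTrace(event.body) &&
        fullTraceIds.has(event.body.id)
      ),
  );
}
```

A shallow event whose trace ID has no full counterpart in the batch is kept, since it is the only event guaranteeing that the trace exists.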
