Principle:Langfuse Langfuse Trace and Observation Event Generation
| Knowledge Sources | |
|---|---|
| Domains | Observability, Ingestion, OpenTelemetry, Deduplication |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Trace and Observation Event Generation is the principle of converting OpenTelemetry ResourceSpans into Langfuse IngestionEvent records by splitting each span into a trace-create event and an observation event, while deduplicating redundant shallow traces across batches.
Description
In the Langfuse data model, every OTel span can produce up to two events:
- A trace-create event that ensures a parent trace exists for the span.
- An observation event (generation-create, span-create, tool-create, etc.) that captures the actual span data.
However, not every span should generate a full trace-create event. The system distinguishes between three categories of trace events:
- Root span trace: When a span has no parent (or is explicitly marked as root via langfuse.internal.as_root), a full trace event is generated containing name, metadata, userId, sessionId, tags, input/output, and all other trace-level properties.
- Trace-update span: When a non-root span carries trace-level attributes (e.g., langfuse.trace.name, user.id, session.id, langfuse.trace.tags, or any langfuse.trace.metadata.* key), a full trace event is generated to propagate these updates.
- Shallow trace: When a non-root span without trace-level attributes references a trace ID that has not been seen in the current batch, a minimal trace event is generated containing only { id, timestamp, environment }. This ensures the trace exists but avoids overwriting existing trace data.
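The three categories above, plus the case where no trace event is emitted at all, can be sketched as a small decision function. This is a minimal illustration rather than the Langfuse source: the names TraceEventKind, SpanFlags, and decideTraceEvent are hypothetical, and the root and trace-update flags are assumed to be computed beforehand.

```typescript
// Hypothetical names; an illustrative sketch of the trace event decision.
type TraceEventKind = "full-root" | "full-update" | "shallow" | "none";

interface SpanFlags {
  traceId: string;
  parentSpanId?: string;
  markedAsRoot: boolean; // assumed: langfuse.internal.as_root was "true"
  hasTraceUpdates: boolean; // assumed: some trace-level attribute is present
}

function decideTraceEvent(span: SpanFlags, seenTraces: Set<string>): TraceEventKind {
  const isRootSpan = !span.parentSpanId || span.markedAsRoot;

  // Already-seen trace, non-root span, nothing to propagate: emit no trace event.
  if (!isRootSpan && !span.hasTraceUpdates && seenTraces.has(span.traceId)) {
    return "none";
  }

  // Every emitted trace event marks the trace ID as seen for later spans.
  seenTraces.add(span.traceId);

  if (isRootSpan) return "full-root";
  if (span.hasTraceUpdates) return "full-update";
  return "shallow";
}
```

Because the seen set is updated on every emitted event, a second plain child span of the same trace yields "none" instead of another shallow event.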
Deduplication is critical because multiple spans within the same trace produce trace-create events, and in high-throughput scenarios the same trace ID may appear across multiple ingestion batches within a short time window. The system uses two deduplication mechanisms:
- Redis-based cross-batch deduplication: On initialization, the processor queries Redis for each unique trace ID using SET NX with a 600-second TTL (langfuse:project:{projectId}:trace:{traceId}:seen). If the key already existed (the trace was seen in a recent batch), the trace ID is added to the "seen" set, suppressing shallow trace creation for that ID.
- Intra-batch shallow trace filtering: After all events are generated, the filterRedundantShallowTraces() method performs an O(n) pass to identify trace IDs where both shallow and full trace events exist. Shallow events for such trace IDs are removed, keeping only the full trace.
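The cross-batch mechanism can be sketched as follows. To keep the example runnable without a Redis server, it assumes a hypothetical SeenStore interface with an in-memory stand-in; against a real server, setIfAbsent would correspond to the command SET key "1" NX EX 600. The names SeenStore, InMemorySeenStore, seenKey, and loadSeenTraces are illustrative and not taken from Langfuse.

```typescript
// Hypothetical store interface; a real client would issue: SET key "1" NX EX 600
interface SeenStore {
  // Resolves to true when the key was newly created, false when it already existed.
  setIfAbsent(key: string, value: string, ttlSeconds: number): Promise<boolean>;
}

// In-memory stand-in for Redis so the sketch runs without a server.
class InMemorySeenStore implements SeenStore {
  private expiries = new Map<string, number>();

  async setIfAbsent(key: string, _value: string, ttlSeconds: number): Promise<boolean> {
    const now = Date.now();
    const expiresAt = this.expiries.get(key);
    if (expiresAt !== undefined && expiresAt > now) return false; // still live
    this.expiries.set(key, now + ttlSeconds * 1000);
    return true;
  }
}

const SEEN_TTL_SECONDS = 600;

function seenKey(projectId: string, traceId: string): string {
  return `langfuse:project:${projectId}:trace:${traceId}:seen`;
}

// Returns the trace IDs that an earlier batch had already marked as seen.
async function loadSeenTraces(
  store: SeenStore,
  projectId: string,
  traceIds: string[],
): Promise<Set<string>> {
  const unique = Array.from(new Set(traceIds));
  const created = await Promise.all(
    unique.map((id) => store.setIfAbsent(seenKey(projectId, id), "1", SEEN_TTL_SECONDS)),
  );
  const seen = new Set<string>();
  unique.forEach((id, i) => {
    if (!created[i]) seen.add(id); // key already existed: seen in a recent batch
  });
  return seen;
}
```

A single SET NX both tests and marks the key, so two batches racing on the same trace ID cannot both observe it as unseen.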
Usage
Apply this principle when:
- You convert tree-structured OTel trace data into flat event streams for a trace store.
- Each span in the tree must ensure its parent trace exists without overwriting richer trace data from other spans.
- Batched ingestion means the same trace ID can appear in multiple processing runs.
- You need to minimize unnecessary database writes while keeping traces consistent.
Theoretical Basis
    PROCESS RESOURCE SPANS
        |
        v
    INITIALIZE (lazy, on first call):
        For each unique traceId in all resourceSpans:
            Redis SET NX "langfuse:project:{projectId}:trace:{traceId}:seen" "1" EX 600
            If key already existed -> add traceId to seenTraces set
        |
        v
    FOR EACH ResourceSpan:
        FOR EACH ScopeSpan:
            Detect isLangfuseSDKSpans (scope.name starts with "langfuse-sdk")
            FOR EACH Span:
                |
                v
                EXTRACT: traceId, parentSpanId, attributes, startTime, endTime
                |
                v
                DETERMINE: isRootSpan = (no parentSpanId OR langfuse.internal.as_root == "true")
                DETERMINE: hasTraceUpdates = (any trace-level attributes present)
                |
                v
                TRACE EVENT DECISION:
                    IF isRootSpan OR hasTraceUpdates OR traceId NOT in seenTraces:
                        |
                        |--- isRootSpan -> FULL TRACE EVENT
                        |        (name, metadata, userId, sessionId, tags, input, output,
                        |         version, release, public, environment)
                        |        Count as "rootSpanClosed"
                        |
                        |--- hasTraceUpdates AND NOT isRootSpan -> FULL TRACE EVENT
                        |        (with trace-level attributes from span)
                        |        Count as "traceUpdated"
                        |
                        |--- ELSE (not seen before) -> SHALLOW TRACE EVENT
                        |        ({ id, timestamp, environment } only)
                        |        Count as "shallow"
                        |
                        Add traceId to seenTraces
                    ELSE:
                        No trace event (already seen, no updates)
                |
                v
                OBSERVATION EVENT:
                    Always generated for every span.
                    Type determined by ObservationTypeMapperRegistry.
                    Contains: id (spanId), traceId, parentObservationId, name,
                        startTime, endTime, input, output, model, modelParameters,
                        usageDetails, costDetails, metadata, level, statusMessage,
                        version, promptName, promptVersion, environment
                |
                v
                Add trace event (if any) + observation event to result list
        |
        v
    POST-FILTER: filterRedundantShallowTraces()
        Single O(n) pass:
            1. Categorize all trace-create events by traceId as shallow or full
            2. For traceIds with BOTH shallow and full events, exclude shallow events
            3. Return filtered event list
        |
        v
    RETURN: IngestionEventType[]
Trace-level attribute detection checks for these attributes (any present triggers a "trace update"):
- langfuse.trace.name, langfuse.trace.input, langfuse.trace.output, langfuse.trace.metadata
- user.id, session.id, langfuse.trace.public, langfuse.trace.tags
- langfuse.user.id, langfuse.session.id (compat keys)
- langfuse.observation.metadata.langfuse_user_id (OpenAI/Langchain integration pattern)
- ai.telemetry.metadata.sessionId, ai.telemetry.metadata.userId, ai.telemetry.metadata.tags (Vercel AI SDK)
- tag.tags (LlamaIndex)
- Any key starting with langfuse.trace.metadata.
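The attribute check listed above might look like the sketch below. The names TRACE_LEVEL_KEYS, TRACE_METADATA_PREFIX, and hasTraceUpdates are illustrative, and the sketch assumes that the mere presence of a key counts, regardless of its value.

```typescript
// Exact-match keys taken from the list above (hypothetical constant name).
const TRACE_LEVEL_KEYS = new Set<string>([
  "langfuse.trace.name",
  "langfuse.trace.input",
  "langfuse.trace.output",
  "langfuse.trace.metadata",
  "user.id",
  "session.id",
  "langfuse.trace.public",
  "langfuse.trace.tags",
  "langfuse.user.id",
  "langfuse.session.id",
  "langfuse.observation.metadata.langfuse_user_id",
  "ai.telemetry.metadata.sessionId",
  "ai.telemetry.metadata.userId",
  "ai.telemetry.metadata.tags",
  "tag.tags",
]);

// Any key under this prefix also counts as a trace-level attribute.
const TRACE_METADATA_PREFIX = "langfuse.trace.metadata.";

// Assumption: key presence alone triggers a trace update; values are not inspected.
function hasTraceUpdates(attributes: Record<string, unknown>): boolean {
  return Object.keys(attributes).some(
    (key) => TRACE_LEVEL_KEYS.has(key) || key.startsWith(TRACE_METADATA_PREFIX),
  );
}
```

Under this sketch, a span carrying only observation-level attributes (for example a model name) does not trigger a full trace event, while a single langfuse.trace.metadata.* key does.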
Shallow trace detection: A trace event body is considered "shallow" if it contains no meaningful values for any of: name, metadata, userId, sessionId, public, tags, version, release, input, output. Meaningful means non-null, non-empty-string, non-empty-array, non-empty-object.
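Shallow detection and the post-filter can be sketched together. The IngestionEvent shape below is a simplified assumption, not the real IngestionEventType, and this version makes two linear passes (one to classify, one to filter) rather than the single pass described earlier; the total cost is still O(n). Following the stated definition of "meaningful", a boolean such as public: false is treated as a value.

```typescript
// Simplified, hypothetical event shape for illustration only.
interface IngestionEvent {
  type: string; // e.g. "trace-create", "span-create", "generation-create"
  body: Record<string, unknown>;
}

const MEANINGFUL_FIELDS = [
  "name", "metadata", "userId", "sessionId", "public",
  "tags", "version", "release", "input", "output",
];

// Non-null, non-empty-string, non-empty-array, non-empty-object.
function isMeaningful(value: unknown): boolean {
  if (value === null || value === undefined) return false;
  if (typeof value === "string") return value.length > 0;
  if (Array.isArray(value)) return value.length > 0;
  if (typeof value === "object") return Object.keys(value as object).length > 0;
  return true; // numbers and booleans (including false) count as values
}

function isShallowTraceBody(body: Record<string, unknown>): boolean {
  return !MEANINGFUL_FIELDS.some((field) => isMeaningful(body[field]));
}

function filterRedundantShallowTraces(events: IngestionEvent[]): IngestionEvent[] {
  const traceIdsWithFull = new Set<string>();
  const shallowEvents = new Set<IngestionEvent>();

  // Pass 1: classify every trace-create event as shallow or full.
  for (const event of events) {
    if (event.type !== "trace-create") continue;
    if (isShallowTraceBody(event.body)) shallowEvents.add(event);
    else traceIdsWithFull.add(String(event.body["id"]));
  }

  // Pass 2: drop shallow events whose trace ID also has a full event.
  return events.filter(
    (event) => !(shallowEvents.has(event) && traceIdsWithFull.has(String(event.body["id"]))),
  );
}
```

Shallow events for trace IDs without any full event are kept, so every trace referenced in the batch still gets created.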