Principle:Langfuse Langfuse OTel Input Output Extraction
| Knowledge Sources | |
|---|---|
| Domains | Observability, OpenTelemetry, AI Frameworks, Data Normalization |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
OTel Input Output Extraction is the principle of locating and normalizing LLM input/output data from OpenTelemetry span attributes and events across a diverse ecosystem of AI framework instrumentation conventions.
Description
The core challenge this principle addresses is that there is no single standard for how AI frameworks encode prompts (inputs) and completions (outputs) in OTel span data. Each framework uses different attribute keys, event names, and data structures:
- Langfuse SDK uses `langfuse.observation.input` and `langfuse.observation.output` (or trace-level equivalents).
- Vercel AI SDK uses `ai.prompt.messages` for input and `ai.response.text`, `ai.response.toolCalls`, or `ai.response.object` for output.
- OTel GenAI semantic conventions use span events named `gen_ai.system.message`, `gen_ai.user.message`, etc. for input and `gen_ai.choice` for output.
- Google Vertex ADK uses `gcp.vertex.agent.llm_request` / `gcp.vertex.agent.llm_response`.
- Logfire (Pydantic) uses `prompt` and `all_messages_events` or an `events` array.
- LiveKit uses `lk.input_text` and `lk.response.text`.
- MLFlow uses `mlflow.spanInputs` / `mlflow.spanOutputs`.
- TraceLoop uses `traceloop.entity.input` / `traceloop.entity.output` or `gen_ai.prompt.*` / `gen_ai.completion.*` path-based keys.
- SmolAgents uses `input.value` / `output.value`.
- Pydantic / Pipecat uses plain `input` / `output`.
- OpenInference uses `llm.input_messages.*` / `llm.output_messages.*` flattened key paths.
- OpenTelemetry GenAI messages uses `gen_ai.input.messages` / `gen_ai.output.messages`.
- Legacy Semantic Kernel uses events named `gen_ai.content.prompt` / `gen_ai.content.completion`.
The extraction follows a waterfall / chain-of-responsibility pattern: it tries each framework's convention in a specific priority order, returning the first successful match.
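The waterfall can be pictured as an ordered list of extractors that is walked until one matches. The following is a minimal sketch, not the actual implementation: the names `Extractor`, `byKeys`, and `extractFirstMatch` are hypothetical, and only three of the conventions listed above are included.

```typescript
// Hypothetical types for illustration; the real function's signature may differ.
type Attributes = Record<string, unknown>;
type Extracted = { input: unknown; output: unknown };
type Extractor = (attributes: Attributes) => Extracted | null;

// Build an extractor that reads one input key and one output key.
const byKeys = (inputKey: string, outputKey: string): Extractor =>
  (attributes) => {
    const input = attributes[inputKey] ?? null;
    const output = attributes[outputKey] ?? null;
    // A convention "matches" when at least one side is present.
    return input !== null || output !== null ? { input, output } : null;
  };

// Priority order matters: earlier entries win.
const extractors: Extractor[] = [
  byKeys("langfuse.observation.input", "langfuse.observation.output"),
  byKeys("mlflow.spanInputs", "mlflow.spanOutputs"),
  byKeys("input.value", "output.value"),
];

function extractFirstMatch(attributes: Attributes): Extracted {
  for (const extractor of extractors) {
    const result = extractor(attributes);
    if (result !== null) return result; // early return on first match
  }
  return { input: null, output: null }; // default when nothing matches
}
```

In this sketch, a span carrying both `mlflow.spanInputs` and `input.value` yields the MLFlow value, because that extractor sits earlier in the list.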
Additionally, the method produces filteredAttributes: a copy of the original attributes with all potential input/output keys removed. This keeps the same data from appearing both in the extracted input/output fields and in the metadata.
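A minimal sketch of how such a filtered copy could be built, assuming a hypothetical helper `buildFilteredAttributes` and an illustrative subset of the keys named in this document (the real key list is longer):

```typescript
type Attributes = Record<string, unknown>;

// Illustrative subset of exact input/output keys; not the full list.
const IO_KEYS = new Set<string>([
  "langfuse.observation.input",
  "langfuse.observation.output",
  "input.value",
  "output.value",
]);

// Prefixes used by the path-based conventions (flattened key paths).
const IO_PREFIXES = [
  "gen_ai.prompt",
  "gen_ai.completion",
  "llm.input_messages",
  "llm.output_messages",
];

// Returns a copy of the attributes without any known input/output key.
// The original object is left untouched.
function buildFilteredAttributes(attributes: Attributes): Attributes {
  const filtered: Attributes = {};
  for (const [key, value] of Object.entries(attributes)) {
    const isIoKey =
      IO_KEYS.has(key) || IO_PREFIXES.some((prefix) => key.startsWith(prefix));
    if (!isIoKey) filtered[key] = value;
  }
  return filtered;
}
```

Because every known key is dropped regardless of which convention ends up matching, attributes left behind by a second instrumentation library do not leak into the metadata.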
Usage
Apply this principle when:
- Building an ingestion pipeline that must extract structured LLM prompt/completion data from OTel spans.
- Supporting a growing ecosystem of AI framework instrumentations, each with its own attribute conventions.
- Needing to distinguish between trace-level input/output and observation-level input/output (the `domain` parameter).
- Wanting to keep metadata clean by stripping extracted input/output keys from the attribute set.
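The effect of the `domain` parameter on the Langfuse SDK keys can be sketched as follows. The function name `extractLangfuseIo` is hypothetical, and defaulting to observation-level keys when `domain` is omitted is an assumption of this sketch.

```typescript
type Domain = "trace" | "observation";
type Attributes = Record<string, unknown>;

// Picks trace-level or observation-level Langfuse keys based on `domain`.
// Assumption: observation-level keys are used when `domain` is omitted.
function extractLangfuseIo(
  attributes: Attributes,
  domain: Domain = "observation",
): { input: unknown; output: unknown } | null {
  const prefix = domain === "trace" ? "langfuse.trace" : "langfuse.observation";
  const input = attributes[`${prefix}.input`] ?? null;
  const output = attributes[`${prefix}.output`] ?? null;
  // Return null when neither key is present so a caller can try the next convention.
  return input !== null || output !== null ? { input, output } : null;
}
```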
Theoretical Basis
The extraction algorithm follows this waterfall:
EXTRACT INPUT/OUTPUT from (events[], attributes, instrumentationScopeName, domain?)
|
v
STEP 0: Pre-filter all known input/output attribute keys from filteredAttributes
| (prevents duplication regardless of which path matches)
|
v
STEP 1: LANGFUSE SDK
Check langfuse.observation.input / langfuse.observation.output
(or langfuse.trace.input / langfuse.trace.output if domain == "trace")
-> If found, return immediately
|
v
STEP 2: VERCEL AI SDK (instrumentationScopeName == "ai")
Input: ai.prompt.messages > ai.prompt > ai.toolCall.args
Output: ai.response.text + ai.response.toolCalls (combined)
> ai.response.text > ai.result.text (legacy)
> ai.toolCall.result > ai.response.object
> ai.result.object (legacy) > ai.response.toolCalls
> ai.result.toolCalls (legacy)
-> If found, return immediately
|
v
STEP 3: OTEL GENAI EVENTS (span events array)
Input events: gen_ai.system.message, gen_ai.user.message,
gen_ai.assistant.message, gen_ai.tool.message
Output events: gen_ai.choice
-> Extract event attributes, build role-annotated message arrays
-> If found, return immediately
|
v
STEP 4: LEGACY SEMANTIC KERNEL EVENTS
Input event: gen_ai.content.prompt
Output event: gen_ai.content.completion
-> Recursively call extractInputAndOutput on event attributes
-> If found, return immediately
|
v
STEP 5: GOOGLE VERTEX ADK
gcp.vertex.agent.llm_request / gcp.vertex.agent.llm_response
Falls back to gcp.vertex.agent.tool_call_args / tool_response
when llm_request/response is "{}" (empty tool call case)
-> If found, return immediately
|
v
STEP 6: LOGFIRE (prompt / all_messages_events)
-> If found, return immediately
|
v
STEP 7: LIVEKIT (lk.input_text / lk.response.text / lk.function_tool.output)
-> If found, return immediately
|
v
STEP 8: LOGFIRE EVENTS ARRAY (events attribute with gen_ai.choice)
Parse the "events" attribute (JSON string or array)
Separate gen_ai.choice from other events
-> If found, return immediately
|
v
STEP 9: MLFLOW (mlflow.spanInputs / mlflow.spanOutputs)
-> If found, return immediately
|
v
STEP 10: TRACELOOP (traceloop.entity.input / traceloop.entity.output)
-> If found, return immediately
|
v
STEP 11: SMOLAGENTS (input.value / output.value)
-> If found, return immediately
|
v
STEP 12: PYDANTIC / PIPECAT (input / output)
-> If found, return immediately
|
v
STEP 13: PYDANTIC-AI TOOLS (tool_arguments / tool_response)
-> If found, return immediately
|
v
STEP 14: TRACELOOP PATH-BASED (gen_ai.prompt.* / gen_ai.completion.*)
Collect keys starting with gen_ai.prompt and gen_ai.completion
Convert flattened key paths to nested objects
-> If found, return immediately
|
v
STEP 15: OPENINFERENCE (llm.input_messages.* / llm.output_messages.*)
Collect flattened key paths, reconstruct nested message arrays
-> If found, return immediately
|
v
STEP 16: OTEL MESSAGES (gen_ai.input.messages / gen_ai.output.messages)
-> If found, return immediately
|
v
STEP 17: OTEL TOOLS (gen_ai.tool.call.arguments / gen_ai.tool.call.result)
-> If found, return immediately
|
v
DEFAULT: Return { input: null, output: null, filteredAttributes }
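Step 2's output priority can be sketched as a combined case followed by an ordered key lookup. This is an illustration only: `extractVercelOutput` is a hypothetical name, and the `content` and `tool_calls` field names of the combined object are assumptions (the document only states that the result carries role "assistant").

```typescript
type Attributes = Record<string, unknown>;

// Output keys in the fallback order listed for Step 2, after the combined case.
const VERCEL_OUTPUT_KEYS = [
  "ai.response.text",
  "ai.result.text", // legacy
  "ai.toolCall.result",
  "ai.response.object",
  "ai.result.object", // legacy
  "ai.response.toolCalls",
  "ai.result.toolCalls", // legacy
];

function extractVercelOutput(attributes: Attributes): unknown {
  const text = attributes["ai.response.text"];
  const toolCalls = attributes["ai.response.toolCalls"];
  // Combined case: both present, so emit one assistant-style message.
  // Field names `content` and `tool_calls` are assumptions of this sketch.
  if (text != null && toolCalls != null) {
    return { role: "assistant", content: text, tool_calls: toolCalls };
  }
  // Otherwise return the first key that is present, in priority order.
  for (const key of VERCEL_OUTPUT_KEYS) {
    if (attributes[key] != null) return attributes[key];
  }
  return null;
}
```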
Key design decisions:
- Early return on first match: Once a framework's input/output is found, no further frameworks are checked. This prevents conflicting extractions when spans carry attributes from multiple instrumentation libraries.
- Pre-deletion of all potential keys: All known input/output attribute keys are removed from filteredAttributes upfront (Step 0), regardless of which framework ultimately matches. This ensures clean metadata even when multiple frameworks' attributes coexist.
- Vercel AI SDK combined output: When both `ai.response.text` and `ai.response.toolCalls` are present, they are combined into a single JSON object with role "assistant", matching the ChatML assistant message format.
- Recursive extraction: Legacy Semantic Kernel events recursively call the extraction function on their event attributes, reusing the same multi-framework detection for the event content.
- Path-based key reconstruction: TraceLoop and OpenInference flatten nested objects into dot-separated key paths (e.g., `gen_ai.prompt.0.content`). The extraction reconstructs these into proper nested arrays/objects using `convertKeyPathToNestedObject()`.
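To make the path-based reconstruction concrete, here is a minimal sketch of turning flattened key paths back into nested data. `nestKeyPaths` is a hypothetical stand-in written for this example; the behavior and signature of the real `convertKeyPathToNestedObject()` may differ. The sketch assumes numeric path segments denote array indices.

```typescript
type Attributes = Record<string, unknown>;

// Hypothetical stand-in for convertKeyPathToNestedObject().
// Collects keys under `prefix` and rebuilds nested arrays/objects from them.
function nestKeyPaths(attributes: Attributes, prefix: string): unknown {
  const isIndex = (segment: string) => /^\d+$/.test(segment);
  let root: any = undefined;

  for (const [key, value] of Object.entries(attributes)) {
    if (!key.startsWith(prefix + ".")) continue;
    const segments = key.slice(prefix.length + 1).split(".");
    // Create the root lazily so it is an array when the first segment is an index.
    if (root === undefined) root = isIndex(segments[0]) ? [] : {};
    let node = root;
    segments.forEach((segment, i) => {
      if (i === segments.length - 1) {
        node[segment] = value; // leaf value
      } else {
        // The next container is an array if the following segment is numeric.
        node[segment] ??= isIndex(segments[i + 1]) ? [] : {};
        node = node[segment];
      }
    });
  }
  return root ?? null;
}

const messages = nestKeyPaths(
  {
    "gen_ai.prompt.0.role": "user",
    "gen_ai.prompt.0.content": "Hi",
    "gen_ai.prompt.1.role": "assistant",
    "service.name": "demo",
  },
  "gen_ai.prompt",
);
// messages is [{ role: "user", content: "Hi" }, { role: "assistant" }]
```

Keys outside the prefix (here `service.name`) are ignored, and the function returns `null` when no key matches, which would let a waterfall fall through to the next convention.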