

Principle:Confident ai Deepeval LLM Span Enrichment

From Leeroopedia

Overview

LLM Span Enrichment is the principle of enriching LLM-type spans with model-specific metadata, including token counts, cost information, and prompt details. This enables granular cost tracking and performance analysis on a per-LLM-call basis, providing the foundation for understanding the economic and operational characteristics of LLM-powered applications.

Core Concept

LLM calls are typically the most resource-intensive operations in an AI application. Each call consumes tokens (both input and output), incurs a monetary cost, and varies in latency and quality depending on the model used. Enriching LLM spans with this data is essential because:

  • Token usage monitoring -- Tracking input_token_count and output_token_count for each LLM call enables teams to understand token consumption patterns, identify unexpectedly verbose prompts, and optimize prompt engineering efforts.
  • Cost attribution -- By associating per-token costs (cost_per_input_token, cost_per_output_token) with each span, teams can compute the monetary cost of each LLM interaction (as precise as the configured prices), enabling budget tracking and cost optimization.
  • Model tracking -- Recording which model was used for each call allows comparison of quality, cost, and latency across different models and model versions.
  • Prompt management -- Capturing the prompt template or content used for each LLM call supports prompt versioning, A/B testing, and regression detection.
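The fields listed above can be pictured as a small record attached to each LLM span. The following is a minimal sketch, not DeepEval's data model: the class name LLMSpanMetadata, the prompt_version field, the USD unit, and the prices are hypothetical illustrations; only the four token and cost field names come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LLMSpanMetadata:
    """Hypothetical container for the metadata an enriched LLM span carries."""

    model: str
    input_token_count: int
    output_token_count: int
    cost_per_input_token: float   # price per input token (USD assumed)
    cost_per_output_token: float  # price per output token (USD assumed)
    prompt_version: Optional[str] = None  # hypothetical: identifies the prompt template

    @property
    def total_tokens(self) -> int:
        return self.input_token_count + self.output_token_count

    @property
    def cost(self) -> float:
        # Input and output tokens are usually priced differently,
        # so each count is multiplied by its own rate.
        return (
            self.input_token_count * self.cost_per_input_token
            + self.output_token_count * self.cost_per_output_token
        )


# Illustrative values only; the prices are not any provider's real rates.
span = LLMSpanMetadata(
    model="example-model",
    input_token_count=1200,
    output_token_count=300,
    cost_per_input_token=0.000002,
    cost_per_output_token=0.000008,
    prompt_version="summarize-v3",
)
print(span.total_tokens)    # 1500
print(round(span.cost, 6))  # 0.0048
```

Keeping the input and output rates separate matters because providers commonly charge more for output tokens, so a single blended rate would misattribute cost for output-heavy calls.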

Theoretical Basis

This principle draws from established practices in resource monitoring and cost management:

  • LLM cost attribution -- The practice of assigning precise monetary costs to individual LLM operations, analogous to cloud cost attribution in infrastructure monitoring.
  • Token usage monitoring -- Tracking token consumption as a first-class metric, similar to how traditional applications monitor CPU, memory, and network usage.
  • Prompt management -- The discipline of treating prompts as versioned artifacts whose performance and cost characteristics should be tracked over time.
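Under this cost-attribution view, the cost of a single LLM span is usually modeled as linear in its token counts. A minimal formulation, with symbols introduced here for illustration ($n_{\text{in}}$, $n_{\text{out}}$ for input_token_count and output_token_count; $c_{\text{in}}$, $c_{\text{out}}$ for cost_per_input_token and cost_per_output_token), and assuming a trace's model cost is the sum over its LLM spans:

```latex
C_{\text{span}} = n_{\text{in}}\, c_{\text{in}} + n_{\text{out}}\, c_{\text{out}},
\qquad
C_{\text{trace}} = \sum_{s \,\in\, \text{LLM spans of the trace}} C_{\text{span}}(s)
```

For example, 2,000 input tokens at $1\times10^{-6}$ per token plus 500 output tokens at $4\times10^{-6}$ per token give $0.002 + 0.002 = 0.004$ (illustrative prices only).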

Why It Matters

Without LLM span enrichment:

  • Cost overruns go undetected -- teams cannot identify which functions or features are consuming the most tokens and incurring the highest costs.
  • Model comparison is impossible -- no data exists to compare the cost-effectiveness of different models on the same task.
  • Token budgets cannot be enforced -- without per-call token tracking, only aggregate limits are possible, which gives coarse-grained control.
  • Prompt optimization lacks data -- without knowing how many tokens each prompt consumes, optimization efforts are guesswork.

LLM span enrichment turns each model call into a fully instrumented operation with complete economic and operational metadata.
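Once every LLM span carries this metadata, the questions above reduce to simple aggregations. The sketch below assumes spans have already been exported as flat records; the EnrichedSpan type, the cost_by and over_budget helpers, and the sample data are hypothetical illustrations rather than part of DeepEval.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EnrichedSpan:
    """Hypothetical flattened record of one enriched LLM span."""

    function: str  # name of the traced function that made the call
    model: str
    input_token_count: int
    output_token_count: int
    cost: float    # precomputed from token counts and per-token prices


def cost_by(spans, key):
    """Sum cost per value of `key` (e.g. "function" or "model"), highest first."""
    totals = defaultdict(float)
    for span in spans:
        totals[getattr(span, key)] += span.cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


def over_budget(spans, max_tokens_per_call):
    """Return spans whose input + output tokens exceed a per-call budget."""
    return [
        s for s in spans
        if s.input_token_count + s.output_token_count > max_tokens_per_call
    ]


# Illustrative sample data.
spans = [
    EnrichedSpan("summarize", "model-a", 3000, 500, 0.010),
    EnrichedSpan("summarize", "model-a", 2800, 450, 0.009),
    EnrichedSpan("classify", "model-b", 200, 5, 0.0002),
]

print(cost_by(spans, "function"))
print([s.function for s in over_budget(spans, max_tokens_per_call=3000)])
# ['summarize', 'summarize']
```

Grouping by `model` instead of `function` gives the per-model comparison, and the same pattern extends to a prompt-version field; none of this is possible if token counts and costs are only known in aggregate.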

Relationship to Implementation

This principle is realized through the update_llm_span function, which injects model-specific metadata into the current LLM-type span.

Implementation:Confident_ai_Deepeval_Update_LLM_Span

Metadata

DeepEval Tracing Observability LLM_Evaluation 2026-02-14 09:00 GMT
