Principle:Helicone Helicone Response Usage Extraction
| Knowledge Sources | |
|---|---|
| Domains | LLM Observability, Cost Calculation, Token Accounting |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Extracting token usage statistics from provider-specific LLM response formats into a unified representation for downstream cost computation.
Description
Large Language Model providers return token usage information in different shapes and at different locations within their response payloads. OpenAI's Chat Completions API includes a top-level usage object with prompt_tokens and completion_tokens. Anthropic returns usage with input_tokens and output_tokens, plus cache-specific fields such as cache_creation_input_tokens and cache_read_input_tokens. Google Vertex and Gemini use usageMetadata with promptTokenCount and candidatesTokenCount. Streaming responses complicate extraction further, because usage may appear only in the final chunk or may have to be accumulated across chunks.
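To make the differences concrete, the sketch below reads the two basic counts from each of the three shapes named above, then from an OpenAI-style stream in which usage arrives only in a late chunk. The field names come from the paragraph above; the `TokenPair` type, the `read*` helpers, and the SSE handling are assumptions made for illustration and are not Helicone's code.

```typescript
// Raw usage shapes, limited to the fields named in the text above.
type OpenAIBody = {
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
};
type AnthropicBody = {
  usage?: { input_tokens: number; output_tokens: number };
};
type GeminiBody = {
  usageMetadata?: { promptTokenCount: number; candidatesTokenCount: number };
};

// Hypothetical minimal result type used only in this sketch.
type TokenPair = { input: number; output: number };

function readOpenAI(body: OpenAIBody): TokenPair {
  return {
    input: body.usage?.prompt_tokens ?? 0,
    output: body.usage?.completion_tokens ?? 0,
  };
}

function readAnthropic(body: AnthropicBody): TokenPair {
  return {
    input: body.usage?.input_tokens ?? 0,
    output: body.usage?.output_tokens ?? 0,
  };
}

function readGemini(body: GeminiBody): TokenPair {
  return {
    input: body.usageMetadata?.promptTokenCount ?? 0,
    output: body.usageMetadata?.candidatesTokenCount ?? 0,
  };
}

// Streaming case: scan server-sent-event "data:" lines and keep the last
// chunk that carries a non-null usage object. Malformed JSON would throw
// here; a production parser would need to handle that.
function readOpenAIStream(sse: string): TokenPair {
  let last: OpenAIBody = {};
  for (const line of sse.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "" || payload === "[DONE]") continue;
    const chunk = JSON.parse(payload) as OpenAIBody;
    if (chunk.usage) last = chunk;
  }
  return readOpenAI(last);
}
```

Each helper returns zeros when the usage object is missing, which keeps the sketch total; whether a missing usage object should instead be reported as an error is a design decision left open here.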
Response Usage Extraction addresses this heterogeneity by defining a common interface that provider-specific processors implement. Each processor knows how to parse its provider's response body (both streaming and non-streaming) and emit a normalized ModelUsage structure containing input tokens, output tokens, cache details, thinking tokens, and modality-specific breakdowns for audio, image, video, and file tokens.
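A minimal sketch of such an interface and one processor follows. The `ModelUsage` field names, the `ParseInput` shape, and the `UsageProcessor` interface are hypothetical stand-ins chosen for this example; Helicone's actual types may differ. The streaming branch assumes that Anthropic's stream reports usage under `message.usage` in the `message_start` event and at the top level of `message_delta` events, with later counts superseding earlier ones.

```typescript
// Hypothetical normalized shape; field names are assumptions for this sketch.
interface ModelUsage {
  input: number;
  output: number;
  cacheRead?: number;
  cacheWrite?: number;
  thinking?: number;
  audio?: number;
  image?: number;
  video?: number;
  file?: number;
}

interface ParseInput {
  responseBody: string; // a JSON document, or SSE text when isStream is true
  isStream: boolean;
}

// The common interface every provider-specific processor implements.
interface UsageProcessor {
  parse(input: ParseInput): ModelUsage;
}

const USAGE_KEYS = [
  "input_tokens",
  "output_tokens",
  "cache_creation_input_tokens",
  "cache_read_input_tokens",
] as const;
type AnthropicUsage = Partial<Record<(typeof USAGE_KEYS)[number], number>>;

// Collect every usage object found in the body, in order of appearance.
function collectUsages({ responseBody, isStream }: ParseInput): AnthropicUsage[] {
  if (!isStream) {
    const usage = JSON.parse(responseBody).usage;
    return usage ? [usage] : [];
  }
  const found: AnthropicUsage[] = [];
  for (const line of responseBody.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("data:")) continue;
    const payload = trimmed.slice("data:".length).trim();
    if (payload === "") continue;
    const event = JSON.parse(payload);
    // message_start nests usage under `message`; message_delta has it on top.
    const usage = event.message?.usage ?? event.usage;
    if (usage) found.push(usage);
  }
  return found;
}

class AnthropicUsageProcessor implements UsageProcessor {
  parse(input: ParseInput): ModelUsage {
    const merged: AnthropicUsage = {};
    for (const usage of collectUsages(input)) {
      for (const key of USAGE_KEYS) {
        const value = usage[key];
        if (typeof value === "number") merged[key] = value; // later events win
      }
    }
    return {
      input: merged.input_tokens ?? 0,
      output: merged.output_tokens ?? 0,
      cacheWrite: merged.cache_creation_input_tokens ?? 0,
      cacheRead: merged.cache_read_input_tokens ?? 0,
    };
  }
}
```

Callers see only `parse` and `ModelUsage`; the SSE scanning and the provider's field names stay inside the processor.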
Usage extraction is the first stage in Helicone's cost calculation pipeline. Without accurate extraction, every downstream step (rate lookup, cost multiplication, and aggregation) would operate on incorrect data. The factory pattern ensures that adding support for a new provider requires only implementing a single interface and registering the processor in the factory switch.
Usage
Use this pattern whenever you need to normalize token counts from raw LLM response payloads before computing costs. It is appropriate when:
- Processing responses from multiple LLM providers in a unified pipeline.
- Handling both streaming and non-streaming response formats.
- Extracting granular usage details such as cache read/write tokens, thinking tokens, or audio tokens that certain providers report.
Theoretical Basis
The pattern combines the Strategy pattern with a Factory method. The factory selects the correct strategy (a usage processor) based on the provider identifier, decoupling the caller from provider-specific parsing logic. This follows the Open/Closed Principle: supporting a new provider means adding a new processor and one case in the factory switch, while the code of existing processors stays unmodified.
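The factory-plus-strategy arrangement can be sketched as follows. The provider identifiers, the `getUsageProcessor` and `extractUsage` names, and the reduced two-field `ModelUsage` are hypothetical; the processors here take an already-parsed, non-streaming body to keep the example short.

```typescript
// Reduced normalized type for this sketch (assumption, not Helicone's type).
interface ModelUsage {
  input: number;
  output: number;
}

// Strategy interface: one implementation per provider.
interface UsageProcessor {
  parse(body: Record<string, any>): ModelUsage;
}

const openAIProcessor: UsageProcessor = {
  parse: (body) => ({
    input: body.usage?.prompt_tokens ?? 0,
    output: body.usage?.completion_tokens ?? 0,
  }),
};

const anthropicProcessor: UsageProcessor = {
  parse: (body) => ({
    input: body.usage?.input_tokens ?? 0,
    output: body.usage?.output_tokens ?? 0,
  }),
};

const geminiProcessor: UsageProcessor = {
  parse: (body) => ({
    input: body.usageMetadata?.promptTokenCount ?? 0,
    output: body.usageMetadata?.candidatesTokenCount ?? 0,
  }),
};

// Factory: the only place that maps a provider identifier to a strategy.
// Supporting another provider adds one processor and one case here.
function getUsageProcessor(provider: string): UsageProcessor | undefined {
  switch (provider) {
    case "openai":
      return openAIProcessor;
    case "anthropic":
      return anthropicProcessor;
    case "google":
      return geminiProcessor;
    default:
      return undefined;
  }
}

// Caller: never touches provider-specific field names.
function extractUsage(
  provider: string,
  body: Record<string, any>
): ModelUsage | undefined {
  return getUsageProcessor(provider)?.parse(body);
}
```

Returning `undefined` for an unknown provider lets the caller decide whether to skip cost calculation or raise an error; that choice is an assumption of this sketch.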
The normalized output type (ModelUsage) acts as a Canonical Data Model: a single authoritative schema that downstream pipeline stages depend on, regardless of the shape of the upstream data source.
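As an illustration of what depending only on the canonical schema means, the sketch below implements a downstream cost stage that reads nothing but a normalized usage record. The model name, the per-token rates, and the `lookupRates`, `costOf`, and `totalCost` helpers are all made up for this example and do not reflect real pricing or Helicone's implementation.

```typescript
// Reduced normalized type for this sketch (assumption, not Helicone's type).
interface ModelUsage {
  input: number;
  output: number;
  cacheRead?: number;
}

interface Rates {
  inputPerToken: number;
  outputPerToken: number;
  cacheReadPerToken: number;
}

// Made-up rate table in USD per token; values are placeholders.
const RATES: Record<string, Rates> = {
  "example-model-a": {
    inputPerToken: 3 / 1e6,
    outputPerToken: 15 / 1e6,
    cacheReadPerToken: 0.3 / 1e6,
  },
};

// Rate lookup: returns undefined for models without an entry.
function lookupRates(model: string): Rates | undefined {
  return RATES[model];
}

// Cost multiplication: token counts times per-token rates.
function costOf(usage: ModelUsage, rates: Rates): number {
  return (
    usage.input * rates.inputPerToken +
    usage.output * rates.outputPerToken +
    (usage.cacheRead ?? 0) * rates.cacheReadPerToken
  );
}

// Aggregation: sum the cost of every request with a known rate.
function totalCost(requests: { model: string; usage: ModelUsage }[]): number {
  let total = 0;
  for (const { model, usage } of requests) {
    const rates = lookupRates(model);
    if (!rates) continue; // unknown model: skipped in this sketch
    total += costOf(usage, rates);
  }
  return total;
}
```

Nothing in this stage mentions prompt_tokens, input_tokens, or usageMetadata, so a change in any provider's response format affects only that provider's processor.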