

Principle:Helicone Helicone Response Usage Extraction

From Leeroopedia
Domains: LLM Observability, Cost Calculation, Token Accounting
Last Updated: 2026-02-14 00:00 GMT

Overview

Extracting token usage statistics from provider-specific LLM response formats into a unified representation for downstream cost computation.

Description

Large Language Model providers return token usage information in different shapes and locations within their response payloads. OpenAI includes a top-level usage object with prompt_tokens and completion_tokens. Anthropic returns a usage object with input_tokens and output_tokens, plus cache-specific fields (cache_creation_input_tokens and cache_read_input_tokens). Google Vertex and Gemini use usageMetadata with promptTokenCount and candidatesTokenCount. Streaming responses complicate extraction further: usage may appear only in the final chunk, or it may have to be assembled from several chunks.
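The two streaming cases can be sketched as follows. This is a minimal illustration, not Helicone's code: the chunk and event types model only the fields used here, and the function names are hypothetical. It relies on two documented provider behaviors: OpenAI sends usage on a final extra chunk when the request sets stream_options.include_usage, and Anthropic reports input tokens in message_start while message_delta events carry a cumulative output count.

```typescript
// Assumed, minimal shapes; real payloads carry many more fields.
interface OpenAIChunk {
  usage?: { prompt_tokens: number; completion_tokens: number } | null;
}

type AnthropicEvent =
  | { type: "message_start"; message: { usage: { input_tokens: number; output_tokens: number } } }
  | { type: "message_delta"; usage: { output_tokens: number } }
  | { type: "content_block_delta" };

interface StreamTotals {
  input: number;
  output: number;
}

// OpenAI-style: usage is null on every chunk except the final one, so keep
// the last non-null usage object seen (hypothetical helper).
function usageFromOpenAIStream(chunks: OpenAIChunk[]): StreamTotals | undefined {
  let totals: StreamTotals | undefined;
  for (const chunk of chunks) {
    if (chunk.usage) {
      totals = { input: chunk.usage.prompt_tokens, output: chunk.usage.completion_tokens };
    }
  }
  return totals;
}

// Anthropic-style: input tokens arrive in message_start, and each
// message_delta reports a running output count, so the latest value wins.
function usageFromAnthropicStream(events: AnthropicEvent[]): StreamTotals {
  const totals: StreamTotals = { input: 0, output: 0 };
  for (const event of events) {
    if (event.type === "message_start") {
      totals.input = event.message.usage.input_tokens;
      totals.output = event.message.usage.output_tokens;
    } else if (event.type === "message_delta") {
      totals.output = event.usage.output_tokens;
    }
  }
  return totals;
}
```

Because message_delta counts are cumulative, summing them would overcount output tokens. The merge step has to know each provider's semantics, which is why extraction is delegated to per-provider code.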

Response Usage Extraction addresses this heterogeneity by defining a common interface that provider-specific processors implement. Each processor knows how to parse its provider's response body (both streaming and non-streaming) and emit a normalized ModelUsage structure containing input tokens, output tokens, cache details, thinking tokens, and modality-specific breakdowns for audio, image, video, and file tokens.
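A minimal sketch of the common interface and two processors follows. ModelUsage is named in this page, but the field set below is a simplified assumption; UsageProcessor, parse, and the processor class names are hypothetical, and Helicone's real signatures may differ (for example, they may be asynchronous or also handle streaming bodies). One design choice is shown explicitly: input counts only uncached prompt tokens. OpenAI's prompt_tokens already includes cached tokens, while Anthropic's input_tokens excludes cache reads and writes, so the two need opposite handling to land in the same schema.

```typescript
// Hypothetical, simplified canonical record; the real ModelUsage also carries
// thinking tokens and audio/image/video/file breakdowns.
interface ModelUsage {
  input: number; // uncached input tokens (a choice made in this sketch)
  output: number;
  cacheDetails?: { cachedInput: number; cacheWrite?: number };
}

// Hypothetical strategy interface that each provider-specific processor implements.
interface UsageProcessor {
  parse(responseBody: string): ModelUsage;
}

class OpenAIUsageProcessor implements UsageProcessor {
  parse(responseBody: string): ModelUsage {
    const { usage } = JSON.parse(responseBody);
    const cached: number = usage.prompt_tokens_details?.cached_tokens ?? 0;
    return {
      // prompt_tokens includes cached tokens, so split them out
      input: usage.prompt_tokens - cached,
      output: usage.completion_tokens,
      ...(cached > 0 ? { cacheDetails: { cachedInput: cached } } : {}),
    };
  }
}

class AnthropicUsageProcessor implements UsageProcessor {
  parse(responseBody: string): ModelUsage {
    const { usage } = JSON.parse(responseBody);
    const cacheRead: number = usage.cache_read_input_tokens ?? 0;
    const cacheWrite: number = usage.cache_creation_input_tokens ?? 0;
    return {
      // input_tokens already excludes cache reads and writes
      input: usage.input_tokens,
      output: usage.output_tokens,
      ...(cacheRead + cacheWrite > 0
        ? { cacheDetails: { cachedInput: cacheRead, cacheWrite } }
        : {}),
    };
  }
}
```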

Usage extraction is the first stage in Helicone's cost calculation pipeline. Without accurate usage extraction, every downstream step (rate lookup, cost multiplication, and aggregation) would operate on incorrect data. The factory pattern means that adding support for a new provider requires only implementing a single interface and registering the processor in the factory switch.
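To see why downstream stages depend on extraction, here is a sketch of rate lookup, cost multiplication, and aggregation over a normalized usage record. The model name, the per-million-token rates, and the helper names are invented for illustration and are not real prices or Helicone functions.

```typescript
// Simplified canonical usage record (assumed subset of ModelUsage).
interface ModelUsage {
  input: number;
  output: number;
  cacheDetails?: { cachedInput: number; cacheWrite?: number };
}

// Hypothetical USD rates per million tokens; NOT real provider prices.
interface Rates {
  inputPerM: number;
  outputPerM: number;
  cachedInputPerM: number;
  cacheWritePerM: number;
}

const RATES: Record<string, Rates> = {
  "example-model": { inputPerM: 2, outputPerM: 8, cachedInputPerM: 0.5, cacheWritePerM: 2.5 },
};

function costUSD(model: string, usage: ModelUsage): number {
  const rates = RATES[model]; // rate lookup
  if (!rates) throw new Error(`no rates for ${model}`);
  const cached = usage.cacheDetails?.cachedInput ?? 0;
  const written = usage.cacheDetails?.cacheWrite ?? 0;
  // cost multiplication: each token class times its per-million rate
  return (
    (usage.input * rates.inputPerM +
      usage.output * rates.outputPerM +
      cached * rates.cachedInputPerM +
      written * rates.cacheWritePerM) /
    1_000_000
  );
}

// aggregation: sum per-request costs
function totalCostUSD(records: { model: string; usage: ModelUsage }[]): number {
  return records.reduce((sum, r) => sum + costUSD(r.model, r.usage), 0);
}
```

None of this code can detect an extraction mistake. If an OpenAI processor left cached tokens inside input and also reported them as cachedInput, those tokens would be billed twice and costUSD would still return a plausible-looking number.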

Usage

Use this pattern whenever you need to normalize token counts from raw LLM response payloads before computing costs. It is appropriate when:

  • Processing responses from multiple LLM providers in a unified pipeline.
  • Handling both streaming and non-streaming response formats.
  • Extracting granular usage details such as cache read/write tokens, thinking tokens, or audio tokens that certain providers report.
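The granular fields in the last bullet can be sketched with Gemini's usageMetadata. Per the public Gemini API, usageMetadata also reports cachedContentTokenCount and thoughtsTokenCount, and promptTokenCount includes the cached tokens. The function name geminiUsage, the ModelUsage subset, and the assumptions that a streamed body arrives as a JSON array of chunks whose later usageMetadata supersedes earlier values are all hypothetical; Helicone's Gemini processor may work differently.

```typescript
// Assumed subset of Gemini's usageMetadata; only the fields used here.
interface GeminiUsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  cachedContentTokenCount?: number;
  thoughtsTokenCount?: number;
}

interface GeminiResponse {
  usageMetadata?: GeminiUsageMetadata;
}

// Simplified canonical record (assumed subset of ModelUsage).
interface ModelUsage {
  input: number;
  output: number;
  cacheDetails?: { cachedInput: number };
  thinking?: number;
}

function geminiUsage(body: string, isStream: boolean): ModelUsage | undefined {
  // Non-streaming: one JSON object. Streaming (assumed): a JSON array of chunks.
  const parsed: GeminiResponse | GeminiResponse[] = JSON.parse(body);
  const chunks = isStream ? (parsed as GeminiResponse[]) : [parsed as GeminiResponse];

  // Assume later chunks carry running totals, so keep the last metadata seen.
  let meta: GeminiUsageMetadata | undefined;
  for (const chunk of chunks) {
    if (chunk.usageMetadata) meta = chunk.usageMetadata;
  }
  if (!meta) return undefined;

  const prompt = meta.promptTokenCount ?? 0;
  const cached = meta.cachedContentTokenCount ?? 0;
  const thinking = meta.thoughtsTokenCount ?? 0;
  return {
    input: prompt - cached, // promptTokenCount includes cached content tokens
    output: meta.candidatesTokenCount ?? 0,
    ...(cached > 0 ? { cacheDetails: { cachedInput: cached } } : {}),
    ...(thinking > 0 ? { thinking } : {}),
  };
}
```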

Theoretical Basis

The pattern applies the Strategy pattern together with a Factory Method. The factory selects the correct strategy (usage processor) based on the provider identifier, decoupling the caller from provider-specific parsing logic. This largely follows the Open/Closed Principle: supporting a new provider adds a new processor without modifying existing processor code, and the only existing code that changes is the factory switch, which gains one case.
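A compact sketch of Strategy plus Factory follows. The Provider union, the class names, getUsageProcessor, and extractUsage are hypothetical and only illustrate the structure; they are not Helicone's identifiers.

```typescript
interface ModelUsage {
  input: number;
  output: number;
}

// Strategy: one interface, many provider-specific implementations.
interface UsageProcessor {
  parse(body: string): ModelUsage;
}

class OpenAIProcessor implements UsageProcessor {
  parse(body: string): ModelUsage {
    const { usage } = JSON.parse(body);
    return { input: usage.prompt_tokens, output: usage.completion_tokens };
  }
}

class AnthropicProcessor implements UsageProcessor {
  parse(body: string): ModelUsage {
    const { usage } = JSON.parse(body);
    return { input: usage.input_tokens, output: usage.output_tokens };
  }
}

class GeminiProcessor implements UsageProcessor {
  parse(body: string): ModelUsage {
    const { usageMetadata } = JSON.parse(body);
    return { input: usageMetadata.promptTokenCount ?? 0, output: usageMetadata.candidatesTokenCount ?? 0 };
  }
}

type Provider = "openai" | "anthropic" | "google";

// Factory: the only place that maps a provider identifier to a strategy.
function getUsageProcessor(provider: Provider): UsageProcessor {
  switch (provider) {
    case "openai":
      return new OpenAIProcessor();
    case "anthropic":
      return new AnthropicProcessor();
    case "google":
      return new GeminiProcessor();
    default: {
      // Compile-time exhaustiveness check: fails if a Provider has no case.
      const unhandled: never = provider;
      throw new Error(`unsupported provider: ${unhandled}`);
    }
  }
}

// The caller never touches provider-specific field names.
function extractUsage(provider: Provider, body: string): ModelUsage {
  return getUsageProcessor(provider).parse(body);
}
```

Adding a provider here means writing one new class and one new case. Widening the Provider union without adding the case makes the `never` assignment fail to compile, which keeps the factory switch honest.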

The normalized output type (ModelUsage) acts as a Canonical Data Model -- a single authoritative schema that downstream pipeline stages depend on, regardless of the upstream data source shape.
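In the following sketch, a downstream token-accounting stage depends only on the canonical schema. It never sees prompt_tokens, input_tokens, or usageMetadata, so the same code totals records that came from any provider. The ModelUsage fields and the sumUsage name are assumptions made for illustration.

```typescript
// Simplified canonical record (assumed subset of ModelUsage).
interface ModelUsage {
  input: number;
  output: number;
  cacheDetails?: { cachedInput: number; cacheWrite?: number };
  thinking?: number;
}

// Downstream stage: sums canonical records regardless of their source provider.
function sumUsage(records: ModelUsage[]): ModelUsage {
  const t = { input: 0, output: 0, cachedInput: 0, cacheWrite: 0, thinking: 0 };
  for (const r of records) {
    t.input += r.input;
    t.output += r.output;
    // Optional fields count as zero when a provider does not report them.
    t.cachedInput += r.cacheDetails?.cachedInput ?? 0;
    t.cacheWrite += r.cacheDetails?.cacheWrite ?? 0;
    t.thinking += r.thinking ?? 0;
  }
  return {
    input: t.input,
    output: t.output,
    cacheDetails: { cachedInput: t.cachedInput, cacheWrite: t.cacheWrite },
    thinking: t.thinking,
  };
}
```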

Related Pages

Implemented By
