Workflow:Helicone Helicone Cost Calculation Pipeline
| Knowledge Sources | |
|---|---|
| Domains | LLM_Ops, Cost_Tracking, Analytics |
| Last Updated | 2026-02-14 06:00 GMT |
Overview
End-to-end process for calculating the monetary cost of LLM API requests using a dual-tier cost system: a legacy provider-based mapping and a new model registry with O(1) lookups, tiered pricing, and ClickHouse SQL generation.
Description
This workflow describes how Helicone calculates the cost of every LLM API request that passes through the platform. The cost package supports two calculation paths: a legacy system that maps provider URLs to flat per-token rates, and a newer model registry system that uses an author/model/endpoint hierarchy with tiered pricing, cache token multipliers, and multi-modality support (text, image, audio, video).
The pipeline begins when a response is received from an LLM provider. A provider-specific UsageProcessor extracts token counts from the response body, since each provider returns usage data in a different format. The extracted usage is then matched against the appropriate cost data using either the legacy provider mapping or the model registry. The final cost is computed by multiplying token counts by per-token rates, accounting for pricing tiers based on context length, cache read/write multipliers, and special token types (reasoning tokens, audio tokens). Costs are stored as integers scaled by a precision multiplier of 1 billion, which avoids floating-point precision issues in ClickHouse analytics queries.
Usage
This workflow executes automatically for every LLM request processed by Helicone. It is also used when generating ClickHouse SQL CASE expressions for batch cost aggregation in analytics dashboards, and when users query cost data through the API.
Execution Steps
Step 1: Response Usage Extraction
When a response arrives from an LLM provider, the system selects the appropriate UsageProcessor based on the provider identifier. Each processor knows how to extract token counts from its provider's specific response format. The extracted data is normalized into a unified ModelUsage structure containing prompt tokens, completion tokens, cache read/write tokens, reasoning tokens, and audio tokens.
Key considerations:
- OpenAI, Anthropic, Google, Bedrock, DeepSeek, Groq, xAI, OpenRouter, and Vertex each have dedicated UsageProcessors
- OpenRouter provides direct cost passthrough, bypassing per-token calculation
- Streaming responses accumulate token counts across chunks before final extraction
- The getUsageProcessor factory function selects the correct processor based on the provider identifier
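The extraction step can be sketched as follows. This is a minimal illustration, not Helicone's implementation: the `ModelUsage` field names, the `UsageProcessor` interface, the `num` helper, and `getUsageProcessorSketch` are all hypothetical, and the factory is keyed by a plain provider name for simplicity. The response field names are the ones the OpenAI and Anthropic APIs report in their `usage` objects.

```typescript
// Hypothetical unified usage shape; field names are illustrative only.
interface ModelUsage {
  promptTokens: number;
  completionTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  reasoningTokens: number;
}

// Hypothetical processor interface: one implementation per provider.
interface UsageProcessor {
  parse(body: unknown): ModelUsage;
}

// Read a nested numeric field, returning 0 when any part of the path is missing.
function num(obj: unknown, ...path: string[]): number {
  let cur: unknown = obj;
  for (const key of path) {
    if (typeof cur !== "object" || cur === null) return 0;
    cur = (cur as Record<string, unknown>)[key];
  }
  return typeof cur === "number" ? cur : 0;
}

// Copies fields as reported; it does not reconcile whether a provider's
// prompt count already includes cached tokens.
const openAIProcessor: UsageProcessor = {
  parse: (body) => ({
    promptTokens: num(body, "usage", "prompt_tokens"),
    completionTokens: num(body, "usage", "completion_tokens"),
    cacheReadTokens: num(body, "usage", "prompt_tokens_details", "cached_tokens"),
    cacheWriteTokens: 0,
    reasoningTokens: num(body, "usage", "completion_tokens_details", "reasoning_tokens"),
  }),
};

const anthropicProcessor: UsageProcessor = {
  parse: (body) => ({
    promptTokens: num(body, "usage", "input_tokens"),
    completionTokens: num(body, "usage", "output_tokens"),
    cacheReadTokens: num(body, "usage", "cache_read_input_tokens"),
    cacheWriteTokens: num(body, "usage", "cache_creation_input_tokens"),
    reasoningTokens: 0,
  }),
};

// Simplified stand-in for the factory: returns null for unknown providers.
function getUsageProcessorSketch(provider: string): UsageProcessor | null {
  switch (provider) {
    case "openai":
      return openAIProcessor;
    case "anthropic":
      return anthropicProcessor;
    default:
      return null;
  }
}
```

Returning `null` for an unknown provider lets the caller decide whether to skip cost calculation or fall back to another path, rather than silently reporting zero tokens.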
Step 2: Model and Provider Identification
The system identifies the model and provider from the request metadata. For the legacy system, the provider URL is matched against regex patterns in the provider mappings. For the model registry, the model string is parsed to extract model name, optional provider hint, and optional deployment region (e.g., "claude-3.5-haiku/bedrock/us-west-2").
Key considerations:
- Model strings support three formats: model-only, model/provider, and model/provider/deployment
- The legacy system uses URL pattern matching (regex) to identify providers
- The registry system uses direct Map lookups for O(1) model resolution
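Both identification paths can be sketched in a few lines. Everything here is an assumption made for illustration: the `ParsedModel` shape, the function names, the contents of `modelIndex`, and the two URL patterns are hypothetical and do not reflect Helicone's actual data.

```typescript
interface ParsedModel {
  model: string;
  provider: string | null;
  deployment: string | null;
}

// Split "model", "model/provider", or "model/provider/deployment".
// Assumes the model name itself contains no "/".
function parseModelString(input: string): ParsedModel {
  const parts = input.split("/");
  return {
    model: parts[0] ?? "",
    provider: parts[1] ?? null,
    deployment: parts[2] ?? null,
  };
}

// Hypothetical registry index: model name -> providers that serve it.
// Map.get gives the O(1) resolution described above.
const modelIndex = new Map<string, Set<string>>([
  ["claude-3.5-haiku", new Set(["anthropic", "bedrock"])],
]);

// Return the candidate providers for a parsed model string.
function resolveProviders(parsed: ParsedModel): string[] {
  const providers = modelIndex.get(parsed.model);
  if (providers === undefined) return [];
  if (parsed.provider === null) return Array.from(providers);
  return providers.has(parsed.provider) ? [parsed.provider] : [];
}

// Legacy path: scan an ordered list of URL patterns (illustrative entries).
const legacyProviderPatterns: Array<{ pattern: RegExp; provider: string }> = [
  { pattern: /^https:\/\/api\.openai\.com/, provider: "OPENAI" },
  { pattern: /^https:\/\/api\.anthropic\.com/, provider: "ANTHROPIC" },
];

function matchLegacyProvider(url: string): string | null {
  for (const entry of legacyProviderPatterns) {
    if (entry.pattern.test(url)) return entry.provider;
  }
  return null;
}
```

The contrast is the point of the sketch: the legacy path is a linear scan over patterns, while the registry path is a single keyed lookup.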
Step 3: Cost Rate Lookup
The per-token cost rates are retrieved from the cost data source. In the legacy system, this is a flat lookup from the provider's cost array matching the model name. In the registry system, the ModelProviderConfig is resolved and pricing tiers are selected based on the actual context length used in the request.
Key considerations:
- The registry supports threshold-based pricing where rates change above certain context lengths
- Cache read multipliers typically discount input costs (e.g., 0.1x for 90% savings)
- Cache write multipliers may increase costs (e.g., 1.25x for Anthropic cache writes)
- Some models have separate pricing for reasoning/thinking tokens
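Threshold-based tier selection and cache multipliers might look like the sketch below. The `PricingTier` shape, the function names, and the convention that a tier applies strictly above its threshold are assumptions for illustration; the actual ModelProviderConfig structure is not shown in this document.

```typescript
// Hypothetical tier shape: rates are USD per token.
interface PricingTier {
  threshold: number; // tier applies when context length exceeds this value
  inputRate: number;
  outputRate: number;
}

// Pick the highest tier whose threshold the request exceeds.
// The lowest tier acts as the base rate when no threshold is exceeded.
function selectTier(tiers: PricingTier[], contextTokens: number): PricingTier {
  const sorted = tiers.slice().sort((a, b) => a.threshold - b.threshold);
  let chosen: PricingTier | null = null;
  for (const tier of sorted) {
    if (chosen === null || contextTokens > tier.threshold) chosen = tier;
  }
  if (chosen === null) throw new Error("no pricing tiers configured");
  return chosen;
}

// Derive cache rates from the selected tier's input rate.
function effectiveCacheRates(
  tier: PricingTier,
  cacheReadMultiplier: number,
  cacheWriteMultiplier: number
): { cacheReadRate: number; cacheWriteRate: number } {
  return {
    cacheReadRate: tier.inputRate * cacheReadMultiplier,
    cacheWriteRate: tier.inputRate * cacheWriteMultiplier,
  };
}

// Illustrative numbers only, not real prices.
const exampleTiers: PricingTier[] = [
  { threshold: 0, inputRate: 3e-6, outputRate: 15e-6 },
  { threshold: 200000, inputRate: 6e-6, outputRate: 22.5e-6 },
];
```

With `exampleTiers`, a request using 150,000 context tokens resolves to the base tier, while one using 250,000 resolves to the higher tier; a 0.1x cache read multiplier on the base tier gives a cache read rate one tenth of the input rate.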
Step 4: Cost Computation
The final cost is computed by multiplying each token count by its corresponding rate. The calculation accounts for standard prompt tokens, completion tokens, cached prompt read tokens (at a discounted rate), cached prompt write tokens, reasoning tokens, audio input/output tokens, image costs, and per-call fees.
Pseudocode (main terms only; reasoning and audio tokens add further tokens-times-rate terms where a model prices them separately):
totalCost = (promptTokens * promptRate)
          + (completionTokens * completionRate)
          + (cacheReadTokens * promptRate * cacheReadMultiplier)
          + (cacheWriteTokens * promptRate * cacheWriteMultiplier)
          + (images * perImageRate)
          + perCallFee
Key considerations:
- All costs are stored as integers by multiplying by COST_PRECISION_MULTIPLIER (1 billion)
- This avoids floating-point precision issues in ClickHouse aggregation queries
- The displayed cost is obtained by dividing the stored value by the precision multiplier
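The pseudocode and the precision multiplier can be combined into a small runnable sketch. The `TokenUsage` and `Rates` shapes and the three function names are hypothetical, and rounding with `Math.round` is an assumption about how the scaled value becomes an integer; only `COST_PRECISION_MULTIPLIER` and its value come from the text above.

```typescript
const COST_PRECISION_MULTIPLIER = 1e9; // 1 billion, as stated above

// Hypothetical input shapes for illustration.
interface TokenUsage {
  promptTokens: number;
  completionTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
  images: number;
}

interface Rates {
  promptRate: number; // USD per token
  completionRate: number; // USD per token
  cacheReadMultiplier: number;
  cacheWriteMultiplier: number;
  perImageRate: number; // USD per image
  perCallFee: number; // USD per request
}

// Direct transcription of the pseudocode's main terms.
function computeCostUsd(u: TokenUsage, r: Rates): number {
  return (
    u.promptTokens * r.promptRate +
    u.completionTokens * r.completionRate +
    u.cacheReadTokens * r.promptRate * r.cacheReadMultiplier +
    u.cacheWriteTokens * r.promptRate * r.cacheWriteMultiplier +
    u.images * r.perImageRate +
    r.perCallFee
  );
}

// Scale to an integer for storage (rounding is an assumption of this sketch).
function toStoredCost(costUsd: number): number {
  return Math.round(costUsd * COST_PRECISION_MULTIPLIER);
}

// Convert a stored integer back to USD for display.
function toDisplayCost(stored: number): number {
  return stored / COST_PRECISION_MULTIPLIER;
}

// Illustrative numbers only, not real prices.
const exampleUsage: TokenUsage = {
  promptTokens: 1000,
  completionTokens: 500,
  cacheReadTokens: 2000,
  cacheWriteTokens: 0,
  images: 0,
};
const exampleRates: Rates = {
  promptRate: 3e-6,
  completionRate: 15e-6,
  cacheReadMultiplier: 0.1,
  cacheWriteMultiplier: 1.25,
  perImageRate: 0,
  perCallFee: 0,
};
const exampleStored = toStoredCost(computeCostUsd(exampleUsage, exampleRates));
```

Here the three non-zero terms are 0.003, 0.0075, and 0.0006 USD, so `exampleStored` is 11100000 and the displayed cost is 0.0111 USD. Summing stored integers across many rows is exact, whereas summing the raw floating-point dollar amounts can accumulate rounding error.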
Step 5: ClickHouse Cost SQL Generation
For analytics dashboards that aggregate costs across many requests, the system generates ClickHouse SQL CASE/WHEN expressions. These expressions embed the complete cost lookup logic directly into SQL, enabling the database to compute costs server-side without per-row API calls. The generated SQL maps model/provider combinations to their per-token rates using CASE expressions.
Key considerations:
- The clickhousePriceCalc function generates the complete SQL expression
- A default fallback rate is used for unknown models
- The SQL expression handles both legacy provider mappings and registry-based pricing
- Generated expressions are cached and regenerated when pricing data changes
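The idea of generating a CASE expression from pricing data can be sketched as below. This is not the output of clickhousePriceCalc: the function names, the `ModelRate` shape, and the column names `model`, `prompt_tokens`, and `completion_tokens` are assumptions, and the sketch covers only flat prompt and completion rates keyed by model name.

```typescript
// Hypothetical flat rate entry (USD per token).
interface ModelRate {
  model: string;
  promptRate: number;
  completionRate: number;
}

// Quote a string literal, escaping backslashes and single quotes.
function sqlString(value: string): string {
  return "'" + value.replace(/\\/g, "\\\\").replace(/'/g, "\\'") + "'";
}

// Build a sum(CASE ... END) expression with integer-scaled rates.
function buildCostCaseSql(
  rates: ModelRate[],
  fallback: { promptRate: number; completionRate: number }
): string {
  const multiplier = 1e9; // mirrors COST_PRECISION_MULTIPLIER
  const scale = (usdPerToken: number): number =>
    Math.round(usdPerToken * multiplier);
  const costExpr = (promptRate: number, completionRate: number): string =>
    `${scale(promptRate)} * prompt_tokens + ${scale(completionRate)} * completion_tokens`;

  const branches = rates.map(
    (r) =>
      `  WHEN model = ${sqlString(r.model)} THEN ${costExpr(r.promptRate, r.completionRate)}`
  );
  const lines = ["sum(CASE"]
    .concat(branches)
    .concat([
      // Default fallback rate for models not listed above.
      `  ELSE ${costExpr(fallback.promptRate, fallback.completionRate)}`,
      "END)",
    ]);
  return lines.join("\n");
}

// Illustrative numbers only, not real prices.
const exampleSql = buildCostCaseSql(
  [{ model: "gpt-4o", promptRate: 2.5e-6, completionRate: 10e-6 }],
  { promptRate: 0, completionRate: 0 }
);
```

Because the rates are scaled to integers before being embedded, the resulting aggregate stays in the same integer units as the stored per-request costs and is divided by the multiplier only at display time.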