Workflow:Helicone Helicone Cost Calculation Pipeline

Knowledge Sources	Helicone Helicone Cost Docs
Domains	LLM_Ops, Cost_Tracking, Analytics
Last Updated	2026-02-14 06:00 GMT

Overview

End-to-end process for calculating the monetary cost of LLM API requests using a dual-tier cost system: a legacy provider-based mapping and a new model registry with O(1) lookups, tiered pricing, and ClickHouse SQL generation.

Description

This workflow describes how Helicone calculates the cost of every LLM API request that passes through the platform. The cost package supports two calculation paths: a legacy system that maps provider URLs to flat per-token rates, and a newer model registry system that uses an author/model/endpoint hierarchy with tiered pricing, cache token multipliers, and multi-modality support (text, image, audio, video).

The pipeline begins when a response is received from an LLM provider. A provider-specific UsageProcessor extracts token counts from the response body (each provider returns usage data in a different format). The extracted usage is then matched against the appropriate cost data using either the legacy provider mapping or the model registry. The final cost is computed by multiplying token counts by per-token rates, accounting for pricing tiers based on context window size, cache read/write discounts, and special token types (reasoning tokens, audio tokens). Costs are stored with a precision multiplier of 1 billion to avoid floating-point issues in ClickHouse analytics queries.

Usage

This workflow executes automatically for every LLM request processed by Helicone. It is also used when generating ClickHouse SQL CASE expressions for batch cost aggregation in analytics dashboards, and when users query cost data through the API.

Execution Steps

Step 1: Response Usage Extraction

When a response arrives from an LLM provider, the system selects the appropriate UsageProcessor based on the provider identifier. Each processor knows how to extract token counts from its provider's specific response format. The extracted data is normalized into a unified ModelUsage structure containing prompt tokens, completion tokens, cache read/write tokens, reasoning tokens, and audio tokens.

Key considerations:

OpenAI, Anthropic, Google, Bedrock, DeepSeek, Groq, xAI, OpenRouter, and Vertex each have dedicated UsageProcessors
OpenRouter provides direct cost passthrough, bypassing per-token calculation
Streaming responses accumulate token counts across chunks before final extraction
The getUsageProcessor factory function selects the correct processor based on provider URL pattern

Step 2: Model and Provider Identification

The system identifies the model and provider from the request metadata. For the legacy system, the provider URL is matched against regex patterns in the provider mappings. For the model registry, the model string is parsed to extract model name, optional provider hint, and optional deployment region (e.g., "claude-3.5-haiku/bedrock/us-west-2").

Key considerations:

Model strings support three formats: model-only, model/provider, and model/provider/deployment
The legacy system uses URL pattern matching (regex) to identify providers
The registry system uses direct Map lookups for O(1) model resolution

Step 3: Cost Rate Lookup

The per-token cost rates are retrieved from the cost data source. In the legacy system, this is a flat lookup from the provider's cost array matching the model name. In the registry system, the ModelProviderConfig is resolved and pricing tiers are selected based on the actual context length used in the request.

Key considerations:

The registry supports threshold-based pricing where rates change above certain context lengths
Cache read multipliers typically discount input costs (e.g., 0.1x for 90% savings)
Cache write multipliers may increase costs (e.g., 1.25x for Anthropic cache writes)
Some models have separate pricing for reasoning/thinking tokens

Step 4: Cost Computation

The final cost is computed by multiplying each token count by its corresponding rate. The calculation accounts for: standard prompt tokens, completion tokens, cached prompt read tokens (at discounted rate), cached prompt write tokens, reasoning tokens, audio input/output tokens, image costs, and per-call fees.

Pseudocode:

totalCost = (promptTokens * promptRate)
          + (completionTokens * completionRate)
          + (cacheReadTokens * promptRate * cacheReadMultiplier)
          + (cacheWriteTokens * promptRate * cacheWriteMultiplier)
          + (images * perImageRate)
          + perCallFee

Key considerations:

All costs are stored as integers by multiplying by COST_PRECISION_MULTIPLIER (1 billion)
This avoids floating-point precision issues in ClickHouse aggregation queries
The final displayed cost divides by the precision multiplier

Step 5: ClickHouse Cost SQL Generation

For analytics dashboards that aggregate costs across many requests, the system generates ClickHouse SQL CASE/WHEN expressions. These expressions embed the complete cost lookup logic directly into SQL, enabling the database to compute costs server-side without per-row API calls. The generated SQL maps model/provider combinations to their per-token rates using CASE statements.

Key considerations:

The clickhousePriceCalc function generates the complete SQL expression
A default fallback rate is used for unknown models
The SQL expression handles both legacy provider mappings and registry-based pricing
Generated expressions are cached and regenerated when pricing data changes

Execution Diagram

GitHub URL

Workflow Repository