
Principle:Helicone Cost Computation

From Leeroopedia
Knowledge Sources
Domains Cost Calculation, LLM Observability, Financial Analytics
Last Updated 2026-02-14 00:00 GMT

Overview

Computing the total USD cost of an LLM request by multiplying extracted token counts against their corresponding per-token rates, accounting for multiple token categories and pricing tiers.

Description

Cost Computation is the final arithmetic stage of the cost calculation pipeline. After token usage has been extracted (via usage processors) and per-token rates have been resolved (via rate lookup), this stage multiplies each token category by its rate and sums the results to produce a total cost in USD.

Modern LLM pricing is not a simple "tokens times rate" calculation. A single request may involve multiple token categories, each with a different rate: regular prompt tokens, cached prompt tokens (read), cache write tokens (with provider-specific tiers such as Anthropic's 5-minute and 1-hour cache creation), completion tokens, thinking/reasoning tokens, audio input/output tokens, image tokens, video tokens, file tokens, web search operations, and per-call fixed fees. The cost computation logic must handle all of these categories, falling back to the base prompt or completion rate when a specialized rate is not defined for a given category.
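The category-by-category multiplication with rate fallbacks can be sketched as follows. This is a minimal illustration; the type and field names are assumptions, not Helicone's actual code:

```typescript
// Illustrative token usage and rate shapes (assumed names, not Helicone's API).
interface TokenUsage {
  promptTokens: number;
  cachedReadTokens?: number;
  cacheWriteTokens?: number;
  completionTokens: number;
  thinkingTokens?: number;
}

interface Rates {
  prompt: number;       // USD per regular prompt token
  completion: number;   // USD per completion token
  cachedRead?: number;  // falls back to the prompt rate if undefined
  cacheWrite?: number;  // falls back to the prompt rate if undefined
  thinking?: number;    // falls back to the completion rate if undefined
  perCall?: number;     // optional fixed fee per request
}

// Multiply each token category by its rate and sum, falling back to the
// base prompt/completion rate when a specialized rate is not defined.
function computeCost(usage: TokenUsage, rates: Rates): number {
  const cachedReadRate = rates.cachedRead ?? rates.prompt;
  const cacheWriteRate = rates.cacheWrite ?? rates.prompt;
  const thinkingRate = rates.thinking ?? rates.completion;
  return (
    usage.promptTokens * rates.prompt +
    (usage.cachedReadTokens ?? 0) * cachedReadRate +
    (usage.cacheWriteTokens ?? 0) * cacheWriteRate +
    usage.completionTokens * rates.completion +
    (usage.thinkingTokens ?? 0) * thinkingRate +
    (rates.perCall ?? 0)
  );
}
```

The fallback via `??` is the key design point: a provider mapping only needs to define the rates that differ from the base prompt/completion rates.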

Helicone maintains a dual computation system. The legacy system (costOfPrompt) uses a flat rate structure from the provider mappings registry, applying simple token-times-rate multiplication. The new system (modelCostBreakdownFromRegistry) supports tiered pricing where rates change based on token volume thresholds (e.g., Google Gemini charges different rates above 128K input tokens), per-modality cost breakdowns, and returns a detailed CostBreakdown object rather than a single number. The new system also handles provider-specific threshold logic -- for example, Anthropic's pricing tiers are based on total prompt length (input + cached + cache writes), while Vertex's tiers are based on just the input token count.
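A threshold-based tier lookup of the kind described above might look like this. The tier values and provider check are illustrative sketches, not Helicone's actual registry logic:

```typescript
// A pricing tier: a rate that applies up to an upper token bound
// (null = open-ended final tier). Numbers below are illustrative.
interface Tier {
  upTo: number | null;  // upper bound in tokens for this tier
  rate: number;         // USD per token within this tier
}

// Pick the rate whose tier the prompt volume falls into.
function tieredRate(tiers: Tier[], promptLength: number): number {
  for (const tier of tiers) {
    if (tier.upTo === null || promptLength <= tier.upTo) return tier.rate;
  }
  return tiers[tiers.length - 1].rate;
}

// Provider-specific threshold basis, per the description above:
// Anthropic counts the whole prompt (input + cached reads + cache writes),
// while Vertex counts only the input tokens.
function thresholdTokens(
  provider: string,
  u: { input: number; cached: number; cacheWrites: number }
): number {
  return provider === "anthropic" ? u.input + u.cached + u.cacheWrites : u.input;
}
```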

Usage

Use this pattern when:

  • Computing the dollar cost of a completed LLM request given its token usage.
  • Building cost dashboards that show per-request cost breakdowns.
  • Comparing costs across providers for equivalent requests.
  • Implementing budget alerts or cost tracking for LLM applications.
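As an illustration of the last use case, a budget alert can be built directly on per-request costs. This is a hypothetical tracker, not part of Helicone:

```typescript
// Minimal budget-alert sketch: accumulate per-request USD costs and
// report when the running total exceeds a configured budget.
class CostTracker {
  private total = 0;
  constructor(private budgetUsd: number) {}

  // Returns true once the budget has been exceeded.
  record(costUsd: number): boolean {
    this.total += costUsd;
    return this.total > this.budgetUsd;
  }

  get spent(): number {
    return this.total;
  }
}
```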

Theoretical Basis

Cost Computation implements a weighted linear combination where each token category contributes count_i * rate_i to the total. The tiered pricing extension adds a step function where the rate depends on the total volume, making it a piecewise linear cost model.
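In notation, the model described above is:

```latex
\text{cost} \;=\; \sum_{i \,\in\, \text{categories}} n_i \, r_i
```

where \(n_i\) is the token count for category \(i\) and \(r_i\) its per-token rate. Tiered pricing replaces the constant rate with a step function of total volume \(V\) against a threshold \(T\) (e.g. 128K input tokens):

```latex
r_i(V) \;=\;
\begin{cases}
  r_i^{(1)} & V \le T \\[2pt]
  r_i^{(2)} & V > T
\end{cases}
```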

The dual system follows the Strangler Fig migration pattern -- the new tiered system coexists with the legacy flat-rate system, allowing incremental migration of providers to the richer pricing model without disrupting existing calculations.

The CostBreakdown return type embodies the Itemized Receipt pattern, where individual line items (input cost, output cost, cache costs, thinking cost, modality costs, etc.) are preserved alongside the total, enabling detailed cost attribution and analysis.
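The Itemized Receipt idea can be sketched as a typed record whose total is derived from its line items. The field names here are assumptions, not Helicone's exact CostBreakdown definition:

```typescript
// Illustrative itemized-receipt shape: line items preserved alongside
// the total, so each cost component remains attributable.
interface CostBreakdown {
  inputCostUsd: number;
  outputCostUsd: number;
  cacheReadCostUsd: number;
  cacheWriteCostUsd: number;
  thinkingCostUsd: number;
  totalCostUsd: number;  // always the sum of the line items above
}

// Derive the total from the line items so it can never drift out of sync.
function totalOf(b: Omit<CostBreakdown, "totalCostUsd">): CostBreakdown {
  const totalCostUsd =
    b.inputCostUsd + b.outputCostUsd + b.cacheReadCostUsd +
    b.cacheWriteCostUsd + b.thinkingCostUsd;
  return { ...b, totalCostUsd };
}
```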

Related Pages

Implemented By

Uses Heuristic
