Heuristic:Helicone Anthropic Cache Double Count Prevention

From Leeroopedia
Knowledge Sources
Domains Cost_Calculation, Anthropic_Provider
Last Updated 2026-02-14 06:00 GMT

Overview

Anthropic cache write tokens are the sum of 5-minute and 1-hour writes; subtract the sub-categories before applying the default cache write rate to avoid double-counting costs.

Description

Anthropic reports `prompt_cache_write_tokens` as an aggregate that includes both 5-minute TTL writes and 1-hour TTL writes. Each TTL tier has a different cost rate. If you naively multiply the total cache write tokens by the default write rate and then also add the 5m and 1h specific costs, you double-count the tokens that have specific rates. The solution is to subtract the 5m and 1h token counts from the total before applying the default rate, then add back the tier-specific costs.
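A small numeric sketch makes the double count visible. The rates below are illustrative assumptions (roughly Sonnet-class per-token pricing), not values taken from Helicone's cost tables:

```typescript
// Illustrative per-token rates; the exact figures are assumptions for the demo.
const defaultWriteRate = 3.75e-6; // default cache write rate
const rate5m = 3.75e-6;           // 5-minute TTL write rate
const rate1h = 6e-6;              // 1-hour TTL write rate

// Anthropic reports the total as the sum of the two tiers.
const totalWrites = 1000;
const writes5m = 600;
const writes1h = 400;

// Naive: bills all 1000 tokens at the default rate AND again at the tier rates.
const naive =
  totalWrites * defaultWriteRate + writes5m * rate5m + writes1h * rate1h;

// Correct: subtract the tiers first so each token is billed exactly once.
const effective = totalWrites - writes5m - writes1h; // 0 when both tiers are reported
const correct =
  effective * defaultWriteRate + writes5m * rate5m + writes1h * rate1h;

// The naive figure is too high by exactly totalWrites * defaultWriteRate.
```

Note that when both tier breakdowns are present, `effective` is zero, so the default-rate term contributes nothing; it only matters when the breakdowns are missing.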

Usage

Apply this heuristic whenever calculating Anthropic prompt caching costs. This pattern affects the `costOfPrompt` function and any cost aggregation that involves Anthropic cache write tokens.

The Insight (Rule of Thumb)

  • Action: Calculate `effectivePromptCacheWriteTokens = promptCacheWriteTokens - promptCacheWrite5m - promptCacheWrite1h`, then charge each component at its specific rate.
  • Value: With the subtraction, each token is billed exactly once: `totalCost = (effective * defaultRate) + (tokens5m * rate5m) + (tokens1h * rate1h)`.
  • Trade-off: If 5m/1h breakdowns are missing (both are 0 or undefined), the full total is charged at the default rate, which is correct since there are no sub-categories to subtract.
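The rule of thumb above can be sketched as a standalone function. The type and function names here are hypothetical, not Helicone's actual cost types:

```typescript
// Sketch only: names are illustrative, not the real Helicone API.
interface CacheWriteRates {
  defaultWrite: number; // default per-token cache write rate
  write5m?: number;     // 5-minute TTL rate, if priced separately
  write1h?: number;     // 1-hour TTL rate, if priced separately
}

function cacheWriteCost(
  totalWriteTokens: number,
  tokens5m: number | undefined,
  tokens1h: number | undefined,
  rates: CacheWriteRates
): number {
  // Subtract the sub-categories so they are not also billed at the default rate.
  const effective = totalWriteTokens - (tokens5m ?? 0) - (tokens1h ?? 0);
  let cost = effective * rates.defaultWrite;
  if (rates.write5m && tokens5m) cost += tokens5m * rates.write5m;
  if (rates.write1h && tokens1h) cost += tokens1h * rates.write1h;
  return cost;
}
```

When the 5m/1h breakdowns are absent, both subtractions are zero and the full total bills at the default rate, which is the trade-off case described above.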

Reasoning

Code evidence from `packages/cost/index.ts:101-117`:

// Add cost for cache write tokens if applicable
if (cost.prompt_cache_write_token && promptCacheWriteTokens > 0) {
  // For anthropic requests, the prompt cache write tokens are the sum of the 5m and 1h writes
  // so we subtract to not double count
  const effectivePromptCacheWriteTokens =
    promptCacheWriteTokens -
    (promptCacheWrite5m ?? 0) -
    (promptCacheWrite1h ?? 0);
  totalCost += effectivePromptCacheWriteTokens * cost.prompt_cache_write_token;
  if (cost.prompt_cache_creation_5m && promptCacheWrite5m && promptCacheWrite5m > 0) {
    totalCost += promptCacheWrite5m * cost.prompt_cache_creation_5m;
  }
  if (cost.prompt_cache_creation_1h && promptCacheWrite1h && promptCacheWrite1h > 0) {
    totalCost += promptCacheWrite1h * cost.prompt_cache_creation_1h;
  }
} else if (promptCacheWriteTokens > 0) {
  totalCost += promptCacheWriteTokens * cost.prompt_token;
}

The in-code comment ("For anthropic requests, the prompt cache write tokens are the sum of the 5m and 1h writes, so we subtract to not double count") states the critical insight. Without the subtraction, customers would be overcharged for cache writes.

The fallback branch (`else if`) handles the case where no specific cache write rate exists: tokens are charged at the regular prompt rate.
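A minimal sketch of that fallback path, using an assumed per-token rate and a cost entry with no dedicated cache write rate:

```typescript
// Assumed rate for illustration only.
const cost: { prompt_token: number; prompt_cache_write_token?: number } = {
  prompt_token: 3e-6, // regular prompt rate; no cache write rate defined
};
const promptCacheWriteTokens = 500;

let totalCost = 0;
if (cost.prompt_cache_write_token && promptCacheWriteTokens > 0) {
  // Not reached here: this model has no specific cache write rate.
  totalCost += promptCacheWriteTokens * cost.prompt_cache_write_token;
} else if (promptCacheWriteTokens > 0) {
  // Fallback: cache write tokens bill at the regular prompt rate.
  totalCost += promptCacheWriteTokens * cost.prompt_token;
}
```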
