Principle:Helicone Helicone Endpoint Pricing Configuration
| Knowledge Sources | |
|---|---|
| Domains | Model Registry, Pricing, LLM Gateway |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Endpoint pricing configuration is a structured declaration that binds a model to a specific provider, defining per-token pricing tiers, cache cost multipliers, regional deployment endpoints, supported parameters, and operational flags for that particular model-provider combination.
Description
A single LLM (e.g., Claude Opus 4) can be served by multiple providers (Anthropic direct, AWS Bedrock, Google Vertex, OpenRouter), each with different pricing, regional deployments, supported parameters, and operational characteristics. The Endpoint Pricing Configuration principle provides a typed structure for capturing all of these per-model-per-provider details in a single, self-contained declaration.
Each configuration specifies:
- the provider's model ID string (which often differs from the canonical model name)
- the pricing tiers (with threshold-based tiered rates for input and output tokens)
- optional cache cost multipliers (for prompt caching features)
- the context length and maximum completion tokens for that specific deployment
- which API parameters are supported
- a map of endpoint deployments (regions or wildcard entries)
The pricing model supports tiered pricing via an array of ModelPricing objects ordered by threshold. This allows representing volume-based pricing where the per-token rate changes after certain usage thresholds. Each tier can include specialized pricing for different modalities (image, audio, video, file), cache read/write multipliers, per-request fees, thinking token costs, and web search costs.
The endpoint configuration map (endpointConfigs) allows per-deployment overrides of pricing, context length, and model IDs, which is essential for providers like AWS Bedrock and Google Vertex where different regions may have different pricing or model availability.
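The structure described above can be sketched in TypeScript as follows. This is a minimal illustration, not the registry's actual type definitions: only `ModelPricing`, `endpointConfigs`, `threshold`, and `cachedInput` are names taken from this page, while every other field name, the `resolveDeployment` helper, the placeholder model IDs, and the fallback to a `"*"` wildcard entry are assumptions made for the example.

```typescript
// Hypothetical shapes; field names other than threshold, cachedInput and
// endpointConfigs are assumptions made for illustration.
interface ModelPricing {
  threshold: number; // usage level at which this tier starts
  input: number; // assumed unit: USD per input token
  output: number; // assumed unit: USD per output token
  cacheMultipliers?: { cachedInput: number };
}

// Per-deployment overrides: any omitted field inherits the base value.
interface DeploymentOverride {
  providerModelId?: string;
  pricing?: ModelPricing[];
  contextLength?: number;
}

interface EndpointPricingConfig {
  providerModelId: string;
  pricing: ModelPricing[];
  contextLength: number;
  maxCompletionTokens: number;
  supportedParameters: string[];
  endpointConfigs: Record<string, DeploymentOverride>;
}

// Merge the base config with one deployment's overrides.
// Assumption: unknown deployments fall back to the "*" wildcard entry.
function resolveDeployment(config: EndpointPricingConfig, deployment: string) {
  const override =
    config.endpointConfigs[deployment] ?? config.endpointConfigs["*"] ?? {};
  return {
    providerModelId: override.providerModelId ?? config.providerModelId,
    pricing: override.pricing ?? config.pricing,
    contextLength: override.contextLength ?? config.contextLength,
  };
}

// Placeholder values only; not real model IDs or prices.
const exampleConfig: EndpointPricingConfig = {
  providerModelId: "example-provider-model-v1",
  pricing: [{ threshold: 0, input: 0.000003, output: 0.000015 }],
  contextLength: 200000,
  maxCompletionTokens: 8192,
  supportedParameters: ["max_tokens", "temperature"],
  endpointConfigs: {
    "*": {},
    "region-a": {
      providerModelId: "region-a.example-provider-model-v1",
      contextLength: 100000,
    },
  },
};
```

With this sketch, `resolveDeployment(exampleConfig, "region-a")` picks up the region-specific model ID and context length, while any other deployment name inherits the base values through the wildcard entry.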
Usage
Use this configuration structure when:
- Adding a new model to the gateway for a specific provider
- Defining or updating pricing for a model-provider combination
- Configuring regional deployments with per-region overrides
- Declaring which API parameters a model-provider combination supports
Theoretical Basis
The Endpoint Pricing Configuration follows the Configuration as Data principle: operational parameters that change per model-provider-region are expressed as declarative data structures rather than imperative code. This makes configurations auditable, diffable, and testable.
The tiered pricing array implements a step function: given a usage quantity, the system selects the last tier, in ascending threshold order, whose threshold the quantity meets or exceeds (falling back to the first tier) and applies that tier's rates. This is a common pattern in billing systems:
    function getPricingTier(usage, pricingTiers):
        applicableTier = pricingTiers[0]
        for tier in pricingTiers:
            if usage >= tier.threshold:
                applicableTier = tier
        return applicableTier
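A typed TypeScript version of the pseudocode above might look like the following sketch. The `PricingTier` shape, the `inputPerMillion` field, and the sample rates are hypothetical. Sorting a copy of the array and throwing on an empty array are defensive choices added here, not behavior stated in the source.

```typescript
// Hypothetical tier shape; inputPerMillion is an assumed field name and unit.
interface PricingTier {
  threshold: number; // usage level at which this tier starts
  inputPerMillion: number; // assumed unit: USD per million input tokens
}

// Step-function lookup: returns the tier with the highest threshold
// that the usage meets or exceeds, or the lowest tier if none match.
function getPricingTier(usage: number, pricingTiers: PricingTier[]): PricingTier {
  // Sort a copy so the caller's array is untouched and order is guaranteed.
  const [first, ...rest] = [...pricingTiers].sort(
    (a, b) => a.threshold - b.threshold
  );
  if (first === undefined) {
    throw new Error("at least one pricing tier is required");
  }
  let applicableTier = first;
  for (const tier of rest) {
    if (usage >= tier.threshold) {
      applicableTier = tier;
    }
  }
  return applicableTier;
}

// Placeholder tiers: the rate doubles once usage reaches 200,000.
const sampleTiers: PricingTier[] = [
  { threshold: 0, inputPerMillion: 3 },
  { threshold: 200000, inputPerMillion: 6 },
];

const lowTier = getPricingTier(1000, sampleTiers); // first tier
const highTier = getPricingTier(200000, sampleTiers); // second tier (boundary is inclusive)
```

Because the comparison is `>=`, a usage value exactly equal to a threshold already falls into the higher tier.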
The Composite Key pattern ("model-name:provider-name") creates a unique identifier for each model-provider combination, enabling O(1) lookup in the registry. The endpoint configs within each entry add a third dimension ("model:provider:region"), forming a hierarchical key structure.
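As a rough illustration of the composite key idea, the sketch below stores entries in a `Map` keyed by `"model:provider"` and derives a three-part key for a specific deployment. The helper names, the `RegistryEntry` shape, and the sample identifiers are hypothetical. The sketch also assumes that model and provider names do not themselves contain a colon, since otherwise the keys would be ambiguous.

```typescript
// Hypothetical entry shape, reduced to what the example needs.
interface RegistryEntry {
  providerModelId: string;
  endpointConfigs: Record<string, { providerModelId?: string }>;
}

// Two-part key: one entry per model-provider combination.
function endpointKey(model: string, provider: string): string {
  return `${model}:${provider}`;
}

// Three-part key: extends the two-part key with a region or wildcard.
function deploymentKey(model: string, provider: string, region: string): string {
  return `${endpointKey(model, provider)}:${region}`;
}

// A Map gives average-case constant-time lookup by string key.
const registry = new Map<string, RegistryEntry>();

registry.set(endpointKey("model-a", "provider-x"), {
  providerModelId: "provider-x/model-a-v1",
  endpointConfigs: { "*": {} },
});

const found = registry.get(endpointKey("model-a", "provider-x"));
const missing = registry.get(endpointKey("model-a", "provider-y")); // undefined
```

Keeping the key construction in one helper means the separator and ordering are defined in a single place, so lookups and registrations cannot drift apart.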
Cache multipliers follow a Derived Pricing pattern: rather than specifying absolute cache costs, they express cache costs as a fraction of the base input rate (e.g., cachedInput: 0.1 means 10% of the input rate), ensuring cache pricing stays proportional when base rates change.
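The derived pricing arithmetic can be sketched as follows. Only `cachedInput` comes from this page. The function names, the per-token rate unit, and the sample numbers are assumptions for illustration.

```typescript
// cachedInput is the multiplier named in the text; 0.1 means 10% of the input rate.
interface CacheMultipliers {
  cachedInput: number;
}

// Derive the cache-read rate from the base input rate.
function cachedInputRate(inputRate: number, multipliers: CacheMultipliers): number {
  return inputRate * multipliers.cachedInput;
}

// Total input cost when some prompt tokens are served from cache.
function inputCost(
  uncachedTokens: number,
  cachedTokens: number,
  inputRate: number,
  multipliers: CacheMultipliers
): number {
  return (
    uncachedTokens * inputRate +
    cachedTokens * cachedInputRate(inputRate, multipliers)
  );
}

const multipliers: CacheMultipliers = { cachedInput: 0.1 };

// Doubling the base rate doubles the derived cache rate as well,
// so the 10% ratio holds without touching the multiplier.
const rateBefore = cachedInputRate(0.000003, multipliers);
const rateAfter = cachedInputRate(0.000006, multipliers);
```

The design choice is that a price change only requires editing the base input rate. The cache rate follows automatically, which removes one way for the two figures to fall out of sync.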