Principle:Helicone Helicone Endpoint Pricing Configuration

Knowledge Sources	Helicone
Domains	Model Registry, Pricing, LLM Gateway
Last Updated	2026-02-14 00:00 GMT

Overview

Endpoint pricing configuration is a structured declaration that binds a model to a specific provider, defining per-token pricing tiers, cache cost multipliers, regional deployment endpoints, supported parameters, and operational flags for that particular model-provider combination.

Description

A single LLM (e.g., Claude Opus 4) can be served by multiple providers (Anthropic direct, AWS Bedrock, Google Vertex, OpenRouter), each with different pricing, regional deployments, supported parameters, and operational characteristics. The Endpoint Pricing Configuration principle provides a typed structure for capturing all of these per-model-per-provider details in a single, self-contained declaration.

Each configuration specifies the provider's model ID string (which often differs from the canonical model name), the pricing tiers (with threshold-based tiered rates for input and output tokens), optional cache cost multipliers (for prompt caching features), the context length and maximum completion tokens for that specific deployment, which API parameters are supported, and a map of endpoint deployments (regions or wildcard entries).
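
As a rough illustration of the fields listed above, the declaration could be typed as follows. This is a sketch under stated assumptions, not the actual Helicone type: only ModelPricing, threshold, cachedInput, and endpointConfigs appear on this page, and every other field name, the rate unit, and all example values are hypothetical.

```typescript
// One pricing tier. Only `threshold` is named on this page;
// `input` and `output` are hypothetical per-token rate fields.
interface ModelPricing {
  threshold: number; // usage level at which this tier starts to apply
  input: number;     // rate per input token (unit assumed: micro-USD)
  output: number;    // rate per output token (unit assumed: micro-USD)
}

// Per-deployment overrides; every field is optional (assumed shape).
interface EndpointConfig {
  providerModelId?: string;
  pricing?: ModelPricing[];
  contextLength?: number;
}

// Hypothetical shape of one model-provider declaration.
interface ModelProviderConfig {
  providerModelId: string;                   // provider's own model ID string
  pricing: ModelPricing[];                   // tiers ordered by threshold
  cacheMultipliers?: { cachedInput: number };
  contextLength: number;
  maxCompletionTokens: number;
  supportedParameters: string[];
  endpointConfigs: Record<string, EndpointConfig>; // region name or "*"
}

// Illustrative values only; not real pricing or real model IDs.
const exampleConfig: ModelProviderConfig = {
  providerModelId: "example-provider.example-model-v1",
  pricing: [{ threshold: 0, input: 3, output: 15 }],
  cacheMultipliers: { cachedInput: 0.1 },
  contextLength: 200000,
  maxCompletionTokens: 32000,
  supportedParameters: ["max_tokens", "temperature"],
  endpointConfigs: { "*": {} },
};
```

Keeping every per-deployment field optional means a deployment entry only states what differs from the base declaration.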

The pricing model supports tiered pricing via an array of ModelPricing objects ordered by threshold. This allows representing volume-based pricing where the per-token rate changes after certain usage thresholds. Each tier can include specialized pricing for different modalities (image, audio, video, file), cache read/write multipliers, per-request fees, thinking token costs, and web search costs.

The endpoint configuration map (endpointConfigs) allows per-deployment overrides of pricing, context length, and model IDs, which is essential for providers like AWS Bedrock and Google Vertex where different regions may have different pricing or model availability.
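
One way to read the override behavior described above is as a field-by-field fallback from the deployment entry to the base declaration. The sketch below assumes that reading; the function name resolveDeployment, the field names other than endpointConfigs, the fallback to a "*" entry, and all values are hypothetical.

```typescript
interface Tier { threshold: number; input: number; output: number }

// Optional per-deployment overrides (assumed shape).
interface Deployment {
  providerModelId?: string;
  contextLength?: number;
  pricing?: Tier[];
}

interface BaseConfig {
  providerModelId: string;
  contextLength: number;
  pricing: Tier[];
  endpointConfigs: Record<string, Deployment>;
}

// Returns the effective settings for a region, or undefined when the
// region is not deployed and no wildcard entry exists (assumed rule).
function resolveDeployment(config: BaseConfig, region: string) {
  const override: Deployment | undefined =
    config.endpointConfigs[region] ?? config.endpointConfigs["*"];
  if (override === undefined) return undefined;
  return {
    providerModelId: override.providerModelId ?? config.providerModelId,
    contextLength: override.contextLength ?? config.contextLength,
    pricing: override.pricing ?? config.pricing,
  };
}

// Illustrative values only.
const config: BaseConfig = {
  providerModelId: "example-model-v1",
  contextLength: 200000,
  pricing: [{ threshold: 0, input: 3, output: 15 }],
  endpointConfigs: {
    "us-east-1": {}, // inherits everything from the base declaration
    "eu-west-1": {
      providerModelId: "eu.example-model-v1",
      pricing: [{ threshold: 0, input: 4, output: 18 }],
    },
  },
};

const us = resolveDeployment(config, "us-east-1");
const eu = resolveDeployment(config, "eu-west-1");
const missing = resolveDeployment(config, "ap-south-1");
```

Here the us-east-1 entry inherits the base model ID and pricing, eu-west-1 replaces both while keeping the base context length, and ap-south-1 resolves to undefined because neither it nor a wildcard entry is declared.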

Usage

Use this configuration structure when:

Adding a new model to the gateway for a specific provider
Defining or updating pricing for a model-provider combination
Configuring regional deployments with per-region overrides
Declaring which API parameters a model-provider combination supports

Theoretical Basis

The Endpoint Pricing Configuration follows the Configuration as Data principle: operational parameters that change per model-provider-region are expressed as declarative data structures rather than imperative code. This makes configurations auditable, diffable, and testable.

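The tiered pricing array implements a step function: given a usage quantity and tiers sorted by ascending threshold, the system selects the highest tier whose threshold the quantity meets or exceeds and applies that tier's rates, falling back to the first tier when no threshold is met. This is a common pattern in billing systems: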

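// Assumes pricingTiers is non-empty and sorted by ascending threshold.
function getPricingTier<T extends { threshold: number }>(usage: number, pricingTiers: T[]): T {
    let applicableTier = pricingTiers[0];
    for (const tier of pricingTiers) {
        // Keep the last tier whose threshold the usage has reached.
        if (usage >= tier.threshold) {
            applicableTier = tier;
        }
    }
    return applicableTier;
}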
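
To make the step function concrete, the sketch below applies the same tier selection to a cost calculation. It assumes the usage quantity is the request's input token count and that rates are micro-USD per token; the page does not state either, and the names PricingTier and costForRequest as well as all rates are hypothetical.

```typescript
interface PricingTier {
  threshold: number; // tier applies once usage reaches this value
  input: number;     // assumed unit: micro-USD per input token
  output: number;    // assumed unit: micro-USD per output token
}

// Tiers must be sorted by ascending threshold.
function costForRequest(
  tiers: PricingTier[],
  inputTokens: number,
  outputTokens: number
): number {
  if (tiers.length === 0) throw new Error("at least one pricing tier is required");
  // Scan from the highest threshold down; default to the first tier.
  let tier = tiers[0];
  for (let i = tiers.length - 1; i >= 0; i--) {
    if (inputTokens >= tiers[i].threshold) {
      tier = tiers[i];
      break;
    }
  }
  return inputTokens * tier.input + outputTokens * tier.output;
}

// Illustrative tiers: a base rate and a higher rate from 200,000 tokens.
const tiers: PricingTier[] = [
  { threshold: 0, input: 3, output: 15 },
  { threshold: 200000, input: 6, output: 22 },
];

const small = costForRequest(tiers, 1000, 100);     // first tier
const boundary = costForRequest(tiers, 200000, 0);  // exactly at the threshold
const large = costForRequest(tiers, 250000, 100);   // second tier
```

Because the comparison is inclusive, a request of exactly 200,000 input tokens is billed at the second tier's rates. Note that in this sketch the selected tier's rate applies to all tokens in the request, not only to those above the threshold.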

The Composite Key pattern ("model-name:provider-name") creates a unique identifier for each model-provider combination, enabling constant-time (O(1) on average) lookup when the registry is stored as a hash map. The endpoint configs within each entry add a third dimension ("model:provider:region"), forming a hierarchical key structure.
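
A minimal sketch of the composite key idea, assuming the registry is a Map keyed by the joined string. The helper names, model names, and provider names are hypothetical, and the sketch assumes none of the key parts contains a colon.

```typescript
interface RegistryEntry {
  providerModelId: string;
}

// Builds the "model-name:provider-name" key described above.
// Assumes neither part contains ":" (otherwise keys could collide).
function endpointKey(model: string, provider: string): string {
  return `${model}:${provider}`;
}

// Extends the key with the deployment: "model:provider:region".
function deploymentKey(model: string, provider: string, region: string): string {
  return `${endpointKey(model, provider)}:${region}`;
}

// Illustrative registry contents only.
const registry = new Map<string, RegistryEntry>();
registry.set(endpointKey("example-model", "provider-a"), {
  providerModelId: "a/example-model-v1",
});
registry.set(endpointKey("example-model", "provider-b"), {
  providerModelId: "example-model@b",
});

// Single hash lookup; returns undefined for unknown combinations.
function lookup(model: string, provider: string): RegistryEntry | undefined {
  return registry.get(endpointKey(model, provider));
}

const found = lookup("example-model", "provider-b");
const notFound = lookup("example-model", "provider-c");
const regional = deploymentKey("example-model", "provider-a", "us-east-1");
```

The same canonical model name maps to a different provider model ID under each provider, which is why the provider is part of the key.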

Cache multipliers follow a Derived Pricing pattern: rather than specifying absolute cache costs, they express cache costs as a fraction of the base input rate (e.g., cachedInput: 0.1 means 10% of the input rate), ensuring cache pricing stays proportional when base rates change.
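
The derived pricing idea can be sketched as a multiplication at cost-calculation time. Only cachedInput comes from this page; the function names, the rate unit (assumed micro-USD per token), and the values are hypothetical.

```typescript
interface CacheMultipliers {
  cachedInput: number; // fraction of the base input rate, e.g. 0.1
}

// Derives the cached-read rate from the base input rate.
function cachedInputRate(inputRate: number, m: CacheMultipliers): number {
  return inputRate * m.cachedInput;
}

// Cost of a prompt split into uncached and cached input tokens.
function promptCost(
  inputRate: number,
  m: CacheMultipliers,
  uncachedTokens: number,
  cachedTokens: number
): number {
  return uncachedTokens * inputRate + cachedTokens * cachedInputRate(inputRate, m);
}

const multipliers: CacheMultipliers = { cachedInput: 0.1 };

// Doubling the base rate doubles the derived cache rate with no
// change to the multiplier itself.
const rateBefore = cachedInputRate(30, multipliers);
const rateAfter = cachedInputRate(60, multipliers);

// 1,000 uncached tokens plus 9,000 cached tokens at a base rate of 30.
const cost = promptCost(30, multipliers, 1000, 9000);
```

With these illustrative numbers, the cached rate is about 3 before the base rate change and about 6 after it, up to floating-point rounding, so the 10% relationship is preserved without editing the cache configuration.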

Related Pages

Implemented By

Implementation:Helicone_Helicone_ModelProviderConfig

Uses Heuristic

Heuristic:Helicone_Helicone_Tiered_Pricing_Threshold_Matching
