Principle:Helicone Helicone Central Registry Indexing

Knowledge Sources	Helicone
Domains	Model Registry, Indexing, Performance Optimization
Last Updated	2026-02-14 00:00 GMT

Overview

Central registry indexing is the technique of pre-computing multiple lookup maps from a flat collection of model-provider configuration records, enabling O(1) access by model name, provider, endpoint ID, or composite key at runtime.

Description

An LLM gateway handles many models across many providers, each with multiple regional deployments. At request time, the gateway needs to answer questions like "which providers serve model X?", "what are the cheapest endpoints for model Y?", "what is the configuration for this specific model:provider:region triple?", or "which models does provider Z offer?". Answering these by scanning the full configuration set would make every lookup cost time linear in the number of configurations, on every request, which is too slow for a high-throughput proxy.

Central Registry Indexing addresses this by building a set of pre-computed Map objects at startup time. Each Map provides O(1) access for a specific query pattern. The build process iterates over all model-provider configurations once, populating every index in a single pass. The resulting index set is immutable after construction and can be safely shared across all request handlers.

A key design choice is that endpoints within each index are sorted by cost (ascending sum of input and output token rates). This means the first endpoint in any list is always the cheapest option, which is useful for cost-optimized routing without runtime sorting.
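
The ordering can be illustrated with a minimal sketch. The `Endpoint` shape and the field names `inputRate` and `outputRate` are assumptions made for this example, not Helicone's actual types; only the sort key (input rate plus output rate, ascending) comes from the description above.

```typescript
// Hypothetical endpoint shape; field names are assumptions for illustration.
interface Endpoint {
  id: string;
  inputRate: number; // cost per input token
  outputRate: number; // cost per output token
}

// Sort key: sum of input and output token rates.
function combinedRate(e: Endpoint): number {
  return e.inputRate + e.outputRate;
}

// Returns a new array sorted cheapest-first; the input is left untouched.
function sortByCost(endpoints: readonly Endpoint[]): Endpoint[] {
  return [...endpoints].sort((a, b) => combinedRate(a) - combinedRate(b));
}

const sorted = sortByCost([
  { id: "m:p1:us-east", inputRate: 3, outputRate: 15 },
  { id: "m:p2:global", inputRate: 1, outputRate: 5 },
  { id: "m:p3:eu-west", inputRate: 2, outputRate: 8 },
]);

// Cheapest first: m:p2:global, m:p3:eu-west, m:p1:us-east
console.log(sorted.map((e) => e.id));
```

Because the sort runs once at build time, request handlers only ever read index 0 of a list.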

Usage

Use central registry indexing when:

The gateway starts up and needs to prepare lookup tables for request routing
You need to find all providers offering a particular model
You need to resolve a specific model:provider:region endpoint configuration
You need to identify all PTB-enabled (Pass-Through Billing) endpoints for a model
You need to build a cost-sorted list of available endpoints for routing decisions
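
The lookups listed above might look like the following at request time. This is a sketch under stated assumptions: the maps are built by hand here rather than by the real build step, and the map names, the `Endpoint` shape, and the helper functions are hypothetical.

```typescript
// Hypothetical endpoint shape and hand-built indexes, for illustration only.
interface Endpoint {
  id: string;
  provider: string;
  ptbEnabled: boolean;
}

const usEast: Endpoint = { id: "model-x:provider-a:us-east", provider: "provider-a", ptbEnabled: true };
const global: Endpoint = { id: "model-x:provider-b:global", provider: "provider-b", ptbEnabled: false };

const modelToProviders = new Map<string, Set<string>>([
  ["model-x", new Set(["provider-a", "provider-b"])],
]);
const endpointById = new Map<string, Endpoint>([
  [usEast.id, usEast],
  [global.id, global],
]);
const modelToPtbEndpoints = new Map<string, Endpoint[]>([["model-x", [usEast]]]);

// Which providers serve this model? One hash lookup, no scan.
function providersFor(model: string): string[] {
  const providers = modelToProviders.get(model);
  return providers ? Array.from(providers) : [];
}

// Resolve a specific model:provider:region triple.
function resolveEndpoint(model: string, provider: string, region: string): Endpoint | undefined {
  return endpointById.get(`${model}:${provider}:${region}`);
}

// All PTB-enabled endpoints for a model (empty list if none are indexed).
function ptbEndpointsFor(model: string): Endpoint[] {
  return modelToPtbEndpoints.get(model) ?? [];
}

console.log(providersFor("model-x"));
console.log(resolveEndpoint("model-x", "provider-a", "us-east"));
console.log(ptbEndpointsFor("model-x").map((e) => e.id));
```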

Theoretical Basis

This principle applies the materialized-view optimization from database systems. Rather than computing query results on demand (the equivalent of a table scan), the system pre-computes and stores the results of common query patterns (the equivalent of a materialized view). The trade-off is higher memory usage and longer startup time in exchange for constant-time (average-case O(1)) hash lookups at runtime.

The build process implements a Single-Pass Multi-Index Construction algorithm:

function buildIndexes(configs):
    indexes = initializeEmptyMaps()

    for each (compositeKey, config) in configs:
        (modelName, provider) = split(compositeKey, ":")

        indexes.configById[compositeKey] = config
        indexes.providerToModels[provider].add(modelName)
        indexes.modelToProviders[modelName].add(provider)
        indexes.modelToConfigs[modelName].append(config)

        for each (deploymentId, deploymentConfig) in config.endpoints:
            endpoint = mergeConfigs(config, deploymentConfig, deploymentId)
            endpointKey = compositeKey + ":" + deploymentId
            indexes.endpointById[endpointKey] = endpoint
            indexes.modelToEndpoints[modelName].append(endpoint)

            if endpoint.ptbEnabled:
                indexes.modelToPtbEndpoints[modelName].append(endpoint)

    sortAllEndpointListsByCost(indexes)
    return indexes
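
The pseudocode above could be rendered in TypeScript roughly as follows. This is a sketch, not the Helicone implementation: every type and field name is hypothetical, the merge step is assumed to let deployment-level values override model-provider defaults, and composite keys are assumed to have the form "model:provider".

```typescript
// All type and field names here are hypothetical, chosen to mirror the pseudocode.
interface DeploymentConfig { inputRate?: number; outputRate?: number; ptbEnabled?: boolean; }
interface ModelProviderConfig {
  inputRate: number;
  outputRate: number;
  ptbEnabled: boolean;
  endpoints: Record<string, DeploymentConfig>;
}
interface Endpoint {
  id: string;
  modelName: string;
  provider: string;
  inputRate: number;
  outputRate: number;
  ptbEnabled: boolean;
}
interface Indexes {
  configById: Map<string, ModelProviderConfig>;
  endpointById: Map<string, Endpoint>;
  providerToModels: Map<string, Set<string>>;
  modelToProviders: Map<string, Set<string>>;
  modelToConfigs: Map<string, ModelProviderConfig[]>;
  modelToEndpoints: Map<string, Endpoint[]>;
  modelToPtbEndpoints: Map<string, Endpoint[]>;
}

// Return the value stored under `key`, creating it on first use.
function getOrCreate<V>(map: Map<string, V>, key: string, make: () => V): V {
  let value = map.get(key);
  if (value === undefined) {
    value = make();
    map.set(key, value);
  }
  return value;
}

// Ascending by summed input and output rates.
const byCost = (a: Endpoint, b: Endpoint): number =>
  a.inputRate + a.outputRate - (b.inputRate + b.outputRate);

function buildIndexes(configs: Record<string, ModelProviderConfig>): Indexes {
  const idx: Indexes = {
    configById: new Map(), endpointById: new Map(),
    providerToModels: new Map(), modelToProviders: new Map(),
    modelToConfigs: new Map(), modelToEndpoints: new Map(),
    modelToPtbEndpoints: new Map(),
  };
  for (const [compositeKey, config] of Object.entries(configs)) {
    // Assumes keys look like "model:provider"; malformed keys are skipped.
    const sep = compositeKey.indexOf(":");
    if (sep < 0) continue;
    const modelName = compositeKey.slice(0, sep);
    const provider = compositeKey.slice(sep + 1);

    idx.configById.set(compositeKey, config);
    getOrCreate(idx.providerToModels, provider, () => new Set<string>()).add(modelName);
    getOrCreate(idx.modelToProviders, modelName, () => new Set<string>()).add(provider);
    getOrCreate(idx.modelToConfigs, modelName, () => [] as ModelProviderConfig[]).push(config);

    for (const [deploymentId, deployment] of Object.entries(config.endpoints)) {
      // Assumed merge rule: deployment values override model-provider defaults.
      const endpoint: Endpoint = {
        id: `${compositeKey}:${deploymentId}`,
        modelName,
        provider,
        inputRate: deployment.inputRate ?? config.inputRate,
        outputRate: deployment.outputRate ?? config.outputRate,
        ptbEnabled: deployment.ptbEnabled ?? config.ptbEnabled,
      };
      idx.endpointById.set(endpoint.id, endpoint);
      getOrCreate(idx.modelToEndpoints, modelName, () => [] as Endpoint[]).push(endpoint);
      if (endpoint.ptbEnabled) {
        getOrCreate(idx.modelToPtbEndpoints, modelName, () => [] as Endpoint[]).push(endpoint);
      }
    }
  }
  // One sort per list after the pass, so every list ends up cheapest-first.
  idx.modelToEndpoints.forEach((list) => list.sort(byCost));
  idx.modelToPtbEndpoints.forEach((list) => list.sort(byCost));
  return idx;
}

const indexes = buildIndexes({
  "model-x:provider-a": {
    inputRate: 3, outputRate: 15, ptbEnabled: true,
    endpoints: { "us-east": {}, "eu-west": { inputRate: 2 } },
  },
  "model-x:provider-b": { inputRate: 1, outputRate: 5, ptbEnabled: false, endpoints: { global: {} } },
});
console.log(indexes.modelToEndpoints.get("model-x")?.map((e) => e.id));
```

The sort is deliberately kept outside the main loop: appending during the pass and sorting each list once at the end avoids re-sorting on every insertion.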

The cost-based sorting supports a Greedy Optimization heuristic: because every endpoint list is kept sorted by cost, a consumer that takes the first endpoint passing its availability check gets the cheapest usable option without any further sorting or comparison.
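
A minimal sketch of that greedy selection, assuming a pre-sorted list and a caller-supplied availability check. The `Endpoint` shape, the `isAvailable` predicate, and the function name are hypothetical.

```typescript
// Hypothetical endpoint shape; `cost` stands in for the summed token rates.
interface Endpoint {
  id: string;
  cost: number;
}

// Assumes `sortedEndpoints` is already cheapest-first, as produced at build time.
// Returns the first endpoint that passes the check, or undefined if none does.
function pickCheapestAvailable(
  sortedEndpoints: readonly Endpoint[],
  isAvailable: (e: Endpoint) => boolean,
): Endpoint | undefined {
  return sortedEndpoints.find(isAvailable);
}

const candidates: Endpoint[] = [
  { id: "model-x:provider-b:global", cost: 6 },
  { id: "model-x:provider-a:eu-west", cost: 17 },
  { id: "model-x:provider-a:us-east", cost: 18 },
];

// Pretend the cheapest endpoint is currently unavailable.
const down = new Set(["model-x:provider-b:global"]);
const chosen = pickCheapestAvailable(candidates, (e) => !down.has(e.id));
console.log(chosen?.id); // model-x:provider-a:eu-west
```

The greedy choice is cheapest only with respect to the listed rates; criteria such as latency would need a separate ranking.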

Related Pages

Implemented By

Implementation:Helicone_Helicone_BuildIndexes
