Principle:Helicone Helicone Central Registry Indexing
| Knowledge Sources | |
|---|---|
| Domains | Model Registry, Indexing, Performance Optimization |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Central registry indexing is the technique of pre-computing multiple lookup maps from a flat collection of model-provider configuration records, enabling O(1) access by model name, provider, endpoint ID, or composite key at runtime.
Description
An LLM gateway handles many models across many providers, each with multiple regional deployments. At request time, the gateway needs to answer questions like "which providers serve model X?", "what are the cheapest endpoints for model Y?", "what is the configuration for this specific model:provider:region triple?", or "which models does provider Z offer?". Answering these by scanning the full configuration set on every request costs time linear in the number of configurations, which adds avoidable latency to a high-throughput proxy.
Central Registry Indexing addresses this by building a set of pre-computed Map objects at startup time. Each Map provides O(1) access for a specific query pattern. The build process iterates over all model-provider configurations once, populating every index in a single pass, and then sorts the endpoint lists by cost. The resulting index set is immutable after construction and can be safely shared across all request handlers.
A key design choice is that endpoints within each index are sorted by cost (ascending sum of input and output token rates). This means the first endpoint in any list is always the cheapest option, which is useful for cost-optimized routing without runtime sorting.
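The ordering rule can be illustrated with a small comparator. This is a minimal sketch, not the registry's actual code: the `Endpoint` shape and the field names `inputCostPerToken` and `outputCostPerToken` are assumptions chosen for the example, and the rates are arbitrary integers.

```typescript
// Hypothetical endpoint shape; field names are assumptions for illustration.
interface Endpoint {
  id: string;
  inputCostPerToken: number;
  outputCostPerToken: number;
}

// Sort key: the sum of the input and output token rates.
function combinedCost(e: Endpoint): number {
  return e.inputCostPerToken + e.outputCostPerToken;
}

// Returns a new array ordered cheapest-first; the input array is not modified.
function sortByCost(endpoints: readonly Endpoint[]): Endpoint[] {
  return [...endpoints].sort((a, b) => combinedCost(a) - combinedCost(b));
}

const sorted = sortByCost([
  { id: "us-east", inputCostPerToken: 3, outputCostPerToken: 15 },
  { id: "eu-west", inputCostPerToken: 2, outputCostPerToken: 10 },
  { id: "ap-south", inputCostPerToken: 3, outputCostPerToken: 12 },
]);
```

Because the sort runs once at build time, a consumer that wants the cheapest endpoint only has to read the first element of the list.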
Usage
Use central registry indexing when:
- The gateway starts up and needs to prepare lookup tables for request routing
- You need to find all providers offering a particular model
- You need to resolve a specific model:provider:region endpoint configuration
- You need to identify all PTB-enabled (Pass-Through Billing) endpoints for a model
- You need to build a cost-sorted list of available endpoints for routing decisions
Theoretical Basis
This principle applies Materialized View optimization from database systems. Rather than computing query results on demand (the equivalent of a table scan), the system pre-computes and stores the results of common query patterns (the equivalent of materialized indexes). The trade-off is increased memory usage and startup time in exchange for O(1) runtime lookups.
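The contrast between an on-demand scan and a pre-computed lookup can be sketched as follows. The record shape and the model and provider names are hypothetical and stand in for real configuration data.

```typescript
// Hypothetical record shape; the names are illustrative only.
interface ModelProviderRecord {
  model: string;
  provider: string;
}

const records: ModelProviderRecord[] = [
  { model: "model-a", provider: "provider-x" },
  { model: "model-a", provider: "provider-y" },
  { model: "model-b", provider: "provider-x" },
];

// On-demand query: every call walks the whole collection (the "table scan").
function providersByScan(model: string): string[] {
  return records.filter((r) => r.model === model).map((r) => r.provider);
}

// Materialized form: pay for the scan once, then answer from a Map.
const modelToProviders = new Map<string, string[]>();
for (const r of records) {
  const list = modelToProviders.get(r.model) ?? [];
  list.push(r.provider);
  modelToProviders.set(r.model, list);
}

// Lookup against the pre-computed Map; unknown models yield an empty list.
function providersByIndex(model: string): string[] {
  return modelToProviders.get(model) ?? [];
}
```

Both functions return the same answer; the difference is that `providersByScan` repeats the work on every call, while `providersByIndex` reuses the result computed up front.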
The build process implements a Single-Pass Multi-Index Construction algorithm:
    function buildIndexes(configs):
        indexes = initializeEmptyMaps()
        for each (compositeKey, config) in configs:
            (modelName, provider) = split(compositeKey, ":")
            indexes.configById[compositeKey] = config
            indexes.providerToModels[provider].add(modelName)
            indexes.modelToProviders[modelName].add(provider)
            indexes.modelToConfigs[modelName].append(config)
            for each (deploymentId, deploymentConfig) in config.endpoints:
                endpoint = mergeConfigs(config, deploymentConfig, deploymentId)
                endpointKey = compositeKey + ":" + deploymentId
                indexes.endpointById[endpointKey] = endpoint
                indexes.modelToEndpoints[modelName].append(endpoint)
                if endpoint.ptbEnabled:
                    indexes.modelToPtbEndpoints[modelName].append(endpoint)
        sortAllEndpointListsByCost(indexes)
        return indexes
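The pseudocode could be realized in TypeScript roughly as shown here. This is a sketch under stated assumptions rather than the registry's real implementation: every type and field name (`DeploymentConfig`, `inputCost`, `outputCost`, and so on) is invented for the example, the merge step simply lets deployment-level values override the base configuration, and composite keys are assumed to have the form `model:provider` with no `:` inside the model name.

```typescript
// Every type and field name here is an assumption made for illustration.
interface DeploymentConfig { ptbEnabled?: boolean; inputCost?: number; outputCost?: number }
interface ModelProviderConfig {
  inputCost: number;
  outputCost: number;
  endpoints: Record<string, DeploymentConfig>;
}
interface Endpoint {
  key: string; // "model:provider:deployment"
  inputCost: number;
  outputCost: number;
  ptbEnabled: boolean;
}
interface Indexes {
  configById: Map<string, ModelProviderConfig>;
  providerToModels: Map<string, Set<string>>;
  modelToProviders: Map<string, Set<string>>;
  modelToConfigs: Map<string, ModelProviderConfig[]>;
  endpointById: Map<string, Endpoint>;
  modelToEndpoints: Map<string, Endpoint[]>;
  modelToPtbEndpoints: Map<string, Endpoint[]>;
}

// Return the value stored under `key`, creating it on first use.
function getOrInit<V>(map: Map<string, V>, key: string, make: () => V): V {
  let value = map.get(key);
  if (value === undefined) {
    value = make();
    map.set(key, value);
  }
  return value;
}

const totalCost = (e: Endpoint): number => e.inputCost + e.outputCost;

function buildIndexes(configs: Record<string, ModelProviderConfig>): Indexes {
  const idx: Indexes = {
    configById: new Map(),
    providerToModels: new Map(),
    modelToProviders: new Map(),
    modelToConfigs: new Map(),
    endpointById: new Map(),
    modelToEndpoints: new Map(),
    modelToPtbEndpoints: new Map(),
  };

  for (const [compositeKey, config] of Object.entries(configs)) {
    // Assumes keys look like "model:provider" and the model name has no ":".
    const sep = compositeKey.indexOf(":");
    if (sep < 0) throw new Error(`Malformed composite key: ${compositeKey}`);
    const modelName = compositeKey.slice(0, sep);
    const provider = compositeKey.slice(sep + 1);

    idx.configById.set(compositeKey, config);
    getOrInit(idx.providerToModels, provider, () => new Set<string>()).add(modelName);
    getOrInit(idx.modelToProviders, modelName, () => new Set<string>()).add(provider);
    getOrInit(idx.modelToConfigs, modelName, () => [] as ModelProviderConfig[]).push(config);

    for (const [deploymentId, deployment] of Object.entries(config.endpoints)) {
      // Merge step: deployment-level values override the base config.
      const endpoint: Endpoint = {
        key: `${compositeKey}:${deploymentId}`,
        inputCost: deployment.inputCost ?? config.inputCost,
        outputCost: deployment.outputCost ?? config.outputCost,
        ptbEnabled: deployment.ptbEnabled ?? false,
      };
      idx.endpointById.set(endpoint.key, endpoint);
      getOrInit(idx.modelToEndpoints, modelName, () => [] as Endpoint[]).push(endpoint);
      if (endpoint.ptbEnabled) {
        getOrInit(idx.modelToPtbEndpoints, modelName, () => [] as Endpoint[]).push(endpoint);
      }
    }
  }

  // Final step: order every endpoint list cheapest-first.
  const byCost = (a: Endpoint, b: Endpoint): number => totalCost(a) - totalCost(b);
  idx.modelToEndpoints.forEach((list) => list.sort(byCost));
  idx.modelToPtbEndpoints.forEach((list) => list.sort(byCost));
  return idx;
}

// Sample data with made-up model, provider, and deployment names.
const indexes = buildIndexes({
  "model-a:provider-x": {
    inputCost: 3,
    outputCost: 15,
    endpoints: { "us-east": { ptbEnabled: true }, "eu-west": { inputCost: 2, outputCost: 10 } },
  },
  "model-a:provider-y": { inputCost: 1, outputCost: 5, endpoints: { default: { ptbEnabled: true } } },
  "model-b:provider-x": { inputCost: 4, outputCost: 4, endpoints: { "us-east": {} } },
});
```

With this sample data, `indexes.modelToEndpoints.get("model-a")` lists the `provider-y` endpoint first because its combined rate is the lowest, and `indexes.endpointById.get("model-a:provider-x:eu-west")` resolves a single model:provider:region triple directly.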
The cost-based sorting applies a Greedy Optimization heuristic: by keeping endpoints sorted by cost, consumers can always pick the first available endpoint for the cheapest option without additional computation.
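A consumer of a cost-sorted list might select an endpoint as sketched below. The `Endpoint` shape and the `isAvailable` callback are hypothetical; how availability is actually determined (health checks, rate limits, and so on) is outside what this page describes.

```typescript
// Hypothetical endpoint shape for the example.
interface Endpoint {
  id: string;
  cost: number;
}

// Returns the cheapest endpoint that passes the availability check, or
// undefined if none does. Assumes `sortedByCost` is already in ascending
// cost order, so the first match is also the cheapest match.
function pickCheapestAvailable(
  sortedByCost: readonly Endpoint[],
  isAvailable: (e: Endpoint) => boolean,
): Endpoint | undefined {
  return sortedByCost.find(isAvailable);
}

const endpoints: Endpoint[] = [
  { id: "a", cost: 1 },
  { id: "b", cost: 2 },
  { id: "c", cost: 3 },
];

// Pretend endpoint "a" is currently unavailable.
const down = new Set<string>(["a"]);
const chosen = pickCheapestAvailable(endpoints, (e) => !down.has(e.id));
```

The selection is a linear walk that stops at the first match, so in the common case where the cheapest endpoint is available it inspects only one element.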