Principle:BerriAI Litellm Cache Key Generation
| Knowledge Sources | Cryptographic hashing theory; cache key design patterns; content-addressable storage principles |
|---|---|
| Domains | Caching, Cryptography, LLM Infrastructure |
| Last Updated | 2026-02-15 |
Overview
Cache key generation is the process of producing a deterministic, collision-resistant identifier from the parameters of an LLM request so that identical requests map to the same cache entry.
Description
For a response cache to function correctly, every logically identical request must produce the same cache key, and every distinct request must (with overwhelming probability) produce a different key. Cache key generation addresses several challenges unique to LLM API caching:
- Parameter canonicalization: LLM requests contain many parameters (model, messages, temperature, top_p, tools, etc.). The key generator must decide which parameters are semantically significant -- that is, which parameters would cause a different response if changed -- and include only those in the key.
- Provider-agnostic model identity: When requests are routed through a load balancer, the same logical model group (e.g., "gpt-4") may map to different physical deployments. The key generator should use the model group rather than the specific deployment identifier to maximise cache hits across equivalent backends.
- Caching groups: Multiple model groups can be declared as a caching group, meaning they share a cache namespace. This allows responses from one model in the group to serve as cache hits for another.
- Collision resistance: The raw concatenated parameter string is hashed with SHA-256 to produce a fixed-length, uniformly distributed key that is safe to use as a storage identifier.
- Namespace isolation: A namespace prefix can be prepended to the hashed key, allowing multiple logical caches to coexist in the same backend without key collisions.
- Preset key optimisation: Once a cache key is computed, it is stored back into the request's metadata (
litellm_params) so that subsequent calls in the same request lifecycle (e.g., post-response cache set) do not repeat the computation.
Usage
Use cache key generation when:
- You are building or extending a caching layer for LLM API calls and need to define what constitutes a "cache-equivalent" request.
- You need to understand why two seemingly similar requests produce different cache keys (debugging cache misses).
- You want to implement cross-model caching groups where responses from one model can serve requests for another.
- You need to partition cache entries by namespace (e.g., per-tenant, per-environment).
Theoretical Basis
Cache key generation follows the principle of content-addressable storage: the identity of an entry is derived entirely from its content (the request parameters), not from an arbitrary external identifier.
Pseudocode:
FUNCTION get_cache_key(request_params) -> string:
-- Step 1: Check for a previously computed key (memoisation)
IF request_params.litellm_params HAS "preset_cache_key":
RETURN preset_cache_key
-- Step 2: Build the raw key string from semantically significant parameters
raw_key = ""
all_llm_params = get_all_known_llm_api_params() -- model, messages, temperature, etc.
internal_params = get_all_internal_litellm_params() -- metadata, caching flags, etc.
FOR EACH param IN request_params:
IF param IN all_llm_params:
value = get_param_value(param, request_params)
IF value IS NOT NULL:
raw_key += "{param}: {value}"
ELSE IF param NOT IN internal_params:
-- Provider-specific optional param (e.g., top_k)
IF feature_flag_optional_params_enabled AND value IS NOT NULL:
raw_key += "{param}: {value}"
-- Step 3: Hash the raw key with SHA-256
hashed_key = SHA256(raw_key).hexdigest()
-- Step 4: Add namespace prefix
namespace = request_params.cache.namespace
OR request_params.metadata.redis_namespace
OR self.namespace
IF namespace:
hashed_key = "{namespace}:{hashed_key}"
-- Step 5: Memoise the computed key
request_params.litellm_params["preset_cache_key"] = hashed_key
RETURN hashed_key
FUNCTION get_param_value(param, request_params) -> string:
IF param == "model":
-- Prefer caching_group > model_group > raw model name
caching_group = find_caching_group(metadata, model_group)
RETURN caching_group OR model_group OR request_params["model"]
ELSE IF param == "file":
-- For transcription: use file checksum or name
RETURN metadata.file_checksum OR file.name OR metadata.file_name
ELSE:
RETURN request_params[param]
The key design properties are:
- Determinism: Given identical input parameters, the same key is always produced, regardless of parameter ordering (since parameters are iterated in a fixed order defined by the known parameter list).
- Collision resistance: SHA-256 provides a 256-bit hash space (2^256 possible keys), making accidental collisions astronomically unlikely.
- Selective inclusion: Only parameters that affect the LLM response are included in the key. Internal bookkeeping parameters (API keys, logging objects, metadata) are excluded, preventing false cache misses.
- O(1) lookup after generation: The fixed-length hexadecimal hash is efficient for use as a dictionary key, Redis key, or S3 object path.