Principle:BerriAI Litellm Cache Key Generation

From Leeroopedia
Knowledge Sources: Cryptographic hashing theory; cache key design patterns; content-addressable storage principles
Domains: Caching, Cryptography, LLM Infrastructure
Last Updated: 2026-02-15

Overview

Cache key generation is the process of producing a deterministic, collision-resistant identifier from the parameters of an LLM request so that identical requests map to the same cache entry.

Description

For a response cache to function correctly, every logically identical request must produce the same cache key, and every distinct request must (with overwhelming probability) produce a different key. Cache key generation addresses several challenges unique to LLM API caching:

  • Parameter canonicalization: LLM requests contain many parameters (model, messages, temperature, top_p, tools, etc.). The key generator must decide which parameters are semantically significant -- that is, which parameters would cause a different response if changed -- and include only those in the key.
  • Provider-agnostic model identity: When requests are routed through a load balancer, the same logical model group (e.g., "gpt-4") may map to different physical deployments. The key generator should use the model group rather than the specific deployment identifier to maximise cache hits across equivalent backends.
  • Caching groups: Multiple model groups can be declared as a caching group, meaning they share a cache namespace. This allows responses from one model in the group to serve as cache hits for another.
  • Collision resistance: The raw concatenated parameter string is hashed with SHA-256 to produce a fixed-length, uniformly distributed key that is safe to use as a storage identifier.
  • Namespace isolation: A namespace prefix can be prepended to the hashed key, allowing multiple logical caches to coexist in the same backend without key collisions.
  • Preset key optimisation: Once a cache key is computed, it is stored back into the request's metadata (litellm_params) so that subsequent calls in the same request lifecycle (e.g., post-response cache set) do not repeat the computation.
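
The model-identity rules above (caching group, then model group, then raw model name) can be illustrated with a minimal sketch. The function name `resolve_cache_model`, its arguments, and the representation of caching groups as tuples of model-group names are assumptions made for this example, not a confirmed API.

```python
from typing import Optional, Sequence, Tuple


def resolve_cache_model(
    model: str,
    model_group: Optional[str] = None,
    caching_groups: Optional[Sequence[Tuple[str, ...]]] = None,
) -> str:
    """Pick the model identity that goes into the cache key.

    Preference order: caching group > model group > raw model name.
    (Hypothetical helper; names and shapes are illustrative.)
    """
    if model_group is not None and caching_groups:
        for group in caching_groups:
            if model_group in group:
                # Every member of a caching group maps to the same identity,
                # so their cache entries share a namespace.
                return str(group)
    # No caching group matched: fall back to the logical group, then the
    # physical deployment name.
    return model_group or model


groups = [("gpt-4", "gpt-4-turbo")]
same_a = resolve_cache_model("azure/gpt-4-eu", "gpt-4", groups)
same_b = resolve_cache_model("azure/gpt-4-us", "gpt-4-turbo", groups)
```

Here `same_a` and `same_b` are equal, so two deployments in different model groups of the same caching group would contribute the same model component to the key.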

Usage

Use cache key generation when:

  • You are building or extending a caching layer for LLM API calls and need to define what constitutes a "cache-equivalent" request.
  • You need to understand why two seemingly similar requests produce different cache keys (debugging cache misses).
  • You want to implement cross-model caching groups where responses from one model can serve requests for another.
  • You need to partition cache entries by namespace (e.g., per-tenant, per-environment).
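
For the cache-miss debugging case, one approach is to compare only the parameters that feed the key. The sketch below is illustrative: `KEY_PARAMS` is an assumed, abbreviated allow-list and `diff_key_params` is a hypothetical helper. Values are compared by their string form because the raw key is built by string interpolation, so `0` and `0.0` count as different.

```python
from typing import Any, Dict, Tuple

# Assumed allow-list for illustration; a real caching layer defines its own.
KEY_PARAMS = ("model", "messages", "temperature", "top_p", "tools")


def diff_key_params(
    a: Dict[str, Any], b: Dict[str, Any]
) -> Dict[str, Tuple[str, str]]:
    """Return the key-relevant parameters whose string forms differ."""
    diffs = {}
    for param in KEY_PARAMS:
        left, right = str(a.get(param)), str(b.get(param))
        if left != right:
            diffs[param] = (left, right)
    return diffs


req_a = {"model": "gpt-4", "temperature": 0, "api_key": "k1"}
req_b = {"model": "gpt-4", "temperature": 0.0, "api_key": "k2"}
mismatch = diff_key_params(req_a, req_b)
```

In this example `mismatch` contains only `temperature`: the differing `api_key` is ignored because it is not key-relevant, while the numerically equal temperatures still differ as strings.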

Theoretical Basis

Cache key generation follows the principle of content-addressable storage: the identity of an entry is derived entirely from its content (the request parameters), not from an arbitrary external identifier.

Pseudocode:

FUNCTION get_cache_key(request_params) -> string:
    -- Step 1: Check for a previously computed key (memoisation)
    IF request_params.litellm_params HAS "preset_cache_key":
        RETURN request_params.litellm_params["preset_cache_key"]

    -- Step 2: Build the raw key string from semantically significant parameters
    raw_key = ""
    all_llm_params = get_all_known_llm_api_params()   -- model, messages, temperature, etc.
    internal_params = get_all_internal_litellm_params() -- metadata, caching flags, etc.

    FOR EACH param IN request_params:
        IF param IN all_llm_params:
            value = get_param_value(param, request_params)
            IF value IS NOT NULL:
                raw_key += "{param}: {value}"
        ELSE IF param NOT IN internal_params:
            -- Provider-specific optional param (e.g., top_k)
            value = request_params[param]
            IF feature_flag_optional_params_enabled AND value IS NOT NULL:
                raw_key += "{param}: {value}"

    -- Step 3: Hash the raw key with SHA-256
    hashed_key = SHA256(raw_key).hexdigest()

    -- Step 4: Add namespace prefix
    namespace = request_params.cache.namespace
                OR request_params.metadata.redis_namespace
                OR self.namespace
    IF namespace:
        hashed_key = "{namespace}:{hashed_key}"

    -- Step 5: Memoise the computed key
    request_params.litellm_params["preset_cache_key"] = hashed_key

    RETURN hashed_key


FUNCTION get_param_value(param, request_params) -> string:
    metadata = request_params.metadata
    IF param == "model":
        -- Prefer caching_group > model_group > raw model name
        model_group = metadata.model_group
        caching_group = find_caching_group(metadata, model_group)
        RETURN caching_group OR model_group OR request_params["model"]
    ELSE IF param == "file":
        -- For transcription: use file checksum or name
        file = request_params["file"]
        RETURN metadata.file_checksum OR file.name OR metadata.file_name
    ELSE:
        RETURN request_params[param]
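
The pseudocode can be approximated in Python as follows. This is a simplified sketch under stated assumptions, not the library's implementation: the parameter sets `LLM_API_PARAMS` and `INTERNAL_PARAMS` are abbreviated stand-ins, the `include_provider_params` argument stands in for the feature flag, and caching groups and file handling are omitted.

```python
import hashlib
from typing import Any, Dict, Optional

# Abbreviated, assumed parameter sets for illustration only.
LLM_API_PARAMS = {"model", "messages", "temperature", "top_p", "tools", "max_tokens"}
INTERNAL_PARAMS = {"litellm_params", "metadata", "api_key", "cache"}


def get_cache_key(
    request: Dict[str, Any],
    default_namespace: Optional[str] = None,
    include_provider_params: bool = False,
) -> str:
    # Step 1: reuse a previously computed key if one is present.
    litellm_params = request.setdefault("litellm_params", {})
    preset = litellm_params.get("preset_cache_key")
    if preset is not None:
        return preset

    # Step 2: concatenate the semantically significant parameters.
    metadata = request.get("metadata") or {}
    raw_key = ""
    for param, value in request.items():
        if param in LLM_API_PARAMS:
            if param == "model":
                # Prefer the logical model group over the deployment name.
                value = metadata.get("model_group") or value
        elif param in INTERNAL_PARAMS or not include_provider_params:
            continue  # bookkeeping param, or provider params disabled
        if value is not None:
            raw_key += f"{param}: {value}"

    # Step 3: hash the raw string with SHA-256.
    hashed = hashlib.sha256(raw_key.encode("utf-8")).hexdigest()

    # Step 4: prepend the first namespace found, if any.
    cache_cfg = request.get("cache") or {}
    namespace = (
        cache_cfg.get("namespace")
        or metadata.get("redis_namespace")
        or default_namespace
    )
    if namespace:
        hashed = f"{namespace}:{hashed}"

    # Step 5: memoise the key for later calls in the same request lifecycle.
    litellm_params["preset_cache_key"] = hashed
    return hashed
```

Calling `get_cache_key` twice on the same request dictionary returns the memoised value on the second call, and two requests that differ only in `api_key` produce the same key because that parameter is treated as internal.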

The key design properties are:

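  • Determinism: Given identical input parameters, the same key is always produced. Order-independence additionally requires that parameters be concatenated in a fixed order (for example, the order of the known parameter list); if the generator instead iterates over the request's own parameters, as the pseudocode above does, two requests that supply the same parameters in a different order yield different raw strings and therefore different keys.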
  • Collision resistance: SHA-256 provides a 256-bit hash space (2^256 possible keys), making accidental collisions astronomically unlikely.
  • Selective inclusion: Only parameters that affect the LLM response are included in the key. Internal bookkeeping parameters (API keys, logging objects, metadata) are excluded, preventing false cache misses.
  • O(1) lookup after generation: The SHA-256 digest is a fixed-length, 64-character hexadecimal string (plus any namespace prefix), which makes it efficient to use as a dictionary key, Redis key, or S3 object path.
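
The determinism property depends on the order in which parameters are concatenated before hashing. The sketch below contrasts two hypothetical helpers: one that concatenates in the order supplied by the caller, and one that sorts parameter names first. Sorting is just one assumed way to fix the order; iterating a known parameter list would serve the same purpose.

```python
import hashlib
from typing import Any, Dict


def key_in_given_order(params: Dict[str, Any]) -> str:
    """Hash parameters in whatever order the caller supplied them."""
    raw = "".join(f"{name}: {value}" for name, value in params.items())
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def key_in_sorted_order(params: Dict[str, Any]) -> str:
    """Hash parameters in sorted-name order, independent of insertion order."""
    raw = "".join(f"{name}: {params[name]}" for name in sorted(params))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


a = {"model": "gpt-4", "temperature": 0.2}
b = {"temperature": 0.2, "model": "gpt-4"}  # same content, different order

order_sensitive = key_in_given_order(a) != key_in_given_order(b)
order_independent = key_in_sorted_order(a) == key_in_sorted_order(b)
```

Both flags evaluate to true: the first helper treats `a` and `b` as different requests, while the second maps them to the same 64-character digest.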
