Principle: BerriAI LiteLLM Cache Initialization
| Knowledge Sources | Software design patterns (Factory Method, Strategy); distributed caching architecture; connection management best practices |
|---|---|
| Domains | Caching, System Configuration, LLM Infrastructure |
| Last Updated | 2026-02-15 |
Overview
Cache initialization is the process of constructing and configuring a cache subsystem by selecting a backend, establishing connections, and setting operational parameters such as TTL and namespace.
Description
Before any LLM response can be cached or retrieved, the caching subsystem must be properly initialized. Cache initialization solves several problems simultaneously:
- Backend instantiation: Based on a configuration enum (e.g., LOCAL, REDIS, S3), the appropriate concrete cache object is created with its backend-specific connection parameters (host, port, credentials, bucket names, etc.).
- Connection management: For networked backends like Redis, initialization includes establishing TCP connections, configuring TLS, and optionally setting up cluster topologies with startup nodes.
- TTL policy: A default time-to-live (TTL) determines how long cached entries remain valid. Different backends may have distinct default TTLs (in-memory vs. Redis), and the initializer allows per-backend override.
- Namespace isolation: A namespace prefix allows multiple logical caches to share a single physical backend without key collisions.
- Callback registration: The cache must register itself with the framework's input and success callback pipelines so that cache lookups and stores are triggered at the appropriate lifecycle points.
- Mode selection: The cache can operate in "default on" mode (all eligible requests are cached unless explicitly opted out) or "default off" mode (caching is opt-in per request).
A well-designed initializer centralises all of these concerns into a single constructor call, allowing the rest of the application to treat the cache as an opaque, ready-to-use service.
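Several of these concerns, specifically TTL policy and namespace isolation, can be sketched with a minimal in-memory cache. The class and method names below are illustrative, not LiteLLM's actual API:

```python
import time

class InMemoryCache:
    """Minimal TTL- and namespace-aware cache (hypothetical sketch)."""

    def __init__(self, namespace: str = "", default_ttl: float = 60.0):
        self.namespace = namespace
        self.default_ttl = default_ttl
        self._store = {}  # full key -> (value, expires_at)

    def _key(self, key: str) -> str:
        # Namespace prefix lets multiple logical caches share one
        # physical store without key collisions.
        return f"{self.namespace}:{key}" if self.namespace else key

    def set(self, key, value, ttl=None):
        # Per-call TTL overrides the default set at initialization.
        expires = time.monotonic() + (ttl if ttl is not None else self.default_ttl)
        self._store[self._key(key)] = (value, expires)

    def get(self, key):
        entry = self._store.get(self._key(key))
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Lazily evict expired entries on read.
            del self._store[self._key(key)]
            return None
        return value
```

A networked backend such as Redis would replace the dictionary with client calls but keep the same key-prefixing and TTL logic.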
Usage
Use cache initialization when:
- You are bootstrapping an LLM proxy server or gateway and need to configure response caching as part of startup.
- You need to switch between backends across environments (e.g., in-memory for tests, Redis for staging, S3 for archival in production).
- You need to configure advanced features such as Redis cluster mode, semantic similarity thresholds, or GCP IAM-authenticated connections.
- You want to restrict caching to specific call types (e.g., only completions and embeddings, but not transcriptions).
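The per-environment backend switch in the second bullet is often driven by a small configuration table. The profile names, hosts, and bucket below are hypothetical placeholders:

```python
import os

# Hypothetical mapping from deployment environment to cache settings;
# the "type" values mirror the backend enum (local, redis, s3).
CACHE_PROFILES = {
    "test":    {"type": "local"},
    "staging": {"type": "redis", "host": "redis.staging.internal", "port": 6379},
    "prod":    {"type": "s3", "bucket": "llm-response-archive", "region": "us-east-1"},
}

def cache_config_for(env=None):
    """Pick a cache profile, defaulting to the APP_ENV variable or 'test'."""
    env = env or os.environ.get("APP_ENV", "test")
    try:
        return CACHE_PROFILES[env]
    except KeyError:
        # Fail fast on unknown environments instead of silently
        # falling back to an unintended backend.
        raise ValueError(f"no cache profile for environment {env!r}")
```

The resulting dictionary can then be splatted into the cache constructor at startup.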
Theoretical Basis
Cache initialization follows the Factory Method pattern combined with Strategy delegation. The constructor acts as a factory that examines the requested type and creates the appropriate backend. After construction, all cache operations are dispatched polymorphically through a uniform interface.
Pseudocode:
CLASS Cache:
    CONSTRUCTOR(type, mode, host, port, password, namespace, ttl,
                supported_call_types, backend_specific_params, **kwargs):
        -- Step 1: Select and instantiate the backend
        MATCH type:
            REDIS:
                IF startup_nodes provided:
                    self.cache = RedisClusterBackend(host, port, password, startup_nodes, **kwargs)
                ELSE:
                    self.cache = RedisBackend(host, port, password, **kwargs)
            REDIS_SEMANTIC:
                self.cache = RedisSemanticBackend(host, port, password,
                                                  similarity_threshold, embedding_model, **kwargs)
            QDRANT_SEMANTIC:
                self.cache = QdrantBackend(api_base, api_key, collection,
                                           similarity_threshold, embedding_model)
            LOCAL:
                self.cache = InMemoryBackend()
            S3:
                self.cache = S3Backend(bucket, region, credentials, **kwargs)
            GCS:
                self.cache = GCSBackend(bucket, service_account_path)
            AZURE_BLOB:
                self.cache = AzureBlobBackend(account_url, container)
            DISK:
                self.cache = DiskBackend(directory)
            DEFAULT:
                RAISE UnsupportedCacheTypeError(type)

        -- Step 2: Register callbacks
        register_input_callback("cache")
        register_success_callback("cache")
        register_async_success_callback("cache")

        -- Step 3: Store operational parameters
        self.supported_call_types = supported_call_types
        self.type = type
        self.namespace = namespace
        self.ttl = ttl
        self.mode = mode

        -- Step 4: Apply backend-specific TTL overrides
        IF type == LOCAL AND in_memory_ttl IS NOT NULL:
            self.ttl = in_memory_ttl
        IF type IN (REDIS, REDIS_SEMANTIC) AND redis_ttl IS NOT NULL:
            self.ttl = redis_ttl

        -- Step 5: Propagate namespace to backend if supported
        IF namespace AND backend supports namespacing:
            self.cache.namespace = namespace
The key design properties are:
- Single responsibility: The constructor is the only place where backend selection and wiring occur. All other methods operate against the abstract BaseCache interface.
- Fail-fast validation: If an unsupported cache type is provided, initialization raises an error immediately rather than deferring failure to runtime cache operations.
- Callback auto-registration: By registering callbacks during initialization, the cache transparently hooks into the LLM request lifecycle without requiring manual wiring by the caller.
- Immutable configuration: After initialization, the backend type, namespace, and mode are fixed for the lifetime of the cache instance, preventing configuration drift.
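The callback auto-registration property can be illustrated with a toy registry. The global lists and method names here are hypothetical stand-ins for the framework's real callback pipelines:

```python
# Hypothetical global callback pipelines, mirroring the
# register_*_callback calls in the pseudocode above.
INPUT_CALLBACKS = []
SUCCESS_CALLBACKS = []

class AutoRegisteringCache:
    def __init__(self):
        # Hook into the request lifecycle at construction time,
        # so callers never wire callbacks manually.
        INPUT_CALLBACKS.append(self.on_input)
        SUCCESS_CALLBACKS.append(self.on_success)
        self._hits = {}

    def on_input(self, request):
        # Cache lookup before the LLM call; None means a miss.
        return self._hits.get(request)

    def on_success(self, request, response):
        # Cache store after a successful LLM call.
        self._hits[request] = response
```

Once constructed, the cache is invoked by the framework at the input and success lifecycle points without any further wiring by the caller.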