Principle: BerriAI LiteLLM Cache Initialization
| Knowledge Sources | Software design patterns (Factory Method, Strategy); distributed caching architecture; connection management best practices |
|---|---|
| Domains | Caching, System Configuration, LLM Infrastructure |
| Last Updated | 2026-02-15 |
Overview
Cache initialization is the process of constructing and configuring a cache subsystem by selecting a backend, establishing connections, and setting operational parameters such as TTL and namespace.
Description
Before any LLM response can be cached or retrieved, the caching subsystem must be properly initialized. Cache initialization solves several problems simultaneously:
- Backend instantiation: Based on a configuration enum (e.g., LOCAL, REDIS, S3), the appropriate concrete cache object is created with its backend-specific connection parameters (host, port, credentials, bucket names, etc.).
- Connection management: For networked backends like Redis, initialization includes establishing TCP connections, configuring TLS, and optionally setting up cluster topologies with startup nodes.
- TTL policy: A default time-to-live (TTL) determines how long cached entries remain valid. Different backends may have distinct default TTLs (in-memory vs. Redis), and the initializer allows per-backend override.
- Namespace isolation: A namespace prefix allows multiple logical caches to share a single physical backend without key collisions.
- Callback registration: The cache must register itself with the framework's input and success callback pipelines so that cache lookups and stores are triggered at the appropriate lifecycle points.
- Mode selection: The cache can operate in "default on" mode (all eligible requests are cached unless explicitly opted out) or "default off" mode (caching is opt-in per request).
A well-designed initializer centralises all of these concerns into a single constructor call, allowing the rest of the application to treat the cache as an opaque, ready-to-use service.
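Several of these concerns, specifically TTL policy and namespace isolation, can be sketched with a minimal in-memory cache. The class and method names below are illustrative, not LiteLLM's actual API:

```python
import time

class InMemoryCache:
    """Minimal TTL- and namespace-aware cache (hypothetical sketch)."""

    def __init__(self, namespace: str = "", default_ttl: float = 60.0):
        self.namespace = namespace
        self.default_ttl = default_ttl
        self._store = {}  # full key -> (value, expires_at)

    def _key(self, key: str) -> str:
        # Namespace prefix lets multiple logical caches share one
        # physical store without key collisions.
        return f"{self.namespace}:{key}" if self.namespace else key

    def set(self, key, value, ttl=None):
        # Per-call TTL overrides the default set at initialization.
        expires = time.monotonic() + (ttl if ttl is not None else self.default_ttl)
        self._store[self._key(key)] = (value, expires)

    def get(self, key):
        entry = self._store.get(self._key(key))
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() > expires:
            # Lazily evict expired entries on read.
            del self._store[self._key(key)]
            return None
        return value
```

A networked backend such as Redis would replace the dictionary with client calls but keep the same key-prefixing and TTL logic.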
Usage
Use cache initialization when:
- You are bootstrapping an LLM proxy server or gateway and need to configure response caching as part of startup.
- You need to switch between backends across environments (e.g., in-memory for tests, Redis for staging, S3 for archival in production).
- You need to configure advanced features such as Redis cluster mode, semantic similarity thresholds, or GCP IAM-authenticated connections.
- You want to restrict caching to specific call types (e.g., only completions and embeddings, but not transcriptions).
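The per-environment backend switch in the second bullet is often driven by a small configuration table. The profile names, hosts, and bucket below are hypothetical placeholders:

```python
import os

# Hypothetical mapping from deployment environment to cache settings;
# the "type" values mirror the backend enum (local, redis, s3).
CACHE_PROFILES = {
    "test":    {"type": "local"},
    "staging": {"type": "redis", "host": "redis.staging.internal", "port": 6379},
    "prod":    {"type": "s3", "bucket": "llm-response-archive", "region": "us-east-1"},
}

def cache_config_for(env=None):
    """Pick a cache profile, defaulting to the APP_ENV variable or 'test'."""
    env = env or os.environ.get("APP_ENV", "test")
    try:
        return CACHE_PROFILES[env]
    except KeyError:
        # Fail fast on unknown environments instead of silently
        # falling back to an unintended backend.
        raise ValueError(f"no cache profile for environment {env!r}")
```

The resulting dictionary can then be splatted into the cache constructor at startup.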
Theoretical Basis
Cache initialization follows the Factory Method pattern combined with Strategy delegation. The constructor acts as a factory that examines the requested type and creates the appropriate backend. After construction, all cache operations are dispatched polymorphically through a uniform interface.
Pseudocode:
CLASS Cache:
    CONSTRUCTOR(type, mode, host, port, password, namespace, ttl,
                supported_call_types, backend_specific_params, **kwargs):
        -- Step 1: Select and instantiate the backend
        MATCH type:
            REDIS:
                IF startup_nodes provided:
                    self.cache = RedisClusterBackend(host, port, password, startup_nodes, **kwargs)
                ELSE:
                    self.cache = RedisBackend(host, port, password, **kwargs)
            REDIS_SEMANTIC:
                self.cache = RedisSemanticBackend(host, port, password,
                                                  similarity_threshold, embedding_model, **kwargs)
            QDRANT_SEMANTIC:
                self.cache = QdrantBackend(api_base, api_key, collection,
                                           similarity_threshold, embedding_model)
            LOCAL:
                self.cache = InMemoryBackend()
            S3:
                self.cache = S3Backend(bucket, region, credentials, **kwargs)
            GCS:
                self.cache = GCSBackend(bucket, service_account_path)
            AZURE_BLOB:
                self.cache = AzureBlobBackend(account_url, container)
            DISK:
                self.cache = DiskBackend(directory)
            DEFAULT:
                RAISE UnsupportedCacheTypeError(type)

        -- Step 2: Register callbacks
        register_input_callback("cache")
        register_success_callback("cache")
        register_async_success_callback("cache")

        -- Step 3: Store operational parameters
        self.supported_call_types = supported_call_types
        self.type = type
        self.namespace = namespace
        self.ttl = ttl
        self.mode = mode

        -- Step 4: Apply backend-specific TTL overrides
        IF type == LOCAL AND in_memory_ttl IS NOT NULL:
            self.ttl = in_memory_ttl
        IF type IN (REDIS, REDIS_SEMANTIC) AND redis_ttl IS NOT NULL:
            self.ttl = redis_ttl

        -- Step 5: Propagate namespace to backend if supported
        IF namespace AND backend supports namespacing:
            self.cache.namespace = namespace
The key design properties are:
- Single responsibility: The constructor is the only place where backend selection and wiring occur. All other methods operate against the abstract BaseCache interface.
- Fail-fast validation: If an unsupported cache type is provided, initialization raises an error immediately rather than deferring failure to runtime cache operations.
- Callback auto-registration: By registering callbacks during initialization, the cache transparently hooks into the LLM request lifecycle without requiring manual wiring by the caller.
- Immutable configuration: After initialization, the backend type, namespace, and mode are fixed for the lifetime of the cache instance, preventing configuration drift.
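The callback auto-registration property can be illustrated with a toy registry. The global lists and method names here are hypothetical stand-ins for the framework's real callback pipelines:

```python
# Hypothetical global callback pipelines, mirroring the
# register_*_callback calls in the pseudocode above.
INPUT_CALLBACKS = []
SUCCESS_CALLBACKS = []

class AutoRegisteringCache:
    def __init__(self):
        # Hook into the request lifecycle at construction time,
        # so callers never wire callbacks manually.
        INPUT_CALLBACKS.append(self.on_input)
        SUCCESS_CALLBACKS.append(self.on_success)
        self._hits = {}

    def on_input(self, request):
        # Cache lookup before the LLM call; None means a miss.
        return self._hits.get(request)

    def on_success(self, request, response):
        # Cache store after a successful LLM call.
        self._hits[request] = response
```

Once constructed, the cache is invoked by the framework at the input and success lifecycle points without any further wiring by the caller.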