Implementation:BerriAI Litellm Cache Init

Knowledge Sources	https://github.com/BerriAI/litellm
Domains	Caching, System Configuration, LLM Infrastructure
Last Updated	2026-02-15

Overview

Concrete constructor of the Cache class that initializes and configures LiteLLM's response caching subsystem.

Description

The Cache.__init__ method is the central entry point for setting up LiteLLM's response caching. Based on the type parameter (a LiteLLMCacheType enum), it instantiates one of nine concrete backend implementations: InMemoryCache, RedisCache, RedisClusterCache, RedisSemanticCache, QdrantSemanticCache, S3Cache, GCSCache, AzureBlobCache, or DiskCache. It then registers the cache with LiteLLM's callback system (input, success, and async success callbacks), stores operational parameters (TTL, namespace, supported call types, mode), and applies backend-specific TTL overrides. For Redis backends, it also propagates the namespace to the underlying cache object. The mode parameter controls whether caching is on by default (default_on) or opt-in (default_off).
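The type-based dispatch described above can be pictured with a small standalone sketch. It does not import LiteLLM: `BackendKind` and `select_backend_name` are hypothetical stand-ins, the enum values are illustrative, and the rule that `redis_startup_nodes` selects the cluster backend is an assumption inferred from the Redis Cluster example on this page.

```python
from enum import Enum
from typing import Optional


class BackendKind(str, Enum):
    # Hypothetical stand-in for LiteLLMCacheType; values are illustrative only.
    LOCAL = "local"
    REDIS = "redis"
    REDIS_SEMANTIC = "redis-semantic"
    QDRANT_SEMANTIC = "qdrant-semantic"
    S3 = "s3"
    GCS = "gcs"
    AZURE_BLOB = "azure-blob"
    DISK = "disk"


# Backend class names as listed in the description, keyed by cache type.
_BACKEND_NAMES = {
    BackendKind.LOCAL: "InMemoryCache",
    BackendKind.REDIS_SEMANTIC: "RedisSemanticCache",
    BackendKind.QDRANT_SEMANTIC: "QdrantSemanticCache",
    BackendKind.S3: "S3Cache",
    BackendKind.GCS: "GCSCache",
    BackendKind.AZURE_BLOB: "AzureBlobCache",
    BackendKind.DISK: "DiskCache",
}


def select_backend_name(
    kind: BackendKind, redis_startup_nodes: Optional[list] = None
) -> str:
    """Return the name of the backend class associated with `kind`."""
    if kind == BackendKind.REDIS:
        # Assumption: a non-empty list of startup nodes signals cluster mode.
        return "RedisClusterCache" if redis_startup_nodes else "RedisCache"
    return _BACKEND_NAMES[kind]
```

Under this reading, eight cache types map onto nine backend classes because the Redis type covers both the single-node and the cluster implementation.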

Usage

Import and instantiate the Cache class at application startup. Assign it to litellm.cache to enable response caching globally across all LiteLLM API calls.

Code Reference

Source Location	`litellm/caching/caching.py`, lines 56-264
Signature	Cache.__init__(self, type=LiteLLMCacheType.LOCAL, mode=CacheMode.default_on, host=None, port=None, password=None, namespace=None, ttl=None, default_in_memory_ttl=None, default_in_redis_ttl=None, similarity_threshold=None, supported_call_types=[...], azure_account_url=None, azure_blob_container=None, s3_bucket_name=None, s3_region_name=None, s3_api_version=None, s3_use_ssl=True, s3_verify=None, s3_endpoint_url=None, s3_aws_access_key_id=None, s3_aws_secret_access_key=None, s3_aws_session_token=None, s3_config=None, s3_path=None, gcs_bucket_name=None, gcs_path_service_account=None, gcs_path=None, redis_semantic_cache_embedding_model="text-embedding-ada-002", redis_semantic_cache_index_name=None, redis_flush_size=None, redis_startup_nodes=None, disk_cache_dir=None, qdrant_api_base=None, qdrant_api_key=None, qdrant_collection_name=None, qdrant_quantization_config=None, qdrant_semantic_cache_embedding_model="text-embedding-ada-002", gcp_service_account=None, gcp_ssl_ca_certs=None, **kwargs)
Import	`from litellm.caching.caching import Cache`

I/O Contract

Inputs:

Parameter	Type	Default	Description
`type`	`Optional[LiteLLMCacheType]`	`LiteLLMCacheType.LOCAL`	The cache backend to use
`mode`	`Optional[CacheMode]`	`CacheMode.default_on`	Whether caching is on by default or opt-in
`host`	`Optional[str]`	`None`	Redis host address
`port`	`Optional[str]`	`None`	Redis port number
`password`	`Optional[str]`	`None`	Redis password
`namespace`	`Optional[str]`	`None`	Namespace prefix for cache keys
`ttl`	`Optional[float]`	`None`	Default time-to-live in seconds for cached entries
`default_in_memory_ttl`	`Optional[float]`	`None`	TTL override for in-memory (LOCAL) cache
`default_in_redis_ttl`	`Optional[float]`	`None`	TTL override for Redis-based caches
`similarity_threshold`	`Optional[float]`	`None`	Cosine similarity threshold for semantic caching
`supported_call_types`	`Optional[List[CachingSupportedCallTypes]]`	All call types	Which LLM call types to cache
`redis_startup_nodes`	`Optional[List]`	`None`	Startup nodes for Redis Cluster mode
`redis_flush_size`	`Optional[int]`	`None`	Number of keys to flush at a time in batch operations
`s3_bucket_name`	`Optional[str]`	`None`	S3 bucket name for S3 cache
`gcs_bucket_name`	`Optional[str]`	`None`	GCS bucket name for GCS cache
`disk_cache_dir`	`Optional[str]`	`None`	Directory for disk-based cache
`**kwargs`	`Any`	--	Additional keyword arguments passed to the underlying cache backend constructor (e.g., `redis.Redis()` kwargs)
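The interaction between `ttl` and the two backend-specific overrides can be sketched as a pure function. This is an illustration under assumptions, not the library's code: `resolve_ttl` is a hypothetical helper, the backend is passed as a plain string, and the override is assumed to win over `ttl` whenever it is set for a matching backend.

```python
from typing import Optional


def resolve_ttl(
    backend: str,
    ttl: Optional[float] = None,
    default_in_memory_ttl: Optional[float] = None,
    default_in_redis_ttl: Optional[float] = None,
) -> Optional[float]:
    """Pick the effective TTL for a backend (hypothetical helper)."""
    # Assumption: the in-memory override applies only to the LOCAL backend.
    if backend == "local" and default_in_memory_ttl is not None:
        return default_in_memory_ttl
    # Assumption: the Redis override applies to Redis-based backends.
    if backend in ("redis", "redis-semantic") and default_in_redis_ttl is not None:
        return default_in_redis_ttl
    # Otherwise fall back to the general ttl (which may itself be None).
    return ttl
```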

Outputs:

Return Type	Description
`None`	The constructor modifies the instance in place. After initialization, `self.cache` holds the concrete backend, and LiteLLM callbacks are registered.

Usage Examples

Basic in-memory cache (default):

import litellm
from litellm.caching.caching import Cache

# Simplest initialization -- uses in-memory cache
litellm.cache = Cache()

Redis cache with custom TTL and namespace:

import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType

litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS,
    host="redis.example.com",
    port="6379",
    password="my-secret",
    namespace="prod-v1",
    ttl=600.0,  # 10 minutes
)
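To illustrate what a namespace prefix does to cache keys, here is a minimal sketch. The key derivation (SHA-256 over canonical JSON) and the `namespace:digest` layout are assumptions chosen for illustration, and `make_cache_key` is a hypothetical helper rather than a LiteLLM function.

```python
import hashlib
import json
from typing import Optional


def make_cache_key(request_params: dict, namespace: Optional[str] = None) -> str:
    """Build a deterministic cache key, optionally prefixed with a namespace."""
    # Sort keys so that logically equal requests hash to the same digest.
    canonical = json.dumps(request_params, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumed layout: "<namespace>:<digest>" when a namespace is configured.
    return f"{namespace}:{digest}" if namespace else digest
```

With a layout like this, changing the namespace (for example from "prod-v1" to "prod-v2") makes every previously stored entry unreachable without deleting it, which is a common way to invalidate a cache after a deployment.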

Redis Cluster with GCP IAM authentication:

import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType

litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS,
    host="redis-cluster.internal",
    port="6379",
    password="cluster-pass",
    redis_startup_nodes=[
        {"host": "node1.internal", "port": "6379"},
        {"host": "node2.internal", "port": "6379"},
    ],
    gcp_service_account="/path/to/sa.json",
    gcp_ssl_ca_certs="/path/to/ca.pem",
)

Semantic cache with Redis and a similarity threshold:

import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType

litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS_SEMANTIC,
    host="redis.example.com",
    port="6379",
    password="my-secret",
    similarity_threshold=0.8,
    redis_semantic_cache_embedding_model="text-embedding-ada-002",
)
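The role of `similarity_threshold` can be shown with a plain-Python cosine similarity check. This is a sketch under assumptions: the real backends delegate vector search to Redis or Qdrant, and `cosine_similarity` and `is_semantic_hit` are hypothetical helpers operating on already-computed embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def is_semantic_hit(
    query_embedding: Sequence[float],
    cached_embedding: Sequence[float],
    similarity_threshold: float,
) -> bool:
    """Assumed rule: a hit requires similarity at or above the threshold."""
    return cosine_similarity(query_embedding, cached_embedding) >= similarity_threshold
```

A higher threshold returns cached responses only for near-identical prompts; a lower one increases the hit rate at the risk of serving an answer to a different question.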

S3 cache with specific call types:

import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType

litellm.cache = Cache(
    type=LiteLLMCacheType.S3,
    s3_bucket_name="my-llm-cache",
    s3_region_name="us-east-1",
    s3_path="cache/v1/",
    supported_call_types=["completion", "acompletion"],
    ttl=3600.0,
)
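Restricting `supported_call_types` amounts to a membership check before the cache is consulted. The sketch below assumes that behavior; `is_call_type_cached` is a hypothetical helper, and `ILLUSTRATIVE_CALL_TYPES` is only a subset chosen for the example, not the library's full default list.

```python
from typing import Optional, Sequence

# Illustrative subset only; the real default covers all supported call types.
ILLUSTRATIVE_CALL_TYPES = ("completion", "acompletion", "embedding", "aembedding")


def is_call_type_cached(
    call_type: str, supported_call_types: Optional[Sequence[str]] = None
) -> bool:
    """Return True if responses for `call_type` are eligible for caching."""
    allowed = (
        ILLUSTRATIVE_CALL_TYPES
        if supported_call_types is None
        else supported_call_types
    )
    return call_type in allowed
```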

Opt-in caching mode (default off):

import litellm
from litellm.caching.caching import Cache, CacheMode
from litellm.types.caching import LiteLLMCacheType

litellm.cache = Cache(
    type=LiteLLMCacheType.LOCAL,
    mode=CacheMode.default_off,
)
# Now caching only occurs when the caller explicitly opts in on a given call, e.g. cache={"use-cache": True}
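The difference between the two modes reduces to a small decision function. This is a sketch, not the library's implementation: `Mode` mirrors the two `CacheMode` values named on this page, `should_use_cache` here is a standalone hypothetical function, and the per-call opt-in signal is abstracted into a boolean.

```python
from enum import Enum


class Mode(str, Enum):
    # Mirrors the two mode names used on this page.
    default_on = "default_on"
    default_off = "default_off"


def should_use_cache(mode: Mode, caller_opted_in: bool = False) -> bool:
    """Decide whether a single call should consult the cache."""
    if mode == Mode.default_on:
        # Caching applies to every call unless disabled elsewhere.
        return True
    # default_off: caching is opt-in, so only explicit requests use it.
    return caller_opted_in
```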
