Implementation:BerriAI Litellm Cache Init
| Knowledge Sources | https://github.com/BerriAI/litellm |
|---|---|
| Domains | Caching, System Configuration, LLM Infrastructure |
| Last Updated | 2026-02-15 |
Overview
Concrete constructor for initializing and configuring the LiteLLM response caching subsystem provided by the Cache class.
Description
The Cache.__init__ method is the central entry point for setting up LiteLLM's response caching. Based on the type parameter (a LiteLLMCacheType enum), it instantiates one of nine concrete backend implementations: InMemoryCache, RedisCache, RedisClusterCache, RedisSemanticCache, QdrantSemanticCache, S3Cache, GCSCache, AzureBlobCache, or DiskCache. It then registers the cache with LiteLLM's callback system (input, success, and async success callbacks), stores the operational parameters (TTL, namespace, supported call types, mode), and applies backend-specific TTL overrides. For Redis backends, it also propagates the namespace to the underlying cache object. The mode parameter controls whether caching is on by default (default_on) or opt-in (default_off).
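
The type-based dispatch described above can be pictured with a small stand-alone sketch. It does not import LiteLLM: CacheType is a hypothetical stand-in for LiteLLMCacheType, its member values are assumptions, and backends are represented by their class names only. The rule that redis_startup_nodes switches the Redis type to RedisClusterCache is inferred from the Redis Cluster usage example on this page, not confirmed against the source.

```python
from enum import Enum
from typing import List, Optional


class CacheType(str, Enum):
    """Hypothetical stand-in for LiteLLMCacheType (member values are assumptions)."""

    LOCAL = "local"
    REDIS = "redis"
    REDIS_SEMANTIC = "redis-semantic"
    QDRANT_SEMANTIC = "qdrant-semantic"
    S3 = "s3"
    GCS = "gcs"
    AZURE_BLOB = "azure-blob"
    DISK = "disk"


# Backend class names as listed in the description; only the names are modeled.
_BACKENDS = {
    CacheType.LOCAL: "InMemoryCache",
    CacheType.REDIS_SEMANTIC: "RedisSemanticCache",
    CacheType.QDRANT_SEMANTIC: "QdrantSemanticCache",
    CacheType.S3: "S3Cache",
    CacheType.GCS: "GCSCache",
    CacheType.AZURE_BLOB: "AzureBlobCache",
    CacheType.DISK: "DiskCache",
}


def select_backend(
    cache_type: CacheType,
    redis_startup_nodes: Optional[List[dict]] = None,
) -> str:
    """Return the name of the backend class chosen for cache_type."""
    if cache_type == CacheType.REDIS:
        # Assumption: passing startup nodes selects the cluster-aware backend.
        return "RedisClusterCache" if redis_startup_nodes else "RedisCache"
    return _BACKENDS[cache_type]
```

Keeping the choice in a lookup table plus one special case for Redis mirrors the idea that a single enum value can resolve to two classes depending on the other arguments.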
Usage
Import and instantiate the Cache class at application startup. Assign it to litellm.cache to enable response caching globally across all LiteLLM API calls.
Code Reference
| Source Location | litellm/caching/caching.py, lines 56-264 |
|---|---|
| Signature | Cache.__init__(self, type=LiteLLMCacheType.LOCAL, mode=CacheMode.default_on, host=None, port=None, password=None, namespace=None, ttl=None, default_in_memory_ttl=None, default_in_redis_ttl=None, similarity_threshold=None, supported_call_types=[...], azure_account_url=None, azure_blob_container=None, s3_bucket_name=None, s3_region_name=None, s3_api_version=None, s3_use_ssl=True, s3_verify=None, s3_endpoint_url=None, s3_aws_access_key_id=None, s3_aws_secret_access_key=None, s3_aws_session_token=None, s3_config=None, s3_path=None, gcs_bucket_name=None, gcs_path_service_account=None, gcs_path=None, redis_semantic_cache_embedding_model="text-embedding-ada-002", redis_semantic_cache_index_name=None, redis_flush_size=None, redis_startup_nodes=None, disk_cache_dir=None, qdrant_api_base=None, qdrant_api_key=None, qdrant_collection_name=None, qdrant_quantization_config=None, qdrant_semantic_cache_embedding_model="text-embedding-ada-002", gcp_service_account=None, gcp_ssl_ca_certs=None, **kwargs) |
| Import | from litellm.caching.caching import Cache |
I/O Contract
Inputs:
| Parameter | Type | Default | Description |
|---|---|---|---|
| type | Optional[LiteLLMCacheType] | LiteLLMCacheType.LOCAL | The cache backend to use |
| mode | Optional[CacheMode] | CacheMode.default_on | Whether caching is on by default or opt-in |
| host | Optional[str] | None | Redis host address |
| port | Optional[str] | None | Redis port number |
| password | Optional[str] | None | Redis password |
| namespace | Optional[str] | None | Namespace prefix for cache keys |
| ttl | Optional[float] | None | Default time-to-live in seconds for cached entries |
| default_in_memory_ttl | Optional[float] | None | TTL override for in-memory (LOCAL) cache |
| default_in_redis_ttl | Optional[float] | None | TTL override for Redis-based caches |
| similarity_threshold | Optional[float] | None | Cosine similarity threshold for semantic caching |
| supported_call_types | Optional[List[CachingSupportedCallTypes]] | All call types | Which LLM call types to cache |
| redis_startup_nodes | Optional[List] | None | Startup nodes for Redis Cluster mode |
| redis_flush_size | Optional[int] | None | Number of keys to flush at a time in batch operations |
| s3_bucket_name | Optional[str] | None | S3 bucket name for S3 cache |
| gcs_bucket_name | Optional[str] | None | GCS bucket name for GCS cache |
| disk_cache_dir | Optional[str] | None | Directory for disk-based cache |
| **kwargs | Any | -- | Additional keyword arguments passed to the underlying cache backend constructor (e.g., redis.Redis() kwargs) |
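
The interaction between ttl and the two backend-specific overrides can be sketched as a small pure function. This is an illustration under stated assumptions, not the library's code: resolve_ttl is a hypothetical helper, the backend is identified by a plain string, and the precedence rule (an override, when set, replaces ttl for its backend family) is inferred from the parameter descriptions in the table.

```python
from typing import Optional


def resolve_ttl(
    cache_type: str,
    ttl: Optional[float] = None,
    default_in_memory_ttl: Optional[float] = None,
    default_in_redis_ttl: Optional[float] = None,
) -> Optional[float]:
    """Return the effective TTL in seconds for the given backend family."""
    effective = ttl
    # Assumption: the in-memory override applies only to the local backend.
    if cache_type == "local" and default_in_memory_ttl is not None:
        effective = default_in_memory_ttl
    # Assumption: the Redis override applies to plain and semantic Redis caches.
    if cache_type in ("redis", "redis-semantic") and default_in_redis_ttl is not None:
        effective = default_in_redis_ttl
    return effective
```

Under these assumptions, resolve_ttl("local", ttl=600.0, default_in_memory_ttl=60.0) yields 60.0, while an S3 cache ignores both overrides and keeps ttl.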
Outputs:
| Return Type | Description |
|---|---|
| None | The constructor modifies the instance in place. After initialization, self.cache holds the concrete backend, and LiteLLM callbacks are registered. |
Usage Examples
Basic in-memory cache (default):
import litellm
from litellm.caching.caching import Cache
# Simplest initialization -- uses in-memory cache
litellm.cache = Cache()
Redis cache with custom TTL and namespace:
import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType
litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS,
    host="redis.example.com",
    port="6379",
    password="my-secret",
    namespace="prod-v1",
    ttl=600.0,  # 10 minutes
)
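
To show what a namespace prefix does to cache keys, here is a minimal sketch. It assumes keys are derived by hashing the request parameters and joined to the namespace with a colon; make_cache_key, the JSON canonicalization, and the separator are illustrative choices and may differ from LiteLLM's actual key format.

```python
import hashlib
import json
from typing import Optional


def make_cache_key(params: dict, namespace: Optional[str] = None) -> str:
    """Build a deterministic cache key, optionally prefixed with a namespace."""
    # Sort keys so logically identical requests hash to the same value.
    canonical = json.dumps(params, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Assumption: "<namespace>:<hash>" layout; the real separator may differ.
    return f"{namespace}:{digest}" if namespace else digest
```

With a prefix such as "prod-v1", two deployments sharing one Redis instance would not read each other's entries, and changing the namespace gives a simple way to retire old entries without flushing the store.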
Redis Cluster with GCP IAM authentication:
import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType
litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS,
    host="redis-cluster.internal",
    port="6379",
    password="cluster-pass",
    redis_startup_nodes=[
        {"host": "node1.internal", "port": "6379"},
        {"host": "node2.internal", "port": "6379"},
    ],
    gcp_service_account="/path/to/sa.json",
    gcp_ssl_ca_certs="/path/to/ca.pem",
)
Semantic cache with Redis and a similarity threshold:
import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType
litellm.cache = Cache(
    type=LiteLLMCacheType.REDIS_SEMANTIC,
    host="redis.example.com",
    port="6379",
    password="my-secret",
    similarity_threshold=0.8,
    redis_semantic_cache_embedding_model="text-embedding-ada-002",
)
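
The role of similarity_threshold can be illustrated with plain cosine similarity. This sketch is an assumption-laden simplification: the real semantic backends delegate the comparison to a vector search in Redis or Qdrant, and both helper names and the "greater than or equal" comparison are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


def is_semantic_hit(
    query_embedding: Sequence[float],
    cached_embedding: Sequence[float],
    similarity_threshold: float,
) -> bool:
    """Assumed rule: reuse a cached response when similarity meets the threshold."""
    return cosine_similarity(query_embedding, cached_embedding) >= similarity_threshold
```

A higher threshold makes the cache stricter: fewer prompts count as "the same question", which lowers the hit rate but reduces the chance of returning a response written for a different prompt.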
S3 cache with specific call types:
import litellm
from litellm.caching.caching import Cache
from litellm.types.caching import LiteLLMCacheType
litellm.cache = Cache(
    type=LiteLLMCacheType.S3,
    s3_bucket_name="my-llm-cache",
    s3_region_name="us-east-1",
    s3_path="cache/v1/",
    supported_call_types=["completion", "acompletion"],
    ttl=3600.0,
)
Opt-in caching mode (default off):
import litellm
from litellm.caching.caching import Cache, CacheMode
from litellm.types.caching import LiteLLMCacheType
litellm.cache = Cache(
    type=LiteLLMCacheType.LOCAL,
    mode=CacheMode.default_off,
)
# Caching now applies only to calls that opt in explicitly, e.g. cache={"use-cache": True}
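
The difference between the two modes comes down to a per-call decision, which can be sketched without LiteLLM. Mode is a hypothetical stand-in for CacheMode, and opted_in is an illustrative boolean representing whatever per-request opt-in signal the caller sends; neither name comes from the library.

```python
from enum import Enum


class Mode(str, Enum):
    """Hypothetical stand-in for CacheMode."""

    default_on = "default_on"
    default_off = "default_off"


def should_use_cache(mode: Mode, opted_in: bool = False) -> bool:
    """Decide whether a single call should consult the cache."""
    if mode == Mode.default_on:
        # Caching is the default; no per-call signal is needed.
        return True
    # default_off: only calls that explicitly opted in are cached.
    return opted_in
```

Opt-in mode suits deployments where most traffic must stay fresh and only a few known-repetitive calls benefit from cached responses.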