Implementation:BerriAI Litellm Redis Semantic Cache
| Attribute | Value |
|---|---|
| Sources | litellm/caching/redis_semantic_cache.py |
| Domains | Caching, Semantic Search, Redis, Embeddings, Vector Search |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
The RedisSemanticCache provides semantic caching of LLM responses using Redis vector similarity search, enabling cache hits for prompts that are semantically similar rather than requiring exact string matches.
Description
RedisSemanticCache extends BaseCache and uses RedisVL's SemanticCache to store and retrieve cached responses based on cosine similarity of prompt embeddings. When a cache entry is stored, the prompt text is embedded using a configurable embedding model and stored alongside the response in a Redis vector index. On retrieval, the incoming prompt is embedded and compared against stored vectors; if the similarity exceeds the configured threshold, the cached response is returned. The class converts between similarity scores (0-1, where 1 is most similar) and cosine distance values (0-2, where 0 is most similar) internally. Both sync and async operations are supported, with async operations routing embedding generation through the LiteLLM proxy router when available. TTL support is included for cache entry expiration.
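The score conversion described above can be illustrated with a small sketch. It assumes the linear mapping distance = 1 - similarity, which is consistent with the stated ranges but is not confirmed by this page. The helper names (`cosine_distance`, `similarity_to_distance`, `distance_to_similarity`, `is_cache_hit`) are hypothetical and are not part of LiteLLM or RedisVL.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance in [0, 2]: 0 = same direction, 2 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def similarity_to_distance(similarity: float) -> float:
    """Assumed mapping: a similarity threshold in [0, 1] becomes a distance in [0, 1]."""
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0.0 and 1.0")
    return 1.0 - similarity


def distance_to_similarity(distance: float) -> float:
    """Inverse of the assumed mapping; distances above 1 give negative scores."""
    return 1.0 - distance


def is_cache_hit(distance: float, similarity_threshold: float) -> bool:
    """A stored vector counts as a hit when it is at least as close as the threshold."""
    return distance <= similarity_to_distance(similarity_threshold)
```

Under this assumption, a similarity threshold of 0.8 corresponds to a maximum cosine distance of about 0.2, so a stored prompt at distance 0.1 would be returned and one at distance 0.3 would not.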
Usage
Import RedisSemanticCache when you want to cache LLM responses based on the semantic similarity of prompts. It requires the redisvl package and a running Redis instance with the RediSearch module.
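Before constructing the cache, it can help to confirm that the redisvl dependency is importable. The following is a minimal sketch using only the standard library; `package_available` is a hypothetical helper, not a LiteLLM function, and it does not check that Redis itself is running or that the RediSearch module is loaded.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(name) is not None


if not package_available("redisvl"):
    # The PyPI distribution is named "redisvl".
    print("redisvl is not installed; install it before using RedisSemanticCache")
```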
Code Reference
Source Location
litellm/caching/redis_semantic_cache.py
Signature
class RedisSemanticCache(BaseCache):
    DEFAULT_REDIS_INDEX_NAME: str = "litellm_semantic_cache_index"

    def __init__(
        self,
        host: Optional[str] = None,
        port: Optional[str] = None,
        password: Optional[str] = None,
        redis_url: Optional[str] = None,
        similarity_threshold: Optional[float] = None,
        embedding_model: str = "text-embedding-ada-002",
        index_name: Optional[str] = None,
        **kwargs,
    )
Import
from litellm.caching.redis_semantic_cache import RedisSemanticCache
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| host | Optional[str] | No | Redis host. Falls back to REDIS_HOST env var. |
| port | Optional[str] | No | Redis port. Falls back to REDIS_PORT env var. |
| password | Optional[str] | No | Redis password. Falls back to REDIS_PASSWORD env var. |
| redis_url | Optional[str] | No | Full Redis URL (alternative to host/port/password). |
| similarity_threshold | Optional[float] | Yes | Similarity threshold (0.0-1.0). 1.0 = exact match only, 0.0 = accept any match. |
| embedding_model | str | No | Model for generating embeddings. Defaults to "text-embedding-ada-002". |
| index_name | Optional[str] | No | Redis index name. Defaults to "litellm_semantic_cache_index". |
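The fallback behavior in the table can be sketched as a small resolver. This is an illustration under stated assumptions, not the class's actual constructor logic: `resolve_redis_url` is a hypothetical helper, the URL layout follows the `redis://:password@host:port` form used elsewhere on this page, and treating all three of host, port, and password as mandatory when no URL is given is an assumption.

```python
import os
from typing import Mapping, Optional


def resolve_redis_url(
    host: Optional[str] = None,
    port: Optional[str] = None,
    password: Optional[str] = None,
    redis_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Build a Redis URL from explicit arguments, falling back to env vars."""
    if redis_url is not None:
        # A full URL takes precedence over the individual parts.
        return redis_url

    host = host or env.get("REDIS_HOST")
    port = port or env.get("REDIS_PORT")
    password = password or env.get("REDIS_PASSWORD")

    parts = (("host", host), ("port", port), ("password", password))
    missing = [name for name, value in parts if not value]
    if missing:
        # Assumption: every part is required when redis_url is absent.
        raise ValueError(f"missing Redis connection settings: {', '.join(missing)}")

    return f"redis://:{password}@{host}:{port}"
```

Passing `env` explicitly keeps the function easy to test without touching the real process environment.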
Key Methods
| Method | Returns | Description |
|---|---|---|
| set_cache(key, value, **kwargs) | None | Stores a response with its prompt embedding in the semantic cache. |
| get_cache(key, **kwargs) | Any | Retrieves a semantically similar cached response. |
| async_set_cache(key, value, **kwargs) | None | Async version of set_cache with async embedding generation. |
| async_get_cache(key, **kwargs) | Any | Async version of get_cache with async embedding generation. Updates kwargs["metadata"]["semantic-similarity"]. |
| async_set_cache_pipeline(cache_list, **kwargs) | None | Batch async store of multiple cache entries. |
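A batch store like async_set_cache_pipeline can be pictured as fanning out one async store per entry and awaiting them together. The sketch below uses a stand-in class rather than the real RedisSemanticCache, and it assumes that cache_list is a list of (key, value) pairs and that the entries are stored concurrently with `asyncio.gather`; neither detail is confirmed by this page.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class FakeAsyncCache:
    """Stand-in with the same method names; it stores values in a dict."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}

    async def async_set_cache(self, key: str, value: Any, **kwargs: Any) -> None:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        self.store[key] = value

    async def async_set_cache_pipeline(
        self, cache_list: List[Tuple[str, Any]], **kwargs: Any
    ) -> None:
        # Assumption: each entry is a (key, value) pair stored concurrently.
        await asyncio.gather(
            *(self.async_set_cache(key, value, **kwargs) for key, value in cache_list)
        )


cache = FakeAsyncCache()
asyncio.run(cache.async_set_cache_pipeline([("k1", "a"), ("k2", "b")]))
```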
Outputs
| Output | Type | Description |
|---|---|---|
| Cached response | Any | The cached LLM response if a semantically similar prompt is found, otherwise None. |
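The hit-or-None contract can be illustrated with a toy in-memory cache that needs neither Redis nor an embedding model. Everything here is hypothetical: `ToySemanticCache` takes the prompt text directly instead of a key plus messages, and `letter_counts` is a deliberately crude stand-in for a real embedding model, so the scores it produces say nothing about how the real cache ranks prompts.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple


def letter_counts(text: str) -> List[float]:
    """Toy embedding: counts of the letters a-z (not a real embedding model)."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ToySemanticCache:
    def __init__(
        self,
        similarity_threshold: float,
        embed: Callable[[str], List[float]] = letter_counts,
    ) -> None:
        self.similarity_threshold = similarity_threshold
        self.embed = embed
        self.entries: List[Tuple[List[float], Any]] = []

    def set_cache(self, prompt: str, value: Any) -> None:
        # Store the prompt's vector next to the response.
        self.entries.append((self.embed(prompt), value))

    def get_cache(self, prompt: str) -> Optional[Any]:
        # Return the best match only if it clears the threshold, else None.
        query = self.embed(prompt)
        best_value, best_score = None, -1.0
        for vector, value in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_value, best_score = value, score
        return best_value if best_score >= self.similarity_threshold else None
```

A prompt whose vector is close enough to a stored one returns that entry's value; anything below the threshold, or a lookup against an empty cache, returns None.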
Usage Examples
import asyncio

from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache(
    redis_url="redis://:password@localhost:6379",
    similarity_threshold=0.8,
    embedding_model="text-embedding-ada-002",
)

# Store a cache entry
cache.set_cache(
    key="cache-key",
    value={"response": "Hello!"},
    messages=[{"role": "user", "content": "Hi there!"}],
)

# Retrieve by semantic similarity
result = cache.get_cache(
    key="cache-key",
    messages=[{"role": "user", "content": "Hey!"}],
)
# Returns the cached response if "Hey!" is semantically similar to "Hi there!"

# Async usage: await must run inside a coroutine
async def main():
    metadata = {}  # keep a reference so the score can be read after the call
    result = await cache.async_get_cache(
        key="cache-key",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        metadata=metadata,
    )
    # metadata["semantic-similarity"] will be updated with the similarity score
    return result, metadata

result, metadata = asyncio.run(main())
Related Pages
- BerriAI_Litellm_Dual_Cache - the two-tier in-memory + Redis caching implementation