Implementation:BerriAI Litellm Redis Semantic Cache

Attribute	Value
Sources	`litellm/caching/redis_semantic_cache.py`
Domains	Caching, Semantic Search, Redis, Embeddings, Vector Search
Last Updated	2026-02-15 16:00 GMT

Overview

The RedisSemanticCache provides semantic caching of LLM responses using Redis vector similarity search, enabling cache hits for prompts that are semantically similar rather than requiring exact string matches.

Description

RedisSemanticCache extends BaseCache and uses RedisVL's SemanticCache to store and retrieve cached responses based on cosine similarity of prompt embeddings. When a cache entry is stored, the prompt text is embedded using a configurable embedding model and stored alongside the response in a Redis vector index. On retrieval, the incoming prompt is embedded and compared against stored vectors; if the similarity exceeds the configured threshold, the cached response is returned. The class converts between similarity scores (0-1, where 1 is most similar) and cosine distance values (0-2, where 0 is most similar) internally. Both sync and async operations are supported, with async operations routing embedding generation through the LiteLLM proxy router when available. TTL support is included for cache entry expiration.
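The conversion between similarity scores and cosine distances can be sketched as follows. This is a minimal illustration that assumes the linear mapping distance = 1 - similarity; the helper names (`cosine_distance`, `similarity_to_distance`, `distance_to_similarity`, `is_cache_hit`) are hypothetical and do not come from the source file.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance in [0, 2]: 0 = same direction, 2 = opposite direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def similarity_to_distance(similarity: float) -> float:
    # Assumed mapping: a similarity of 1.0 corresponds to a distance of 0.0.
    return 1.0 - similarity


def distance_to_similarity(distance: float) -> float:
    # Inverse of the assumed mapping above.
    return 1.0 - distance


def is_cache_hit(
    query_vec: Sequence[float],
    stored_vec: Sequence[float],
    similarity_threshold: float,
) -> bool:
    # A hit means the stored vector lies within the allowed distance.
    max_distance = similarity_to_distance(similarity_threshold)
    return cosine_distance(query_vec, stored_vec) <= max_distance
```

Under this assumed mapping, a threshold of 0.8 accepts stored prompts whose cosine distance to the query is at most about 0.2, and distances above 1.0 would correspond to negative similarity values.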

Usage

Import RedisSemanticCache when you want to cache LLM responses based on the semantic similarity of prompts rather than exact key matches. It requires the redisvl package and a running Redis instance with the RediSearch module loaded.

Code Reference

Source Location

litellm/caching/redis_semantic_cache.py

Signature

class RedisSemanticCache(BaseCache):
    DEFAULT_REDIS_INDEX_NAME: str = "litellm_semantic_cache_index"

    def __init__(
        self,
        host: Optional[str] = None,
        port: Optional[str] = None,
        password: Optional[str] = None,
        redis_url: Optional[str] = None,
        similarity_threshold: Optional[float] = None,
        embedding_model: str = "text-embedding-ada-002",
        index_name: Optional[str] = None,
        **kwargs,
    )

Import

from litellm.caching.redis_semantic_cache import RedisSemanticCache

I/O Contract

Inputs

Parameter	Type	Required	Description
`host`	`Optional[str]`	No	Redis host. Falls back to `REDIS_HOST` env var.
`port`	`Optional[str]`	No	Redis port. Falls back to `REDIS_PORT` env var.
`password`	`Optional[str]`	No	Redis password. Falls back to `REDIS_PASSWORD` env var.
`redis_url`	`Optional[str]`	No	Full Redis URL (alternative to host/port/password).
`similarity_threshold`	`Optional[float]`	Yes	Similarity threshold (0.0-1.0). 1.0 = exact match only, 0.0 = accept any match. Must be supplied even though the signature defaults to `None`.
`embedding_model`	`str`	No	Model for generating embeddings. Defaults to `"text-embedding-ada-002"`.
`index_name`	`Optional[str]`	No	Redis index name. Defaults to `"litellm_semantic_cache_index"`.
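The environment-variable fallback described in the table can be sketched as below. This is an illustration only: `resolve_redis_url` is a hypothetical helper, not a function in the source file, and the real class may read these values differently (for example through a secret manager) or apply stricter validation.

```python
import os
from typing import Mapping, Optional


def resolve_redis_url(
    host: Optional[str] = None,
    port: Optional[str] = None,
    password: Optional[str] = None,
    redis_url: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Build a Redis URL, preferring explicit arguments over environment values."""
    # A full URL takes precedence over the individual parts.
    if redis_url is not None:
        return redis_url

    # Fall back to the environment variables named in the table above.
    host = host or env.get("REDIS_HOST")
    port = port or env.get("REDIS_PORT")
    password = password or env.get("REDIS_PASSWORD")

    if host is None or port is None:
        raise ValueError("Provide redis_url, or host and port (directly or via env)")

    # Assumption: the password is optional in this sketch.
    auth = f":{password}@" if password else ""
    return f"redis://{auth}{host}:{port}"
```

Passing `env` explicitly keeps the sketch easy to test without touching the process environment.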

Key Methods

Method	Returns	Description
`set_cache(key, value, **kwargs)`	`None`	Stores a response with its prompt embedding in the semantic cache.
`get_cache(key, **kwargs)`	`Any`	Retrieves a semantically similar cached response.
`async_set_cache(key, value, **kwargs)`	`None`	Async version of `set_cache` with async embedding generation.
`async_get_cache(key, **kwargs)`	`Any`	Async version of `get_cache` with async embedding generation. Updates `kwargs["metadata"]["semantic-similarity"]`.
`async_set_cache_pipeline(cache_list, **kwargs)`	`None`	Batch async store of multiple cache entries.
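A batch store such as `async_set_cache_pipeline` can be pictured as a concurrent fan-out over the single-entry async setter. The sketch below uses a hypothetical in-memory stand-in class and assumes `cache_list` is a list of `(key, value)` tuples; this page does not specify that shape, so treat it as an assumption rather than the library's contract.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class InMemoryStandIn:
    """Hypothetical stand-in for a cache that exposes async_set_cache."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}

    async def async_set_cache(self, key: str, value: Any, **kwargs: Any) -> None:
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.store[key] = value

    async def async_set_cache_pipeline(
        self, cache_list: List[Tuple[str, Any]], **kwargs: Any
    ) -> None:
        # Schedule one write per (key, value) pair and wait for all of them.
        await asyncio.gather(
            *(self.async_set_cache(key, value, **kwargs) for key, value in cache_list)
        )


async def demo() -> Dict[str, Any]:
    cache = InMemoryStandIn()
    await cache.async_set_cache_pipeline([("k1", "a"), ("k2", "b")])
    return cache.store
```

With the real class, each write would also need the prompt (for example via `messages`) passed through `kwargs`, as in the single-entry examples on this page.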

Outputs

Output	Type	Description
Cached response	`Any`	The cached LLM response if a semantically similar prompt is found, otherwise `None`.

Usage Examples

from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache(
    redis_url="redis://:password@localhost:6379",
    similarity_threshold=0.8,
    embedding_model="text-embedding-ada-002",
)

# Store a cache entry
cache.set_cache(
    key="cache-key",
    value={"response": "Hello!"},
    messages=[{"role": "user", "content": "Hi there!"}],
)

# Retrieve by semantic similarity
result = cache.get_cache(
    key="cache-key",
    messages=[{"role": "user", "content": "Hey!"}],
)
# Returns the cached response if "Hey!" is semantically similar to "Hi there!"

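# Async usage: `await` must run inside a coroutine
import asyncio

async def main():
    metadata = {}  # keep a reference so the similarity score can be read back
    result = await cache.async_get_cache(
        key="cache-key",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        metadata=metadata,
    )
    # metadata["semantic-similarity"] is updated with the similarity score
    return result, metadata

asyncio.run(main())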

Related Pages

BerriAI_Litellm_Dual_Cache - the two-tier in-memory + Redis caching implementation
