Implementation:BerriAI Litellm Redis Semantic Cache

From Leeroopedia
| Attribute | Value |
| --- | --- |
| Sources | litellm/caching/redis_semantic_cache.py |
| Domains | Caching, Semantic Search, Redis, Embeddings, Vector Search |
| Last Updated | 2026-02-15 16:00 GMT |

Overview

The RedisSemanticCache provides semantic caching of LLM responses using Redis vector similarity search, enabling cache hits for prompts that are semantically similar rather than requiring exact string matches.

Description

RedisSemanticCache extends BaseCache and uses RedisVL's SemanticCache to store and retrieve cached responses based on cosine similarity of prompt embeddings. When a cache entry is stored, the prompt text is embedded using a configurable embedding model and stored alongside the response in a Redis vector index. On retrieval, the incoming prompt is embedded and compared against stored vectors; if the similarity exceeds the configured threshold, the cached response is returned. The class converts between similarity scores (0-1, where 1 is most similar) and cosine distance values (0-2, where 0 is most similar) internally. Both sync and async operations are supported, with async operations routing embedding generation through the LiteLLM proxy router when available. TTL support is included for cache entry expiration.
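The relationship between the two scales, and the threshold check on retrieval, can be sketched in a few lines. This is a minimal illustration, assuming the linear mapping distance = 1 - similarity; the helper names (`similarity_to_distance`, `cosine_distance`, `lookup`) and the toy in-memory entries are hypothetical and are not part of LiteLLM or RedisVL, which delegate the search to a Redis vector index.

```python
import math
from typing import List, Optional, Tuple


def similarity_to_distance(similarity: float) -> float:
    """Map a similarity score (1 = most similar) to a cosine distance (0 = most similar).

    Assumption: a linear mapping, distance = 1 - similarity.
    """
    return 1.0 - similarity


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Cosine distance in [0, 2]: 0 for the same direction, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def lookup(
    query: List[float],
    entries: List[Tuple[List[float], str]],
    similarity_threshold: float,
) -> Optional[str]:
    """Return the closest stored response if it is within the threshold, else None."""
    max_distance = similarity_to_distance(similarity_threshold)
    best = min(entries, key=lambda e: cosine_distance(query, e[0]), default=None)
    if best is None or cosine_distance(query, best[0]) > max_distance:
        return None
    return best[1]


# Toy two-dimensional "embeddings" standing in for real model output.
entries = [([1.0, 0.0], "Hello!"), ([0.0, 1.0], "Goodbye!")]
hit = lookup([0.9, 0.1], entries, similarity_threshold=0.8)   # close to the first entry
miss = lookup([0.7, 0.7], entries, similarity_threshold=0.8)  # too far from both entries
```

Under this mapping, a higher similarity threshold translates into a smaller allowed distance, so fewer prompts qualify as cache hits.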

Usage

Import RedisSemanticCache when you want to cache LLM responses based on the semantic similarity of prompts rather than exact string matches. It requires the redisvl package and a running Redis instance with the RediSearch module loaded.

Code Reference

Source Location

litellm/caching/redis_semantic_cache.py

Signature

class RedisSemanticCache(BaseCache):
    DEFAULT_REDIS_INDEX_NAME: str = "litellm_semantic_cache_index"

    def __init__(
        self,
        host: Optional[str] = None,
        port: Optional[str] = None,
        password: Optional[str] = None,
        redis_url: Optional[str] = None,
        similarity_threshold: Optional[float] = None,
        embedding_model: str = "text-embedding-ada-002",
        index_name: Optional[str] = None,
        **kwargs,
    )

Import

from litellm.caching.redis_semantic_cache import RedisSemanticCache

I/O Contract

Inputs

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| host | Optional[str] | No | Redis host. Falls back to the REDIS_HOST environment variable. |
| port | Optional[str] | No | Redis port. Falls back to the REDIS_PORT environment variable. |
| password | Optional[str] | No | Redis password. Falls back to the REDIS_PASSWORD environment variable. |
| redis_url | Optional[str] | No | Full Redis URL (alternative to host/port/password). |
| similarity_threshold | Optional[float] | Yes | Similarity threshold in the range 0.0-1.0. Must be supplied even though the signature defaults to None. 1.0 accepts exact matches only; 0.0 accepts any match. |
| embedding_model | str | No | Model used to generate embeddings. Defaults to "text-embedding-ada-002". |
| index_name | Optional[str] | No | Redis index name. Defaults to "litellm_semantic_cache_index". |
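The connection-parameter fallback described in the table can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: `resolve_redis_url` is a hypothetical helper, and treating the password as optional is an assumption made for illustration.

```python
import os
from typing import Optional


def resolve_redis_url(
    host: Optional[str] = None,
    port: Optional[str] = None,
    password: Optional[str] = None,
    redis_url: Optional[str] = None,
) -> str:
    """Hypothetical helper: prefer an explicit URL, else build one from args or env vars."""
    if redis_url is not None:
        return redis_url
    # Explicit arguments win; environment variables are the fallback.
    host = host or os.environ.get("REDIS_HOST")
    port = port or os.environ.get("REDIS_PORT")
    password = password or os.environ.get("REDIS_PASSWORD")
    if not host or not port:
        raise ValueError(
            "Provide redis_url, or host and port (directly or via REDIS_HOST / REDIS_PORT)"
        )
    # Assumption: the password is optional and omitted from the URL when unset.
    auth = f":{password}@" if password else ""
    return f"redis://{auth}{host}:{port}"
```

With REDIS_HOST=localhost and REDIS_PORT=6379 set and no arguments passed, this helper yields a URL of the same shape as the `redis_url` used in the usage examples.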

Key Methods

| Method | Returns | Description |
| --- | --- | --- |
| set_cache(key, value, **kwargs) | None | Stores a response with its prompt embedding in the semantic cache. |
| get_cache(key, **kwargs) | Any | Retrieves a semantically similar cached response. |
| async_set_cache(key, value, **kwargs) | None | Async version of set_cache with async embedding generation. |
| async_get_cache(key, **kwargs) | Any | Async version of get_cache with async embedding generation. Updates kwargs["metadata"]["semantic-similarity"]. |
| async_set_cache_pipeline(cache_list, **kwargs) | None | Batch async store of multiple cache entries. |
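One way to picture the batch method is as a fan-out over the single-entry async store. The sketch below assumes that `cache_list` is a list of (key, value) pairs and that the stores run concurrently; `InMemoryCache` is a hypothetical stand-in that only mirrors the method names from the table, so no Redis instance or embedding model is involved.

```python
import asyncio
from typing import Any, Dict, List, Tuple


class InMemoryCache:
    """Hypothetical stand-in exposing the same async method names as the table above."""

    def __init__(self) -> None:
        self.store: Dict[str, Any] = {}

    async def async_set_cache(self, key: str, value: Any, **kwargs: Any) -> None:
        await asyncio.sleep(0)  # yield control, as a real network call would
        self.store[key] = value

    async def async_set_cache_pipeline(
        self, cache_list: List[Tuple[str, Any]], **kwargs: Any
    ) -> None:
        # Schedule one store per (key, value) pair and wait for all of them.
        await asyncio.gather(
            *(self.async_set_cache(key, value, **kwargs) for key, value in cache_list)
        )


pipeline_cache = InMemoryCache()
asyncio.run(
    pipeline_cache.async_set_cache_pipeline(
        [("a", {"response": "1"}), ("b", {"response": "2"})]
    )
)
```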

Outputs

| Output | Type | Description |
| --- | --- | --- |
| Cached response | Any | The cached LLM response if a semantically similar prompt is found, otherwise None. |
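Because a miss is signalled by None, callers typically branch on the return value. The following get-or-compute pattern is a sketch, not part of LiteLLM: `get_or_compute`, `DictCache`, and `fake_llm` are hypothetical names, and `DictCache` matches on the key exactly so that the example runs without Redis.

```python
from typing import Any, Callable, Dict, List


def get_or_compute(
    cache: Any,
    key: str,
    messages: List[Dict[str, str]],
    compute: Callable[[List[Dict[str, str]]], Any],
) -> Any:
    """Return a cached response when one is found, else compute and store it."""
    cached = cache.get_cache(key=key, messages=messages)
    if cached is not None:
        return cached
    fresh = compute(messages)
    cache.set_cache(key=key, value=fresh, messages=messages)
    return fresh


class DictCache:
    """Hypothetical exact-match stand-in with the same method names as the cache."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}

    def get_cache(self, key: str, **kwargs: Any) -> Any:
        return self.data.get(key)  # None on a miss

    def set_cache(self, key: str, value: Any, **kwargs: Any) -> None:
        self.data[key] = value


calls: List[int] = []


def fake_llm(messages: List[Dict[str, str]]) -> Dict[str, str]:
    calls.append(1)  # count how often the "model" is actually invoked
    return {"response": "Hello!"}


demo_cache = DictCache()
msgs = [{"role": "user", "content": "Hi there!"}]
first = get_or_compute(demo_cache, "cache-key", msgs, fake_llm)   # miss: calls fake_llm
second = get_or_compute(demo_cache, "cache-key", msgs, fake_llm)  # hit: served from cache
```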

Usage Examples

from litellm.caching.redis_semantic_cache import RedisSemanticCache

cache = RedisSemanticCache(
    redis_url="redis://:password@localhost:6379",
    similarity_threshold=0.8,
    embedding_model="text-embedding-ada-002",
)

# Store a cache entry
cache.set_cache(
    key="cache-key",
    value={"response": "Hello!"},
    messages=[{"role": "user", "content": "Hi there!"}],
)

# Retrieve by semantic similarity
result = cache.get_cache(
    key="cache-key",
    messages=[{"role": "user", "content": "Hey!"}],
)
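# Returns the cached response if "Hey!" is semantically similar to "Hi there!"

# Async usage: await is only valid inside a coroutine, so wrap the call
import asyncio

async def main():
    metadata = {}  # keep a reference so the similarity score can be read afterwards
    result = await cache.async_get_cache(
        key="cache-key",
        messages=[{"role": "user", "content": "Hello, how are you?"}],
        metadata=metadata,
    )
    # metadata["semantic-similarity"] will be updated with the similarity score
    return result, metadata

result, metadata = asyncio.run(main())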
