Implementation:Langchain ai Langchain Anthropic Prompt Caching
| Knowledge Sources | |
|---|---|
| Domains | Middleware, Anthropic, Prompt Caching |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
AnthropicPromptCachingMiddleware is agent middleware that optimizes API usage by adding cache control blocks to conversation prefixes for Anthropic models.
Description
AnthropicPromptCachingMiddleware extends AgentMiddleware from langchain.agents.middleware.types to inject prompt caching configuration into model requests destined for Anthropic's Claude models. It modifies the model_settings on outgoing requests to include cache_control metadata with configurable cache type and TTL. The middleware validates that the target model is a ChatAnthropic instance and supports configurable behavior (ignore, warn, or raise) when used with unsupported models. It also supports a minimum message threshold before activating caching.
Usage
Import this middleware when building LangChain agents using Anthropic models to reduce API costs and latency through prompt caching, especially for long conversations or repeated system prompts.
Code Reference
Source Location
- Repository: Langchain_ai_Langchain
- File: libs/partners/anthropic/langchain_anthropic/middleware/prompt_caching.py
- Lines: 1-148
Signature
class AnthropicPromptCachingMiddleware(AgentMiddleware):
    def __init__(
        self,
        type: Literal["ephemeral"] = "ephemeral",
        ttl: Literal["5m", "1h"] = "5m",
        min_messages_to_cache: int = 0,
        unsupported_model_behavior: Literal["ignore", "warn", "raise"] = "warn",
    ) -> None: ...

    def _should_apply_caching(self, request: ModelRequest) -> bool: ...

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelCallResult: ...

    async def awrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
    ) -> ModelCallResult: ...
Import
from langchain_anthropic.middleware.prompt_caching import AnthropicPromptCachingMiddleware
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| type | Literal["ephemeral"] | No | The type of cache to use. Only "ephemeral" is supported. |
| ttl | Literal["5m", "1h"] | No | Time to live for the cache. Supports "5m" (5 minutes) and "1h" (1 hour). |
| min_messages_to_cache | int | No | Minimum number of messages before the cache is activated. |
| unsupported_model_behavior | Literal["ignore", "warn", "raise"] | No | Behavior when a non-Anthropic model is encountered. |
Outputs
| Name | Type | Description |
|---|---|---|
| wrap_model_call return | ModelCallResult | The model response, with cache control added to the request if applicable. |
| awrap_model_call return | ModelCallResult | Async variant returning the same. |
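The difference between the two entry points is only whether the handler is awaited. The following is a minimal sketch of the async shape under stated assumptions: `awrap_sketch` and `fake_handler` are hypothetical names, and a plain dict stands in for the request, so this is not the library's implementation.

```python
import asyncio
from typing import Any, Awaitable, Callable

Settings = dict[str, Any]


async def awrap_sketch(
    settings: Settings,
    handler: Callable[[Settings], Awaitable[str]],
) -> str:
    """Hypothetical async wrapper: add cache_control, then await the handler."""
    updated = {**settings, "cache_control": {"type": "ephemeral", "ttl": "5m"}}
    return await handler(updated)


async def fake_handler(settings: Settings) -> str:
    """Stand-in for the downstream model call; reports the TTL it received."""
    return f"called with ttl={settings['cache_control']['ttl']}"


result = asyncio.run(awrap_sketch({}, fake_handler))
```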
Behavior Details
The middleware adds the following to model_settings when caching is applied:
{"cache_control": {"type": "ephemeral", "ttl": "5m"}}
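The effect on a request can be illustrated with a minimal sketch. `FakeRequest`, `with_cache_control`, and `wrap_model_call_sketch` are hypothetical stand-ins rather than LangChain classes or functions. The sketch assumes the settings are merged into a copy instead of being mutated in place, and that the handler is then called with the updated request.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for ModelRequest; only model_settings is modeled."""

    model_settings: dict[str, Any] = field(default_factory=dict)


def with_cache_control(
    request: FakeRequest, cache_type: str = "ephemeral", ttl: str = "5m"
) -> FakeRequest:
    """Return a copy of the request whose settings include cache_control."""
    merged = {
        **request.model_settings,
        "cache_control": {"type": cache_type, "ttl": ttl},
    }
    return replace(request, model_settings=merged)


def wrap_model_call_sketch(
    request: FakeRequest, handler: Callable[[FakeRequest], Any]
) -> Any:
    """Delegate to the handler with the cache-enabled copy of the request."""
    return handler(with_cache_control(request))


original = FakeRequest(model_settings={"max_tokens": 1024})
# The lambda plays the role of the model call and echoes the settings it saw.
seen = wrap_model_call_sketch(original, lambda req: req.model_settings)
```

In this sketch, existing settings such as `max_tokens` are preserved and the original request object is left unchanged.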
Unsupported model behavior:
- "ignore" -- Silently skips caching for non-Anthropic models.
- "warn" -- Issues a warning and skips caching (default).
- "raise" -- Raises a ValueError and stops the agent.
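The three behaviors, together with the message threshold, can be sketched as a single decision function. This is an illustration under stated assumptions, not the body of `_should_apply_caching`: the function name, the boolean `is_anthropic_model` flag, the warning text, and the inclusive (`>=`) threshold comparison are all assumptions made for the example.

```python
import warnings


def should_apply_caching_sketch(
    is_anthropic_model: bool,
    message_count: int,
    min_messages_to_cache: int = 0,
    unsupported_model_behavior: str = "warn",
) -> bool:
    """Hypothetical decision logic: return True only when caching should apply."""
    if not is_anthropic_model:
        msg = "Prompt caching is only supported for Anthropic models."
        if unsupported_model_behavior == "raise":
            raise ValueError(msg)
        if unsupported_model_behavior == "warn":
            warnings.warn(msg, stacklevel=2)
        # "ignore" and "warn" both skip caching and let the call proceed.
        return False
    # Assumption: the threshold is inclusive.
    return message_count >= min_messages_to_cache
```

With "raise", the exception propagates to the caller, which is what stops the agent run; the other two modes let the model call continue without cache control.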
Usage Examples
Basic Usage
from langchain_anthropic.middleware.prompt_caching import (
    AnthropicPromptCachingMiddleware,
)
# Default: ephemeral cache with 5-minute TTL
middleware = AnthropicPromptCachingMiddleware()
# Custom configuration: 1-hour TTL, activate after 5 messages
middleware = AnthropicPromptCachingMiddleware(
    ttl="1h",
    min_messages_to_cache=5,
    unsupported_model_behavior="raise",
)
Related Pages
- Environment:Langchain_ai_Langchain_Anthropic_API_Credentials
- Requires both langchain and langchain-anthropic packages
- See Anthropic Prompt Caching Documentation