Implementation:Langchain ai Langchain Anthropic Prompt Caching

Knowledge Sources	Langchain_ai_Langchain
Domains	Middleware, Anthropic, Prompt Caching
Last Updated	2026-02-11 00:00 GMT

Overview

AnthropicPromptCachingMiddleware is agent middleware that optimizes API usage by adding cache control blocks to conversation prefixes for Anthropic models.

Description

AnthropicPromptCachingMiddleware extends AgentMiddleware (from langchain.agents.middleware.types) and injects prompt caching configuration into model requests sent to Anthropic's Claude models. On each outgoing request it adds cache_control metadata, with a configurable cache type and TTL, to model_settings. Before doing so it checks that the target model is a ChatAnthropic instance; for any other model it skips caching and, depending on unsupported_model_behavior, stays silent, warns, or raises. Caching can also be deferred until the conversation reaches a minimum number of messages (min_messages_to_cache).

Usage

Import this middleware when building LangChain agents using Anthropic models to reduce API costs and latency through prompt caching, especially for long conversations or repeated system prompts.

Code Reference

Source Location

Repository: Langchain_ai_Langchain
File: libs/partners/anthropic/langchain_anthropic/middleware/prompt_caching.py
Lines: 1-148

Signature

class AnthropicPromptCachingMiddleware(AgentMiddleware):
    def __init__(
        self,
        type: Literal["ephemeral"] = "ephemeral",
        ttl: Literal["5m", "1h"] = "5m",
        min_messages_to_cache: int = 0,
        unsupported_model_behavior: Literal["ignore", "warn", "raise"] = "warn",
    ) -> None: ...

    def _should_apply_caching(self, request: ModelRequest) -> bool: ...

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelCallResult: ...

    async def awrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
    ) -> ModelCallResult: ...

Import

from langchain_anthropic.middleware.prompt_caching import AnthropicPromptCachingMiddleware

I/O Contract

Inputs

Name	Type	Required	Description
type	`Literal["ephemeral"]`	No	The type of cache to use. Only `"ephemeral"` is supported. Defaults to `"ephemeral"`.
ttl	`Literal["5m", "1h"]`	No	Time to live for the cache: `"5m"` (5 minutes) or `"1h"` (1 hour). Defaults to `"5m"`.
min_messages_to_cache	`int`	No	Minimum number of messages before caching is activated. Defaults to `0`.
unsupported_model_behavior	`Literal["ignore", "warn", "raise"]`	No	Behavior when a non-Anthropic model is encountered. Defaults to `"warn"`.
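
The min_messages_to_cache threshold can be pictured as a simple count check. The sketch below is illustrative only: should_cache is a hypothetical helper, not part of the library, and it assumes the threshold is an inclusive lower bound on the plain message count. How the real middleware counts messages (for example, whether a system prompt counts toward the total) is not stated on this page.

```python
def should_cache(messages: list[str], min_messages_to_cache: int = 0) -> bool:
    """Return True once the conversation is long enough to cache.

    Hypothetical helper: assumes the threshold is an inclusive lower bound
    on the number of messages in the request.
    """
    return len(messages) >= min_messages_to_cache


conversation = ["hello", "hi, how can I help?", "summarize this file"]

# With the default threshold of 0, caching applies to every request.
always = should_cache(conversation)

# With a threshold of 5, a three-message conversation is not cached yet.
not_yet = should_cache(conversation, min_messages_to_cache=5)
```

A higher threshold may be useful when short conversations are unlikely to benefit from a cached prefix.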

Outputs

Name	Type	Description
wrap_model_call return	`ModelCallResult`	The result of the wrapped model call. When caching applies, cache control is added to the request before the handler is invoked.
awrap_model_call return	`ModelCallResult`	Same as wrap_model_call, for async handlers.

Behavior Details

When caching applies, the middleware adds the following entry to model_settings (shown here with the default type and ttl):

{"cache_control": {"type": "ephemeral", "ttl": "5m"}}
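
As a rough sketch of this step, the example below merges a cache_control entry into a request's settings and then delegates to a handler. FakeRequest, call_with_cache_control, and echo_handler are hypothetical stand-ins written for illustration; they are not the library's ModelRequest or its middleware methods. The sketch also assumes the settings are copied rather than mutated in place, which this page does not confirm.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for ModelRequest; only model_settings is modeled."""

    model_settings: dict[str, Any] = field(default_factory=dict)


def call_with_cache_control(
    request: FakeRequest,
    handler: Callable[[FakeRequest], Any],
    cache_type: str = "ephemeral",
    ttl: str = "5m",
) -> Any:
    """Merge cache_control into a copy of the settings, then delegate."""
    new_settings = {
        **request.model_settings,
        "cache_control": {"type": cache_type, "ttl": ttl},
    }
    # Build a new request instead of mutating the caller's object (assumption).
    return handler(replace(request, model_settings=new_settings))


def echo_handler(request: FakeRequest) -> dict[str, Any]:
    """Stand-in for the model call: returns the settings it received."""
    return request.model_settings


original = FakeRequest(model_settings={"max_tokens": 1024})
seen_by_model = call_with_cache_control(original, echo_handler, ttl="1h")
```

In this sketch the existing settings (here max_tokens) are preserved alongside the new cache_control entry, and the original request is left unchanged.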

Unsupported model behavior:

"ignore" -- Silently skips caching for non-Anthropic models.
"warn" -- Issues a warning and skips caching (default).
"raise" -- Raises a ValueError and stops the agent.
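
The three policies above can be sketched with the standard warnings module. check_model_supported is a hypothetical helper and the message text is invented for illustration; only the ignore/warn/raise behavior and the use of ValueError come from the description above.

```python
import warnings


def check_model_supported(is_anthropic_model: bool, behavior: str = "warn") -> bool:
    """Return True if caching may proceed; otherwise apply the chosen policy."""
    if is_anthropic_model:
        return True
    # Hypothetical message text, not the library's actual wording.
    message = "Prompt caching is only supported for Anthropic models."
    if behavior == "raise":
        raise ValueError(message)
    if behavior == "warn":
        warnings.warn(message, stacklevel=2)
    # "ignore" and "warn" both end by skipping the cache.
    return False
```

With "ignore" or "warn" the agent keeps running without caching, so these settings suit agents that may switch between providers; "raise" is the stricter choice when caching is expected to be active.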

Usage Examples

Basic Usage

from langchain_anthropic.middleware.prompt_caching import (
    AnthropicPromptCachingMiddleware,
)

# Default: ephemeral cache with 5-minute TTL
middleware = AnthropicPromptCachingMiddleware()

# Custom configuration: 1-hour TTL, activate after 5 messages
middleware = AnthropicPromptCachingMiddleware(
    ttl="1h",
    min_messages_to_cache=5,
    unsupported_model_behavior="raise",
)

Related Pages

Environment:Langchain_ai_Langchain_Anthropic_API_Credentials
Requires both langchain and langchain-anthropic packages
See Anthropic Prompt Caching Documentation
