Implementation:Langchain ai Langchain Anthropic Prompt Caching

From Leeroopedia
Knowledge Sources
Domains: Middleware, Anthropic, Prompt Caching
Last Updated: 2026-02-11 00:00 GMT

Overview

AnthropicPromptCachingMiddleware is agent middleware that optimizes API usage by adding cache control blocks to conversation prefixes for Anthropic models.

Description

AnthropicPromptCachingMiddleware extends AgentMiddleware from langchain.agents.middleware.types to inject prompt caching configuration into model requests destined for Anthropic's Claude models. It modifies the model_settings on outgoing requests to include cache_control metadata with configurable cache type and TTL. The middleware validates that the target model is a ChatAnthropic instance and supports configurable behavior (ignore, warn, or raise) when used with unsupported models. It also supports a minimum message threshold before activating caching.
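The request flow described above can be sketched with stand-in types. This is a minimal illustration rather than the library's implementation: SketchRequest, wrap_with_cache, and echo_handler are hypothetical names, and treating the threshold as inclusive (caching starts once the message count reaches min_messages_to_cache) is an assumption.

```python
from dataclasses import dataclass, field, replace
from typing import Any, Callable


@dataclass(frozen=True)
class SketchRequest:
    """Hypothetical stand-in for ModelRequest; not the LangChain class."""

    messages: list[str]
    model_settings: dict[str, Any] = field(default_factory=dict)


def wrap_with_cache(
    request: SketchRequest,
    handler: Callable[[SketchRequest], str],
    min_messages_to_cache: int = 0,
    ttl: str = "5m",
) -> str:
    """Forward the request, adding cache_control once the threshold is met."""
    if len(request.messages) < min_messages_to_cache:
        # Below the threshold: pass the request through unchanged.
        return handler(request)
    settings = {
        **request.model_settings,
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }
    # Build a new request instead of mutating the one we were given.
    return handler(replace(request, model_settings=settings))


def echo_handler(request: SketchRequest) -> str:
    """Hypothetical handler that reports whether caching was requested."""
    return "cached" if "cache_control" in request.model_settings else "plain"


short = SketchRequest(messages=["hi"])
long = SketchRequest(messages=["hi", "hello", "how are you?"])
print(wrap_with_cache(short, echo_handler, min_messages_to_cache=3))  # plain
print(wrap_with_cache(long, echo_handler, min_messages_to_cache=3))   # cached
```

The handler only ever sees a request object, so the middleware can stay unaware of how the model call is actually made.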

Usage

Import this middleware when building LangChain agents using Anthropic models to reduce API costs and latency through prompt caching, especially for long conversations or repeated system prompts.

Code Reference

Source Location

  • Repository: Langchain_ai_Langchain
  • File: libs/partners/anthropic/langchain_anthropic/middleware/prompt_caching.py
  • Lines: 1-148

Signature

class AnthropicPromptCachingMiddleware(AgentMiddleware):
    def __init__(
        self,
        type: Literal["ephemeral"] = "ephemeral",
        ttl: Literal["5m", "1h"] = "5m",
        min_messages_to_cache: int = 0,
        unsupported_model_behavior: Literal["ignore", "warn", "raise"] = "warn",
    ) -> None: ...

    def _should_apply_caching(self, request: ModelRequest) -> bool: ...

    def wrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], ModelResponse],
    ) -> ModelCallResult: ...

    async def awrap_model_call(
        self,
        request: ModelRequest,
        handler: Callable[[ModelRequest], Awaitable[ModelResponse]],
    ) -> ModelCallResult: ...

Import

from langchain_anthropic.middleware.prompt_caching import AnthropicPromptCachingMiddleware

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
type | Literal["ephemeral"] | No | The type of cache to use. Only "ephemeral" is supported.
ttl | Literal["5m", "1h"] | No | Time to live for the cache. Supports "5m" (5 minutes) and "1h" (1 hour).
min_messages_to_cache | int | No | Minimum number of messages before the cache is activated.
unsupported_model_behavior | Literal["ignore", "warn", "raise"] | No | Behavior when a non-Anthropic model is encountered.

Outputs

Name | Type | Description
--- | --- | ---
wrap_model_call return | ModelCallResult | The model response, with cache control added to the request if applicable.
awrap_model_call return | ModelCallResult | Async variant returning the same.

Behavior Details

The middleware adds the following to model_settings when caching is applied:

{"cache_control": {"type": "ephemeral", "ttl": "5m"}}
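As a rough sketch of that update, the merge could look like the following. It assumes that settings already present on the request are kept and that the original dictionary is left untouched; neither point is confirmed by this page. with_cache_control is a hypothetical helper, and max_tokens is only an example of a pre-existing setting.

```python
from typing import Any


def with_cache_control(
    model_settings: dict[str, Any],
    cache_type: str = "ephemeral",
    ttl: str = "5m",
) -> dict[str, Any]:
    """Return a copy of model_settings with a cache_control entry added."""
    return {**model_settings, "cache_control": {"type": cache_type, "ttl": ttl}}


existing = {"max_tokens": 1024}  # hypothetical pre-existing setting
updated = with_cache_control(existing, ttl="1h")
print(updated)
# {'max_tokens': 1024, 'cache_control': {'type': 'ephemeral', 'ttl': '1h'}}
```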

Unsupported model behavior:

  • "ignore" -- Silently skips caching for non-Anthropic models.
  • "warn" -- Issues a warning and skips caching (default).
  • "raise" -- Raises a ValueError and stops the agent.
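The three modes listed above can be illustrated with a small standalone function. This is a sketch under stated assumptions, not the middleware's code: check_supported and its is_anthropic_model flag are hypothetical, and the message wording is invented for the example.

```python
import warnings


def check_supported(is_anthropic_model: bool, behavior: str = "warn") -> bool:
    """Return True when caching may proceed; otherwise apply the chosen behavior."""
    if is_anthropic_model:
        return True
    # Illustrative wording only; the real message may differ.
    message = "Prompt caching middleware only supports Anthropic models."
    if behavior == "raise":
        raise ValueError(message)
    if behavior == "warn":
        warnings.warn(message, stacklevel=2)
    # "ignore" and "warn" both skip caching and let the call continue.
    return False


print(check_supported(True))              # True
print(check_supported(False, "ignore"))   # False
```

Choosing "raise" is useful in tests or strict deployments where a misconfigured model should fail loudly, while "ignore" suits agents that switch between providers at runtime.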

Usage Examples

Basic Usage

from langchain_anthropic.middleware.prompt_caching import (
    AnthropicPromptCachingMiddleware,
)

# Default: ephemeral cache with 5-minute TTL
middleware = AnthropicPromptCachingMiddleware()

# Custom configuration: 1-hour TTL, activate after 5 messages
middleware = AnthropicPromptCachingMiddleware(
    ttl="1h",
    min_messages_to_cache=5,
    unsupported_model_behavior="raise",
)
