Implementation: LangChain BaseChatModel._generate_with_cache()
| Knowledge Sources | Details |
|---|---|
| Domains | Optimization, Caching |
| Last Updated | 2026-02-11 00:00 GMT |
Overview
A concrete mechanism provided by langchain-core for caching LLM generation results.
Description
The BaseChatModel._generate_with_cache() method wraps the core _generate() call with cache-aside logic. It first checks the global llm_cache or the model's cache attribute to determine whether caching is enabled. When it is, the method serializes the input messages and generation parameters into a cache key, performs a lookup, and either returns the cached result or delegates to _generate() and stores the new result in the cache.
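The cache-aside flow described above can be sketched in plain Python. This is an illustrative toy, not the actual langchain-core implementation; all class and helper names here are hypothetical, and the real method additionally handles callbacks, streaming, and error paths.

```python
import json

class InMemoryCache:
    """Toy cache keyed on (prompt, llm_string) pairs. Illustrative only."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, result):
        self._store[(prompt, llm_string)] = result


class ToyChatModel:
    """Hypothetical model demonstrating the cache-aside pattern."""

    def __init__(self, cache):
        self.cache = cache
        self.calls = 0  # counts real _generate() invocations

    def _generate(self, messages, **kwargs):
        # Stand-in for the expensive model call.
        self.calls += 1
        return f"echo: {messages[-1]}"

    def _generate_with_cache(self, messages, **kwargs):
        # Serialize messages and params into a deterministic cache key.
        prompt = json.dumps(messages)
        llm_string = json.dumps(kwargs, sort_keys=True)
        cached = self.cache.lookup(prompt, llm_string)
        if cached is not None:
            return cached  # cache hit: skip the model call entirely
        result = self._generate(messages, **kwargs)
        self.cache.update(prompt, llm_string, result)  # populate for next time
        return result
```

With this sketch, two identical calls produce one real _generate() invocation: the second call is served from the cache.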
Usage
This is an internal method invoked automatically during the invoke() and generate() flow. Users enable caching by setting cache=True on the model or configuring the global set_llm_cache().
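The precedence between the per-model cache attribute and the global cache can be sketched as follows. This is a hypothetical simplification of the decision logic, assuming the documented behavior that cache=False disables caching outright, a concrete cache instance on the model takes precedence, and cache=True or an unset cache falls back to the global llm_cache.

```python
def resolve_cache(model_cache, global_cache):
    """Hypothetical sketch of cache-selection precedence (not the real code)."""
    if model_cache is False:
        return None              # caching explicitly disabled for this model
    if model_cache is None or model_cache is True:
        return global_cache      # fall back to the globally configured llm_cache
    return model_cache           # a concrete cache instance on the model wins
```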
Code Reference
Source Location
- Repository: langchain
- File: libs/core/langchain_core/language_models/chat_models.py
- Lines: L1136-1234
Signature
def _generate_with_cache(
self,
messages: list[BaseMessage],
stop: list[str] | None = None,
run_manager: CallbackManagerForLLMRun | None = None,
**kwargs: Any,
) -> ChatResult:
Import
# Internal method — accessed via BaseChatModel
from langchain_core.language_models import BaseChatModel
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| messages | list[BaseMessage] | Yes | Prepared messages for the model |
| stop | list[str] or None | No | Stop sequences |
| run_manager | CallbackManagerForLLMRun or None | No | Callback manager for tracing |
| **kwargs | Any | No | Additional generation parameters forwarded to _generate() |
Outputs
| Name | Type | Description |
|---|---|---|
| return | ChatResult | Chat result from cache or from _generate() |
Usage Examples
Enabling Global Cache
from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
from langchain_openai import ChatOpenAI
# Enable global in-memory cache
set_llm_cache(InMemoryCache())
llm = ChatOpenAI(model="gpt-4o-mini")
# First call hits API
response1 = llm.invoke("What is 2+2?")
# Second identical call returns cached result
response2 = llm.invoke("What is 2+2?")
Per-Model Cache Control
from langchain_openai import ChatOpenAI
# Disable caching for this specific model
llm = ChatOpenAI(model="gpt-4o-mini", cache=False)