Implementation:Langchain ai Langchain BaseChatModel Generate With Cache

From Leeroopedia
Domains: Optimization, Caching
Last Updated: 2026-02-11 00:00 GMT

Overview

A concrete mechanism in langchain-core for caching LLM generation results.

Description

The BaseChatModel._generate_with_cache() method wraps the core _generate() call with cache-aside logic. It checks the global llm_cache or the model's cache attribute to determine if caching is enabled. When enabled, it serializes the input messages and parameters into a cache key, performs a lookup, and either returns the cached result or delegates to _generate() and stores the result.
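The cache-aside flow described above can be sketched in plain Python. This is illustrative only: `SimpleCache`, `make_key`, and `generate_with_cache` are hypothetical names, not the actual langchain-core API (which serializes inputs via its own dump machinery and uses BaseCache implementations).

```python
import hashlib
import json


class SimpleCache:
    """Minimal in-memory cache, standing in for a BaseCache implementation."""

    def __init__(self):
        self._store = {}

    def lookup(self, key):
        return self._store.get(key)

    def update(self, key, value):
        self._store[key] = value


def make_key(messages, params):
    """Serialize messages and call parameters into a deterministic key."""
    payload = json.dumps({"messages": messages, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def generate_with_cache(cache, messages, params, generate):
    """Cache-aside: return a hit if present, else call the model and store."""
    key = make_key(messages, params)
    cached = cache.lookup(key)
    if cached is not None:
        return cached            # cache hit: skip the expensive call
    result = generate(messages)  # cache miss: delegate to the model
    cache.update(key, result)
    return result
```

With this sketch, a repeated identical call returns the stored result and invokes the underlying generate function only once, which is the behavior the real method provides around `_generate()`.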

Usage

This is an internal method invoked automatically during the invoke() and generate() flow. Users enable caching by setting cache=True on the model or by configuring a global cache with set_llm_cache().

Code Reference

Source Location

  • Repository: langchain
  • File: libs/core/langchain_core/language_models/chat_models.py
  • Lines: L1136-1234

Signature

def _generate_with_cache(
    self,
    messages: list[BaseMessage],
    stop: list[str] | None = None,
    run_manager: CallbackManagerForLLMRun | None = None,
    **kwargs: Any,
) -> ChatResult:

Import

# Internal method — accessed via BaseChatModel
from langchain_core.language_models import BaseChatModel

I/O Contract

Inputs

  • messages (list[BaseMessage], required): prepared messages for the model
  • stop (list[str] | None, optional): stop sequences
  • run_manager (CallbackManagerForLLMRun | None, optional): callback manager for tracing

Outputs

  • return (ChatResult): chat result, from the cache or from _generate()
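Because the cache key is derived from both the messages and the call parameters, two calls that differ only in `stop` (or other kwargs) do not share a cache entry. A standalone illustration follows; the `make_key` helper is hypothetical, since langchain-core builds its keys from its own serialization of the inputs:

```python
import hashlib
import json


def make_key(messages, **params):
    """Hypothetical key builder: hash the serialized messages plus params."""
    payload = json.dumps({"messages": messages, "params": params}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


# Same messages, different stop sequences: separate cache entries.
k1 = make_key(["What is 2+2?"], stop=None)
k2 = make_key(["What is 2+2?"], stop=["\n"])
```

The practical consequence: changing sampling or stop parameters between otherwise identical calls will bypass previously cached results.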

Usage Examples

Enabling Global Cache

from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
from langchain_openai import ChatOpenAI

# Enable global in-memory cache
set_llm_cache(InMemoryCache())

llm = ChatOpenAI(model="gpt-4o-mini")

# First call hits API
response1 = llm.invoke("What is 2+2?")

# Second identical call returns cached result
response2 = llm.invoke("What is 2+2?")

Per-Model Cache Control

from langchain_openai import ChatOpenAI

# Disable caching for this specific model
llm = ChatOpenAI(model="gpt-4o-mini", cache=False)
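The precedence implied above (a model's own cache attribute wins over the global llm_cache) can be sketched generically. The `resolve_cache` function and `GLOBAL_LLM_CACHE` variable below are illustrative stand-ins, not the langchain-core implementation:

```python
GLOBAL_LLM_CACHE = None  # stands in for langchain's globally configured cache


def resolve_cache(model_cache):
    """Decide which cache (if any) applies, mirroring the documented behavior.

    model_cache may be:
      * None         -> fall back to the global cache
      * False        -> caching explicitly disabled for this model
      * True         -> require and use the global cache
      * a cache obj  -> use that object directly
    """
    if model_cache is None:
        return GLOBAL_LLM_CACHE
    if model_cache is False:
        return None
    if model_cache is True:
        if GLOBAL_LLM_CACHE is None:
            raise ValueError("cache=True but no global cache is configured")
        return GLOBAL_LLM_CACHE
    return model_cache
```

Under this scheme, `cache=False` guarantees the model never consults the global cache, while `cache=True` fails loudly if no global cache has been set up.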
