
Workflow:Langchain ai Langchain Chat Model Invocation

From Leeroopedia
Knowledge Sources
Domains LLMs, Chat_Models, AI_Integration
Last Updated 2026-02-11 15:00 GMT

Overview

End-to-end process for invoking a chat model through the LangChain abstraction layer, from user input to parsed AI response.

Description

This workflow describes the standard procedure for sending a prompt to any LangChain-compatible chat model provider (OpenAI, Anthropic, Ollama, etc.) and receiving a structured response. LangChain's BaseChatModel provides a unified interface that converts user input into a normalized prompt, checks for cached responses, optionally routes through streaming or non-streaming code paths, calls the provider-specific API, and returns an AIMessage with content, usage metadata, and response metadata. The design allows swapping providers with zero changes to application code.

Usage

Execute this workflow when you need to send a natural language prompt to a chat model and receive a response. This applies to any LangChain-supported provider, whether using the synchronous invoke() method, the asynchronous ainvoke(), or the batch() method for multiple inputs at once. The input can be a plain string, a list of message objects, or a PromptValue from a prompt template.

Execution Steps

Step 1: Initialize the Chat Model

Instantiate a provider-specific chat model class with connection parameters and model settings. Each provider (OpenAI, Anthropic, Ollama, etc.) has its own class that extends BaseChatModel from langchain-core. Configuration includes the model name, API key, temperature, max tokens, timeout, and retry settings. The model reads API keys from environment variables or accepts them directly.

Key considerations:

  • Each provider package is installed separately (e.g., langchain-openai, langchain-anthropic)
  • API keys should come from environment variables, not hardcoded
  • Model profiles provide capability metadata (token limits, supported features)

Step 2: Prepare the Input

Convert the user's input into a normalized PromptValue representation. LangChain accepts three input formats: a raw string (converted to a single HumanMessage), a list of BaseMessage objects (SystemMessage, HumanMessage, AIMessage, ToolMessage), or a PromptValue from a prompt template. The _convert_input() method in BaseChatModel handles this normalization transparently.

Key considerations:

  • System messages should appear first in the message list
  • Message history can include prior AI responses for multi-turn conversations
  • Prompt templates with variables are resolved before conversion

Step 3: Check the LLM Cache

Before calling the API, the framework checks whether an identical prompt has been previously sent and cached. The _generate_with_cache() method looks up the prompt in the configured cache backend. If a cache hit is found, the cached ChatResult is returned immediately without making an API call, saving latency and cost.

Key considerations:

  • Caching is optional and must be explicitly enabled
  • Cache keys are derived from the serialized model configuration and prompt
  • Cache is bypassed when streaming is requested

Step 4: Apply Rate Limiting

If a rate limiter is configured, the framework throttles the request before proceeding to the API call. This prevents exceeding provider-imposed rate limits and avoids HTTP 429 errors. The rate limiter is applied after cache checking so that cache hits do not consume rate limit quota.

Key considerations:

  • Rate limiting is optional and configurable per model instance
  • Useful for high-throughput applications with many concurrent requests
  • Does not apply to cached responses

Step 5: Route to Streaming or Non-Streaming Path

The framework decides whether to use the streaming or non-streaming code path based on the model configuration and invocation method. The _should_stream() method evaluates whether the caller requested streaming or whether the model defaults to streaming mode. For non-streaming invocations, _generate() is called directly. For streaming invocations, _stream() is called and chunks are accumulated into a complete response.

Key considerations:

  • invoke() typically uses the non-streaming path unless the model is configured to always stream
  • stream() always uses the streaming path and yields chunks incrementally
  • The accumulated streaming result is equivalent to a non-streaming result

Step 6: Call the Provider API

Execute the provider-specific API call. Each provider implements _generate() (sync) or _agenerate() (async) which builds the request payload, converts LangChain messages to the provider's format, calls the underlying SDK client, and returns a ChatResult. For OpenAI, this calls the Chat Completions or Responses API. For Anthropic, this calls the Messages API. The request includes the converted messages, model parameters, and any tool definitions.

Key considerations:

  • Message format conversion is provider-specific (e.g., OpenAI uses role/content dicts, Anthropic uses content blocks)
  • HTTP clients are cached and reused for connection pooling
  • Retry logic handles transient failures with exponential backoff

Step 7: Parse and Return the Response

Convert the provider's raw API response into a standardized ChatResult containing one or more ChatGeneration objects. Each generation wraps an AIMessage with the response content, any tool calls, usage metadata (input/output/total tokens), and provider-specific response metadata (model version, stop reason, system fingerprint). The invoke() method extracts the first generation's AIMessage and returns it to the caller.

Key considerations:

  • Usage metadata is normalized across providers (input_tokens, output_tokens, total_tokens)
  • Tool calls are extracted into structured ToolCall objects on the AIMessage
  • Response metadata preserves provider-specific details for debugging

Execution Diagram

GitHub URL

Workflow Repository