Principle:Langchain ai Langgraph State Schema Definition
| Metadata | Value |
|---|---|
| Type | Principle |
| Library | langgraph |
| Source | libs/langgraph/langgraph/graph/message.py, libs/langgraph/langgraph/graph/state.py |
| Workflow | Building_a_Stateful_Graph |
Overview
State schema definition is the foundational step when building a stateful graph in LangGraph. A state schema declares the shape of the data that flows through every node in the graph, including which fields exist, their types, and how concurrent updates to the same field are merged. LangGraph uses Python's TypedDict or Pydantic BaseModel as the schema class, with optional Annotated metadata to attach reducer functions that control state-merging behavior.
Description
In LangGraph, each graph operates on a single shared state object. Nodes read from this state and return partial updates. The framework must know:
- What fields exist -- the keys and their types.
- How updates are merged -- whether a field should be overwritten (last-write-wins) or accumulated (append, merge-by-id, custom logic).
A state schema answers both questions in one declaration. At its simplest, a TypedDict with plain type annotations uses last-value semantics: each field is replaced entirely by whatever the most recent node returns. For more sophisticated behavior, fields are wrapped with typing.Annotated and a reducer function.
Reducer Functions
A reducer is any callable with the signature (current_value, new_value) -> merged_value. When a node returns a partial state update containing a field that has a reducer, LangGraph calls the reducer instead of blindly overwriting. Common patterns include:
- Append-only lists -- operator.add concatenates new items onto the existing list.
- Message merging -- add_messages appends new messages but can also update or remove messages by ID.
- Custom aggregation -- any user-defined two-argument function.
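The reducer signature described above can be illustrated by calling a few reducers directly, outside of any graph. This is a minimal sketch: `keep_max` and `merge_dicts` are hypothetical helpers invented for illustration, not part of LangGraph.

```python
import operator


def keep_max(current: int, new: int) -> int:
    """Hypothetical custom reducer: keep the larger of the two values."""
    return max(current, new)


def merge_dicts(current: dict, new: dict) -> dict:
    """Hypothetical custom reducer: shallow merge, new keys take precedence."""
    return {**current, **new}


# Every reducer takes (current_value, new_value) and returns the merged value.
items = operator.add(["a"], ["b", "c"])
high_score = keep_max(7, 3)
settings = merge_dicts({"lang": "en"}, {"theme": "dark"})
```

Any of these callables could be attached to a field through Annotated, for example `Annotated[int, keep_max]`.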
Internally, LangGraph inspects the __metadata__ attribute that Annotated attaches. If the last metadata item is a callable with two positional parameters, LangGraph wraps the field in a BinaryOperatorAggregate channel. If the metadata is a BaseChannel subclass, that channel type is used directly. Fields without annotation metadata default to LastValue channels.
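The inspection step can be approximated with the standard library alone. The sketch below is not LangGraph's implementation; `classify_fields` and `takes_two_positional` are hypothetical names, and the sketch only distinguishes reducer fields from plain fields (it does not handle BaseChannel metadata).

```python
import inspect
import operator
from typing import Annotated, TypedDict, get_type_hints


class ExampleState(TypedDict):
    count: int                            # no metadata
    items: Annotated[list, operator.add]  # reducer in the metadata


def takes_two_positional(func) -> bool:
    """Return True if the callable accepts exactly two positional parameters."""
    positional = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
    )
    params = inspect.signature(func).parameters.values()
    return sum(p.kind in positional for p in params) == 2


def classify_fields(schema) -> dict:
    """Label each field 'reducer' or 'last-value' from its Annotated metadata."""
    hints = get_type_hints(schema, include_extras=True)
    result = {}
    for name, hint in hints.items():
        metadata = getattr(hint, "__metadata__", ())
        last = metadata[-1] if metadata else None
        is_reducer = callable(last) and takes_two_positional(last)
        result[name] = "reducer" if is_reducer else "last-value"
    return result


classification = classify_fields(ExampleState)
```

Passing `include_extras=True` matters here: without it, `get_type_hints` strips the Annotated wrapper and the reducer is no longer visible.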
State Channels
Every field in a state schema maps to a channel. Channels are the internal transport mechanism in the Pregel execution engine:
- LastValue -- stores one value; each update overwrites the previous.
- BinaryOperatorAggregate -- stores a persistent value updated by applying a binary operator (the reducer) to the current value and each incoming update.
- EphemeralValue -- stores a value that is reset between supersteps.
- Topic -- a pub/sub channel for multiple values.
The schema-to-channel mapping is performed at graph construction time inside StateGraph.__init__, which calls _get_channels(schema). This function iterates the schema's type hints and maps each field to its corresponding channel type; fields without annotation metadata fall back to LastValue.
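The difference between the two most common channel types can be sketched with toy classes. These are simplified stand-ins, not the library's classes: `ToyLastValue` and `ToyAggregate` are hypothetical names, and the one-write-per-step check is modeled on the library's LastValue channel, which accepts only one value per step.

```python
import operator
from typing import Any, Callable, Sequence


class ToyLastValue:
    """Toy last-value channel: each step's single write replaces the value."""

    def __init__(self) -> None:
        self.value: Any = None

    def update(self, values: Sequence[Any]) -> None:
        if not values:
            return
        if len(values) > 1:
            # Without a reducer there is no rule for combining several writes.
            raise ValueError("last-value channel got multiple writes in one step")
        self.value = values[0]


class ToyAggregate:
    """Toy aggregate channel: folds every write into the stored value."""

    def __init__(self, reducer: Callable[[Any, Any], Any], initial: Any) -> None:
        self.reducer = reducer
        self.value = initial

    def update(self, values: Sequence[Any]) -> None:
        for value in values:
            self.value = self.reducer(self.value, value)


name = ToyLastValue()
name.update(["Ada"])    # step 1: one write
name.update(["Grace"])  # step 2: overwrites the previous value

items = ToyAggregate(operator.add, [])
items.update([["a"], ["b"]])  # step 1: two nodes write in the same step
items.update([["c"]])         # step 2: one more write
```

The aggregate channel is what makes it safe for several nodes to write the same field in one step; the plain channel has no way to reconcile them.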
Prebuilt Schemas
LangGraph ships with MessagesState, a single-field TypedDict that provides the most common pattern for chatbot-style applications:
class MessagesState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]
This schema declares a messages field whose reducer is add_messages, which merges messages by ID and supports append, update, and removal semantics.
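The append, update, and removal semantics can be illustrated with a simplified merge over plain dictionaries. This is only a sketch of the idea, not how add_messages is implemented: `merge_by_id` is a hypothetical function, messages are modeled as dicts with an `id` key, and the `remove` flag is an invented stand-in for the library's RemoveMessage object.

```python
def merge_by_id(current: list[dict], new: list[dict]) -> list[dict]:
    """Merge two message lists: replace on matching id, otherwise append."""
    merged = list(current)
    position = {msg["id"]: i for i, msg in enumerate(merged)}
    removed = set()
    for msg in new:
        msg_id = msg["id"]
        if msg.get("remove"):
            removed.add(msg_id)              # removal marker
        elif msg_id in position:
            merged[position[msg_id]] = msg   # update in place
        else:
            position[msg_id] = len(merged)
            merged.append(msg)               # append a new message
    return [msg for msg in merged if msg["id"] not in removed]


history = [
    {"id": "1", "content": "hi"},
    {"id": "2", "content": "hello"},
]

after_append = merge_by_id(history, [{"id": "3", "content": "how are you?"}])
after_update = merge_by_id(history, [{"id": "2", "content": "hello there"}])
after_remove = merge_by_id(history, [{"id": "1", "remove": True}])
```

Because the merge is keyed on ID, a node can correct an earlier message by returning a new one with the same ID instead of rewriting the whole list.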
Usage
import operator
from typing import Annotated

from typing_extensions import TypedDict

from langgraph.graph import MessagesState, StateGraph
from langgraph.graph.message import add_messages


# Simple schema with last-write-wins semantics
class SimpleState(TypedDict):
    count: int
    name: str


# Schema with a reducer that appends to a list
class AccumulatingState(TypedDict):
    items: Annotated[list, operator.add]


# Schema with the add_messages reducer for chat applications
class ChatState(TypedDict):
    messages: Annotated[list, add_messages]
    user_name: str


# Or use the prebuilt MessagesState directly
graph = StateGraph(MessagesState)
Theoretical Basis
State schema definition draws from several foundational ideas:
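- Conflict-free Replicated Data Types (CRDTs) -- Reducer functions resemble CRDT merge functions: they are deterministic binary operations that resolve concurrent updates without coordination between nodes. The resemblance is closest when the reducer is commutative and associative, because the final state then does not depend on the order in which updates from the same superstep are applied. Order-sensitive reducers such as list concatenation still produce a well-defined result, but one that reflects the order of application.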
- Channel-based communication -- The mapping from schema fields to channels follows the Communicating Sequential Processes (CSP) model, where independent processes share data through typed channels rather than shared memory.
- Type-driven design -- By embedding merge semantics directly in the type annotation, LangGraph keeps schema and behavior co-located, reducing the surface area for configuration errors.
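The CRDT comparison in the first point above depends on the algebraic properties of the reducer. The sketch below, using a hypothetical `fold` helper, contrasts an order-sensitive reducer (list concatenation) with an order-insensitive one (set union).

```python
import operator
from functools import reduce


def fold(reducer, initial, updates):
    """Apply a reducer to a sequence of updates, left to right."""
    return reduce(reducer, updates, initial)


# List concatenation: the result depends on the order of the updates.
list_ab = fold(operator.add, [], [["a"], ["b"]])
list_ba = fold(operator.add, [], [["b"], ["a"]])

# Set union: commutative and associative, so order does not matter.
set_ab = fold(operator.or_, set(), [{"a"}, {"b"}])
set_ba = fold(operator.or_, set(), [{"b"}, {"a"}])
```

When the order of concurrent writes should not matter, a commutative reducer such as set union or `max` is the closer match to a CRDT-style merge.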