Principle:Langchain ai Langgraph Checkpoint Backend Selection
| Attribute | Value |
|---|---|
| Page Type | Principle |
| Library | langgraph (checkpoint) |
| Workflow | Persistence_and_Memory_Setup |
| Principle | Checkpoint_Backend_Selection |
| Implementation | Langchain_ai_Langgraph_BaseCheckpointSaver_Protocol |
| Source | libs/checkpoint/langgraph/checkpoint/base/__init__.py:L1-513 |
Overview
Checkpoint backend selection is the foundational decision in configuring LangGraph persistence. A checkpointer captures the full state of a graph at each step of execution, enabling resumption, replay, and debugging. The choice of backend determines the durability, scalability, and performance characteristics of state persistence. LangGraph provides a protocol-driven design where all checkpointers implement the same abstract interface (BaseCheckpointSaver), allowing backends to be swapped transparently without changing application logic.
Description
Every LangGraph graph can optionally be compiled with a checkpoint saver. When a checkpoint saver is present and a thread_id is provided in the invocation config, the graph automatically persists its state after each execution step. This enables several critical capabilities:
- Resumability: A graph execution interrupted by errors or human-in-the-loop patterns can be resumed from exactly where it stopped.
- Conversational memory: Reusing the same thread_id across invocations accumulates state, enabling multi-turn conversational flows.
- Time-travel debugging: Past states can be retrieved and replayed to understand or audit execution history.
- Fork and branch: New execution branches can be created from any historical checkpoint.
The checkpoint protocol centers on a small set of operations: put (store a checkpoint), get_tuple (retrieve a checkpoint with metadata), list (enumerate checkpoints matching criteria), and put_writes (store intermediate writes linked to a checkpoint). Each operation has both synchronous and asynchronous variants.
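The four synchronous operations can be sketched with a toy dictionary-backed saver. This is a simplified illustration, not the library's implementation: ToySaver and TupleSketch are hypothetical names, the method signatures are reduced compared with BaseCheckpointSaver, and the async variants are omitted.

```python
from collections import defaultdict
from typing import Iterator, NamedTuple, Optional


class TupleSketch(NamedTuple):
    """Simplified stand-in for CheckpointTuple (hypothetical)."""
    config: dict
    checkpoint: dict
    metadata: dict
    parent_config: Optional[dict]
    pending_writes: list


class ToySaver:
    """Toy saver: signatures are simplified assumptions, not the real protocol."""

    def __init__(self) -> None:
        # thread_id -> {checkpoint_id: (checkpoint, metadata, parent_id)}
        self._store = defaultdict(dict)
        # (thread_id, checkpoint_id) -> [(task_id, channel, value)]
        self._writes = defaultdict(list)

    def put(self, config: dict, checkpoint: dict, metadata: dict) -> dict:
        """Store a checkpoint; the incoming checkpoint_id becomes its parent."""
        thread_id = config["configurable"]["thread_id"]
        parent_id = config["configurable"].get("checkpoint_id")
        self._store[thread_id][checkpoint["id"]] = (checkpoint, metadata, parent_id)
        return {"configurable": {"thread_id": thread_id,
                                 "checkpoint_id": checkpoint["id"]}}

    def put_writes(self, config: dict, writes, task_id: str) -> None:
        """Store intermediate writes linked to an existing checkpoint."""
        conf = config["configurable"]
        key = (conf["thread_id"], conf["checkpoint_id"])
        self._writes[key].extend((task_id, ch, val) for ch, val in writes)

    def get_tuple(self, config: dict) -> Optional[TupleSketch]:
        """Return the requested checkpoint, or the latest one for the thread."""
        thread_id = config["configurable"]["thread_id"]
        saved = self._store.get(thread_id)
        if not saved:
            return None
        checkpoint_id = config["configurable"].get("checkpoint_id") or max(saved)
        if checkpoint_id not in saved:
            return None
        return self._build(thread_id, checkpoint_id)

    def list(self, config: dict, *, limit: Optional[int] = None) -> Iterator[TupleSketch]:
        """Yield a thread's checkpoints, newest first."""
        thread_id = config["configurable"]["thread_id"]
        ids = sorted(self._store.get(thread_id, {}), reverse=True)
        for checkpoint_id in ids[:limit]:
            yield self._build(thread_id, checkpoint_id)

    def _build(self, thread_id: str, checkpoint_id: str) -> TupleSketch:
        checkpoint, metadata, parent_id = self._store[thread_id][checkpoint_id]

        def cfg(cid: str) -> dict:
            return {"configurable": {"thread_id": thread_id, "checkpoint_id": cid}}

        return TupleSketch(
            config=cfg(checkpoint_id),
            checkpoint=checkpoint,
            metadata=metadata,
            parent_config=cfg(parent_id) if parent_id else None,
            pending_writes=list(self._writes.get((thread_id, checkpoint_id), [])),
        )


saver = ToySaver()
thread = {"configurable": {"thread_id": "t1"}}
first = saver.put(thread, {"id": "0001", "channel_values": {"n": 1}}, {"step": 0})
second = saver.put(first, {"id": "0002", "channel_values": {"n": 2}}, {"step": 1})
saver.put_writes(second, [("n", 3)], task_id="task-a")
latest = saver.get_tuple(thread)
```

Because put returns a config that names the checkpoint it just stored, passing that config to the next put is what links each checkpoint to its parent in this sketch.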
LangGraph offers several built-in backends:
- InMemorySaver: Stores checkpoints in Python dictionaries. Suitable for testing and prototyping. Data is lost when the process exits.
- SqliteSaver: Stores checkpoints in a SQLite database. Suitable for single-process use cases, demos, and small projects. It is not designed for concurrent access from multiple threads.
- PostgresSaver / AsyncPostgresSaver: Stores checkpoints in PostgreSQL. Recommended for production use cases. Supports concurrent access, horizontal scaling, and durable persistence.
Serialization Protocol
All backends use a pluggable serialization protocol (SerializerProtocol). The default serializer is JsonPlusSerializer, which handles a wide variety of Python types. Custom serializers can be provided to any checkpointer via the serde parameter.
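A pluggable serializer can be pictured as a pair of methods that turn an object into a tagged byte payload and back. The sketch below assumes the protocol exposes a dumps_typed / loads_typed pair; confirm the method names against the installed version. SerializerSketch and PlainJsonSerializer are hypothetical names, and this plain-JSON version handles far fewer types than JsonPlusSerializer.

```python
import json
from typing import Any, Protocol, Tuple


class SerializerSketch(Protocol):
    """Assumed shape of the serializer protocol (illustrative only)."""

    def dumps_typed(self, obj: Any) -> Tuple[str, bytes]: ...

    def loads_typed(self, data: Tuple[str, bytes]) -> Any: ...


class PlainJsonSerializer:
    """Hypothetical serializer that supports JSON-compatible values only."""

    def dumps_typed(self, obj: Any) -> Tuple[str, bytes]:
        # The type tag tells the loader how to decode the payload.
        return "json", json.dumps(obj, sort_keys=True).encode("utf-8")

    def loads_typed(self, data: Tuple[str, bytes]) -> Any:
        kind, payload = data
        if kind != "json":
            raise ValueError(f"unsupported payload type: {kind}")
        return json.loads(payload.decode("utf-8"))


serde: SerializerSketch = PlainJsonSerializer()
encoded = serde.dumps_typed({"messages": ["hi"], "count": 2})
decoded = serde.loads_typed(encoded)
```

An object of this shape is what would be handed to a checkpointer through the serde parameter, provided it matches the protocol of the library version in use.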
Checkpoint Structure
A checkpoint is a TypedDict containing:
- v: Format version (currently 1).
- id: A unique, monotonically increasing identifier (UUID v6-based).
- ts: ISO 8601 timestamp.
- channel_values: Mapping from channel names to their deserialized snapshot values.
- channel_versions: Mapping from channel names to monotonically increasing version strings.
- versions_seen: Mapping from node IDs to channel version maps, used for determining which nodes to execute next.
Checkpoints are returned as CheckpointTuple named tuples that bundle the checkpoint data with its configuration, metadata (source, step number, parent references), optional parent configuration, and any pending writes.
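The fields above can be written out as types to make the shape concrete. This is a sketch based only on the field list in this section: CheckpointSketch, CheckpointTupleSketch, and node_has_unseen_updates are hypothetical names, the zero-padded version strings are an assumption, and the comparison rule is a simplified guess at how versions_seen informs scheduling.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List, NamedTuple, Optional, Tuple, TypedDict


class CheckpointSketch(TypedDict):
    v: int                                    # format version
    id: str                                   # ordered unique identifier
    ts: str                                   # ISO 8601 timestamp
    channel_values: Dict[str, Any]            # channel name -> snapshot value
    channel_versions: Dict[str, str]          # channel name -> version string
    versions_seen: Dict[str, Dict[str, str]]  # node -> channel versions it has seen


class CheckpointTupleSketch(NamedTuple):
    config: Dict[str, Any]
    checkpoint: CheckpointSketch
    metadata: Dict[str, Any]
    parent_config: Optional[Dict[str, Any]] = None
    pending_writes: Optional[List[Tuple[str, str, Any]]] = None


def node_has_unseen_updates(cp: CheckpointSketch, node: str,
                            subscribed: Iterable[str]) -> bool:
    """True if any subscribed channel is newer than what the node last saw."""
    seen = cp["versions_seen"].get(node, {})
    return any(cp["channel_versions"].get(ch, "") > seen.get(ch, "")
               for ch in subscribed)


example = CheckpointSketch(
    v=1,
    id="illustrative-id-0002",  # placeholder, not a real UUID v6
    ts=datetime.now(timezone.utc).isoformat(),
    channel_values={"messages": ["hi", "hello"]},
    channel_versions={"messages": "00000002"},
    versions_seen={"agent": {"messages": "00000001"},
                   "tools": {"messages": "00000002"}},
)
wrapped = CheckpointTupleSketch(
    config={"configurable": {"thread_id": "t1", "checkpoint_id": example["id"]}},
    checkpoint=example,
    metadata={"source": "loop", "step": 1},
)
```

In this example the agent node last saw an older version of the messages channel, so it would be a candidate to run next, while the tools node is already up to date.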
Usage
When selecting a checkpoint backend, consider:
- Durability requirements: For production systems, use PostgresSaver. For development and testing, InMemorySaver is sufficient. For lightweight local persistence, SqliteSaver works well.
- Concurrency requirements: PostgresSaver supports multiple concurrent graph instances. SqliteSaver is limited to single-process usage. InMemorySaver supports concurrent async access within a single process.
- Async vs sync: If your application is async, prefer AsyncPostgresSaver or InMemorySaver (which supports both). SqliteSaver does not support async; use AsyncSqliteSaver instead.
- Custom backends: Subclass BaseCheckpointSaver and implement get_tuple, list, put, and put_writes (plus their async counterparts) to create a custom backend.
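The selection guidance above can be condensed into a small decision helper. suggest_backend and its persistence levels ("none", "local", "production") are hypothetical, introduced here only to encode the rules of thumb from this section; the function returns class names as strings and does not import or configure any backend.

```python
def suggest_backend(persistence: str, use_async: bool = False) -> str:
    """Map a persistence need to the backend name recommended in this section.

    persistence: "none" (tests, prototypes), "local" (single-process,
    lightweight), or "production" (durable, concurrent access).
    """
    if persistence == "production":
        return "AsyncPostgresSaver" if use_async else "PostgresSaver"
    if persistence == "local":
        return "AsyncSqliteSaver" if use_async else "SqliteSaver"
    if persistence == "none":
        # InMemorySaver covers both sync and async callers.
        return "InMemorySaver"
    raise ValueError(f"unknown persistence level: {persistence!r}")


choice = suggest_backend("local", use_async=True)
```

Because every backend implements the same interface, changing the answer later should only affect the code that constructs the checkpointer, not the graph logic.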
Theoretical Basis
The checkpoint pattern in LangGraph implements a form of event sourcing combined with snapshotting. Rather than storing only the latest state, the system maintains a chain of immutable snapshots linked by parent references. Each snapshot captures the full channel state at a specific execution step, along with metadata about what triggered the checkpoint (input, loop iteration, manual update, or fork).
This design draws from distributed systems concepts:
- Write-ahead logging: Pending writes are stored separately from committed checkpoints, enabling crash recovery.
- Optimistic concurrency: Channel versions act as vector clocks, enabling the runtime to determine which nodes have observed which state changes.
- Thread isolation: The thread_id serves as the partition key, ensuring that concurrent conversations or workflows do not interfere with each other.
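The snapshot chain and thread partitioning can be illustrated with a toy model. Snapshot and ThreadLog are hypothetical names, and the model keeps everything in one dictionary keyed by (thread_id, snapshot id); it shows the idea of immutable snapshots linked by parent references, not how any real backend lays out its storage.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    id: str
    parent_id: Optional[str]
    source: str                  # e.g. "input", "loop", "update", or "fork"
    state: Mapping[str, object]  # read-only view of the full channel state


class ThreadLog:
    """Append-only store partitioned by thread_id."""

    def __init__(self) -> None:
        self._snapshots: Dict[Tuple[str, str], Snapshot] = {}

    def append(self, thread_id: str, snap_id: str, parent_id: Optional[str],
               source: str, state: dict) -> Snapshot:
        # Copy the state and wrap it so stored snapshots cannot be mutated.
        snap = Snapshot(snap_id, parent_id, source, MappingProxyType(dict(state)))
        self._snapshots[(thread_id, snap_id)] = snap
        return snap

    def lineage(self, thread_id: str, snap_id: str) -> List[str]:
        """Follow parent references from a snapshot back to the root."""
        chain: List[str] = []
        current: Optional[str] = snap_id
        while current is not None:
            snap = self._snapshots[(thread_id, current)]
            chain.append(snap.id)
            current = snap.parent_id
        return chain


log = ThreadLog()
log.append("t1", "c1", None, "input", {"n": 0})
log.append("t1", "c2", "c1", "loop", {"n": 1})
log.append("t1", "c3", "c2", "loop", {"n": 2})
log.append("t1", "c2b", "c1", "fork", {"n": 10})  # branch from an older snapshot
log.append("t2", "c1", None, "input", {"n": 99})  # same id, different thread
```

Forking leaves the original chain untouched: the branch simply names an older snapshot as its parent, and the two threads never see each other's entries because the thread_id is part of every key.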
The protocol's use of monotonically increasing checkpoint IDs (based on UUID v6 with embedded timestamps) ensures that checkpoints are naturally ordered, enabling efficient range queries for history browsing and pagination.
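The benefit of ordered identifiers can be shown with a stand-in generator. The sketch below does not produce UUID v6 values: OrderedIdGenerator is a hypothetical class that builds fixed-width, timestamp-prefixed strings, and page_before is a hypothetical helper. The shared property is that sorting the IDs as plain strings reproduces creation order, which makes "everything before this ID" a simple comparison.

```python
import time
from typing import Iterable, List, Optional


class OrderedIdGenerator:
    """Stand-in for time-ordered IDs: zero-padded timestamp plus a sequence."""

    def __init__(self) -> None:
        self._last_ns = 0
        self._seq = 0

    def next_id(self) -> str:
        now = time.time_ns()
        if now <= self._last_ns:
            # Same (or earlier) clock reading: bump the sequence instead.
            self._seq += 1
        else:
            self._last_ns = now
            self._seq = 0
        # Fixed widths keep lexicographic order equal to creation order.
        return f"{self._last_ns:020d}-{self._seq:06d}"


def page_before(ids: Iterable[str], before: Optional[str] = None,
                limit: int = 2) -> List[str]:
    """Return up to `limit` IDs older than `before`, newest first."""
    newest_first = sorted(ids, reverse=True)
    if before is not None:
        newest_first = [i for i in newest_first if i < before]
    return newest_first[:limit]


gen = OrderedIdGenerator()
ids = [gen.next_id() for _ in range(5)]
first_page = page_before(ids)
second_page = page_before(ids, before=first_page[-1])
```

Paging through history then needs only the last ID of the previous page as a cursor, with no separate offset or timestamp column.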