Principle:Langchain ai Langgraph Checkpoint Backend Selection

From Leeroopedia
Page Type: Principle
Library: langgraph (checkpoint)
Workflow: Persistence_and_Memory_Setup
Principle: Checkpoint_Backend_Selection
Implementation: Langchain_ai_Langgraph_BaseCheckpointSaver_Protocol
Source: libs/checkpoint/langgraph/checkpoint/base/__init__.py:L1-513

Overview

Checkpoint backend selection is the foundational decision in configuring LangGraph persistence. A checkpointer captures the full state of a graph at each step of execution, enabling resumption, replay, and debugging. The choice of backend determines the durability, scalability, and performance characteristics of state persistence. LangGraph provides a protocol-driven design where all checkpointers implement the same abstract interface (BaseCheckpointSaver), allowing backends to be swapped transparently without changing application logic.

Description

Every LangGraph graph can optionally be compiled with a checkpoint saver. When a checkpoint saver is present and a thread_id is provided in the invocation config, the graph automatically persists its state after each execution step. This enables several critical capabilities:

  • Resumability: A graph execution interrupted by errors or human-in-the-loop patterns can be resumed from exactly where it stopped.
  • Conversational memory: Reusing the same thread_id across invocations accumulates state, enabling multi-turn conversational flows.
  • Time-travel debugging: Past states can be retrieved and replayed to understand or audit execution history.
  • Fork and branch: New execution branches can be created from any historical checkpoint.

The checkpoint protocol centers on a small set of operations: put (store a checkpoint), get_tuple (retrieve a checkpoint with metadata), list (enumerate checkpoints matching criteria), and put_writes (store intermediate writes linked to a checkpoint). Each operation has a synchronous variant and an asynchronous counterpart (aput, aget_tuple, alist, and aput_writes).
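
The shape of this protocol can be illustrated with a toy model. The sketch below is not the LangGraph interface: ToySaver, ToyInMemorySaver, and their simplified signatures (a bare thread_id string instead of a config object) are hypothetical and exist only to show how the four operations relate to one another and how a dictionary-backed implementation might satisfy them.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List, Optional, Tuple

Checkpoint = Dict[str, Any]   # hypothetical: a plain dict with at least an "id" key
Write = Tuple[str, Any]       # hypothetical: (channel name, value)


class ToySaver(ABC):
    """Simplified stand-in for a checkpoint saver interface (not the real API)."""

    @abstractmethod
    def put(self, thread_id: str, checkpoint: Checkpoint) -> None: ...

    @abstractmethod
    def get_tuple(self, thread_id: str, checkpoint_id: Optional[str] = None) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def list(self, thread_id: str) -> Iterator[Checkpoint]: ...

    @abstractmethod
    def put_writes(self, thread_id: str, checkpoint_id: str, writes: List[Write]) -> None: ...


class ToyInMemorySaver(ToySaver):
    """Keeps everything in dictionaries, so data vanishes with the process."""

    def __init__(self) -> None:
        self._checkpoints: Dict[str, List[Checkpoint]] = {}
        self._writes: Dict[Tuple[str, str], List[Write]] = {}

    def put(self, thread_id, checkpoint):
        # Append to the thread's history; the newest checkpoint is last.
        self._checkpoints.setdefault(thread_id, []).append(checkpoint)

    def get_tuple(self, thread_id, checkpoint_id=None):
        history = self._checkpoints.get(thread_id, [])
        if checkpoint_id is None:
            found = history[-1] if history else None
        else:
            found = next((c for c in history if c["id"] == checkpoint_id), None)
        if found is None:
            return None
        pending = self._writes.get((thread_id, found["id"]), [])
        return {"checkpoint": found, "pending_writes": pending}

    def list(self, thread_id):
        # Newest first, which is how history is typically browsed.
        return iter(reversed(self._checkpoints.get(thread_id, [])))

    def put_writes(self, thread_id, checkpoint_id, writes):
        # Writes are stored next to, not inside, the checkpoint they belong to.
        self._writes.setdefault((thread_id, checkpoint_id), []).extend(writes)


saver = ToyInMemorySaver()
saver.put("thread-1", {"id": "0001", "channel_values": {"count": 1}})
saver.put("thread-1", {"id": "0002", "channel_values": {"count": 2}})
saver.put_writes("thread-1", "0002", [("count", 3)])
latest = saver.get_tuple("thread-1")
```

Because callers only touch the abstract methods, a different subclass (for example one backed by a database) could replace ToyInMemorySaver without changes to the calling code, which is the property the real protocol is designed to provide.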

LangGraph offers several built-in backends:

  • InMemorySaver: Stores checkpoints in Python dictionaries. Suitable for testing and prototyping. Data is lost when the process exits.
  • SqliteSaver: Stores checkpoints in a SQLite database. Suitable for single-process use cases, demos, and small projects. It is intended for lightweight, synchronous use and does not scale to concurrent access from multiple threads.
  • PostgresSaver / AsyncPostgresSaver: Stores checkpoints in PostgreSQL. Recommended for production use cases. Supports concurrent access, horizontal scaling, and durable persistence.

Serialization Protocol

All backends use a pluggable serialization protocol (SerializerProtocol). The default serializer is JsonPlusSerializer, which handles a wide variety of Python types. Custom serializers can be provided to any checkpointer via the serde parameter.
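
A pluggable serializer can be pictured as a small structural interface that the saver calls when writing and reading. The sketch below is an assumption-laden illustration: the method names dumps_typed and loads_typed are assumed to mirror the real protocol, while ToySerializer, JsonToySerializer, and ToyBlobSaver are hypothetical names invented for this example. Only the serde parameter name comes from the text above.

```python
import json
from typing import Any, Dict, Protocol, Tuple


class ToySerializer(Protocol):
    """Hypothetical serializer interface: values travel as (type tag, bytes)."""

    def dumps_typed(self, obj: Any) -> Tuple[str, bytes]: ...

    def loads_typed(self, data: Tuple[str, bytes]) -> Any: ...


class JsonToySerializer:
    """Plain-JSON serializer; handles only JSON-compatible values."""

    def dumps_typed(self, obj):
        return "json", json.dumps(obj, sort_keys=True).encode("utf-8")

    def loads_typed(self, data):
        type_tag, payload = data
        if type_tag != "json":
            raise ValueError(f"unsupported type tag: {type_tag}")
        return json.loads(payload.decode("utf-8"))


class ToyBlobSaver:
    """Hypothetical saver that stores only serialized blobs."""

    def __init__(self, serde: ToySerializer) -> None:
        self.serde = serde
        self._blobs: Dict[str, Tuple[str, bytes]] = {}

    def put(self, key: str, value: Any) -> None:
        self._blobs[key] = self.serde.dumps_typed(value)

    def get(self, key: str) -> Any:
        return self.serde.loads_typed(self._blobs[key])


blob_saver = ToyBlobSaver(serde=JsonToySerializer())
blob_saver.put("state", {"messages": ["hi"], "step": 2})
restored = blob_saver.get("state")
```

The type tag stored with each payload is what would let a store hold values written by different serializers and still decode each one correctly.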

Checkpoint Structure

A checkpoint is a TypedDict containing:

  • v: Format version (currently 1).
  • id: A unique, monotonically increasing identifier (UUID v6-based).
  • ts: ISO 8601 timestamp.
  • channel_values: Mapping from channel names to their deserialized snapshot values.
  • channel_versions: Mapping from channel names to monotonically increasing version strings.
  • versions_seen: Mapping from node IDs to channel version maps, used for determining which nodes to execute next.

Checkpoints are returned as CheckpointTuple named tuples that bundle the checkpoint data with its configuration, metadata (source, step number, parent references), optional parent configuration, and any pending writes.
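
The fields listed above can be written out as type definitions. The following is a simplified sketch rather than the library's own declarations: ToyCheckpoint and ToyCheckpointTuple are hypothetical names, the id is a placeholder string instead of a UUID v6-based value, and the config layout (a "configurable" dict holding thread_id and checkpoint_id) and the shape of pending_writes are assumptions made for illustration.

```python
from datetime import datetime, timezone
from typing import Any, Dict, List, NamedTuple, Optional, Tuple, TypedDict


class ToyCheckpoint(TypedDict):
    v: int                                    # format version
    id: str                                   # unique, sortable identifier
    ts: str                                   # ISO 8601 timestamp
    channel_values: Dict[str, Any]            # channel name -> snapshot value
    channel_versions: Dict[str, str]          # channel name -> version string
    versions_seen: Dict[str, Dict[str, str]]  # node ID -> channel versions it has seen


class ToyCheckpointTuple(NamedTuple):
    config: Dict[str, Any]
    checkpoint: ToyCheckpoint
    metadata: Dict[str, Any]
    parent_config: Optional[Dict[str, Any]] = None
    pending_writes: Optional[List[Tuple[str, str, Any]]] = None  # assumed shape


checkpoint: ToyCheckpoint = {
    "v": 1,
    "id": "checkpoint-0002",  # placeholder, not a real UUID v6-based id
    "ts": datetime.now(timezone.utc).isoformat(),
    "channel_values": {"messages": ["hi", "hello"]},
    "channel_versions": {"messages": "2"},
    "versions_seen": {"chatbot": {"messages": "1"}},
}

record = ToyCheckpointTuple(
    config={"configurable": {"thread_id": "thread-1", "checkpoint_id": checkpoint["id"]}},
    checkpoint=checkpoint,
    metadata={"source": "loop", "step": 1},
    parent_config={"configurable": {"thread_id": "thread-1", "checkpoint_id": "checkpoint-0001"}},
)
```

In this example the chatbot node has seen version "1" of the messages channel while the channel is now at version "2", which is the kind of gap the runtime uses to decide that a node has new input to process.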

Usage

When selecting a checkpoint backend, consider:

  1. Durability requirements: For production systems, use PostgresSaver. For development and testing, InMemorySaver is sufficient. For lightweight local persistence, SqliteSaver works well.
  2. Concurrency requirements: PostgresSaver supports multiple concurrent graph instances. SqliteSaver is limited to single-process usage. InMemorySaver supports concurrent async access within a single process.
  3. Async vs sync: If your application is async, prefer AsyncPostgresSaver or InMemorySaver (which supports both sync and async calls). SqliteSaver does not support async; use AsyncSqliteSaver instead.
  4. Custom backends: Subclass BaseCheckpointSaver and implement get_tuple, list, put, and put_writes (plus their async counterparts) to create a custom backend.
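
The considerations above can be condensed into a small decision helper. This is only an illustrative sketch: the Requirements dataclass and the suggest_backend function are hypothetical and not part of LangGraph, and the helper returns class names as strings so that it runs without any backend package installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirements:
    durable: bool        # must state survive a process restart?
    multi_process: bool  # will several processes or workers share state?
    uses_async: bool     # is the application built on asyncio?


def suggest_backend(req: Requirements) -> str:
    """Map the selection criteria to a saver name (a rule of thumb, not a rule)."""
    if req.multi_process:
        # Shared, concurrent access calls for a database server.
        return "AsyncPostgresSaver" if req.uses_async else "PostgresSaver"
    if req.durable:
        # Single process with local persistence: a SQLite file is enough.
        return "AsyncSqliteSaver" if req.uses_async else "SqliteSaver"
    # Nothing needs to outlive the process: tests and prototypes.
    return "InMemorySaver"


production = suggest_backend(Requirements(durable=True, multi_process=True, uses_async=True))
local_tool = suggest_backend(Requirements(durable=True, multi_process=False, uses_async=False))
unit_test = suggest_backend(Requirements(durable=False, multi_process=False, uses_async=True))
```

Checking multi_process first reflects that an in-memory or SQLite store cannot be shared safely across workers, regardless of how durable the data needs to be.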

Theoretical Basis

The checkpoint pattern in LangGraph implements a form of event sourcing combined with snapshotting. Rather than storing only the latest state, the system maintains a chain of immutable snapshots linked by parent references. Each snapshot captures the full channel state at a specific execution step, along with metadata about what triggered the checkpoint (input, loop iteration, manual update, or fork).
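
A chain of immutable, parent-linked snapshots can be modeled in a few lines. The sketch below is a toy, not LangGraph's storage layout: Snapshot, SnapshotStore, and the zero-padded counter IDs are hypothetical. It shows how a fork is simply a new snapshot whose parent is an older one, leaving the original history untouched.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    id: str
    parent_id: Optional[str]
    source: str               # "input", "loop", "update", or "fork"
    state: Mapping[str, Any]  # read-only view of the full state


class SnapshotStore:
    def __init__(self) -> None:
        self._by_id: Dict[str, Snapshot] = {}
        self._counter = 0

    def commit(self, state: Dict[str, Any], source: str,
               parent_id: Optional[str] = None) -> Snapshot:
        self._counter += 1
        snapshot = Snapshot(
            id=f"{self._counter:04d}",            # hypothetical sortable id
            parent_id=parent_id,
            source=source,
            state=MappingProxyType(dict(state)),  # copy, then freeze
        )
        self._by_id[snapshot.id] = snapshot
        return snapshot

    def lineage(self, snapshot_id: str) -> List[str]:
        """Walk parent references from a snapshot back to the root."""
        ids: List[str] = []
        current = self._by_id.get(snapshot_id)
        while current is not None:
            ids.append(current.id)
            current = self._by_id.get(current.parent_id) if current.parent_id else None
        return ids


store = SnapshotStore()
first = store.commit({"count": 0}, source="input")
second = store.commit({"count": 1}, source="loop", parent_id=first.id)
third = store.commit({"count": 2}, source="loop", parent_id=second.id)
branch = store.commit({"count": 10}, source="fork", parent_id=second.id)
```

Here third and branch share the history up to second and diverge afterwards, so replaying or auditing either path only requires following parent references.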

This design draws from distributed systems concepts:

  • Write-ahead logging: Pending writes are stored separately from committed checkpoints, enabling crash recovery.
  • Optimistic concurrency: Channel versions play a role similar to vector clocks. Comparing them against versions_seen lets the runtime determine which nodes have observed which state changes.
  • Thread isolation: The thread_id serves as the partition key, ensuring that concurrent conversations or workflows do not interfere with each other.
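
The version-comparison idea can be sketched as a small function. This is a deliberately reduced model and not the runtime's actual scheduling logic: the nodes_to_run function and the subscriptions mapping (node name to the channels it reads) are hypothetical, and versions are plain integers here rather than version strings.

```python
from typing import Dict, List


def nodes_to_run(
    channel_versions: Dict[str, int],
    versions_seen: Dict[str, Dict[str, int]],
    subscriptions: Dict[str, List[str]],
) -> List[str]:
    """Return nodes with at least one subscribed channel newer than what they saw."""
    ready: List[str] = []
    for node, channels in subscriptions.items():
        seen = versions_seen.get(node, {})
        # A channel missing from either mapping counts as version 0.
        if any(channel_versions.get(ch, 0) > seen.get(ch, 0) for ch in channels):
            ready.append(node)
    return ready


channel_versions = {"messages": 3, "summary": 1}
versions_seen = {
    "chatbot": {"messages": 3},     # already processed the latest messages
    "summarizer": {"messages": 2},  # one update behind
}
subscriptions = {"chatbot": ["messages"], "summarizer": ["messages"]}

ready = nodes_to_run(channel_versions, versions_seen, subscriptions)
```

Because both mappings are stored in the checkpoint, the same decision can be recomputed after a restart without any in-process bookkeeping.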

The protocol's use of monotonically increasing checkpoint IDs (based on UUID v6 with embedded timestamps) ensures that checkpoints are naturally ordered, enabling efficient range queries for history browsing and pagination.
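
Ordered identifiers make history pagination a matter of slicing a sorted sequence. The sketch below uses zero-padded strings as hypothetical stand-ins for time-ordered checkpoint IDs; the page function is invented for this example, and its before and limit parameters are assumed to resemble the cursor-style arguments a real backend's list operation might accept.

```python
import bisect
from typing import List, Optional


def page(sorted_ids: List[str], before: Optional[str] = None,
         limit: Optional[int] = None) -> List[str]:
    """Return IDs newest first, strictly older than `before`, at most `limit` items.

    `sorted_ids` must be in ascending order, which creation order already
    guarantees when IDs increase monotonically.
    """
    # Binary search finds the cursor position without scanning the history.
    end = len(sorted_ids) if before is None else bisect.bisect_left(sorted_ids, before)
    start = 0 if limit is None else max(0, end - limit)
    return sorted_ids[start:end][::-1]


history = ["0001", "0002", "0003", "0004", "0005"]

first_page = page(history, limit=2)
second_page = page(history, before=first_page[-1], limit=2)
```

Passing the last ID of one page as the before cursor of the next yields stable pagination even while new checkpoints are appended, since new IDs always sort after existing ones.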
