# Principle: OpenHands Distributed State Management
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Conversation_Management |
| Last Updated | 2026-02-11 21:00 GMT |
## Overview
Distributed state management is the coordination of conversation state across multiple server instances in a cluster using a publish-subscribe messaging pattern backed by a shared data store.
## Description
In a horizontally-scaled deployment, multiple server nodes may need to be aware of the same conversation's state. For example, when a user's WebSocket connection is routed to Node A but the agent loop for that conversation is running on Node B, Node A must receive state updates from Node B to relay them to the user's browser. Without a coordination mechanism, each node would only know about conversations it directly owns, creating an inconsistent view of the system.
Distributed state management solves this through:
- Pub/sub messaging -- Nodes publish state change events (conversation started, status changed, conversation stopped) to a shared message bus. All nodes subscribe to these events and update their local state accordingly.
- Shared state store -- A centralized data store (typically Redis) holds the canonical state of all conversations. Nodes read from this store to reconcile their local state on startup or after missed messages.
- Local state cache -- Each node maintains a local in-memory cache of conversation states for fast lookups, updated by incoming pub/sub messages and periodic reconciliation with the shared store.
The combination of pub/sub for real-time updates and a shared store for durability ensures that all nodes converge on the same state view, even after transient network partitions or node restarts.
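The cache-plus-store split can be sketched as follows. This is a minimal illustration, not the actual implementation: all names are hypothetical, and a plain dict stands in for the Redis hash that a real deployment would read with `HGETALL`.

```python
import json

# In-memory stand-in for the shared Redis hash "conversation_states".
# A real deployment would call redis.hset / redis.hgetall instead.
shared_store = {}

class NodeStateCache:
    """Local in-memory view of conversation states on one node (hypothetical)."""

    def __init__(self):
        self.states = {}

    def apply_update(self, conversation_id, new_status):
        # Fast path: a pub/sub message updates the cache directly.
        self.states[conversation_id] = new_status

    def reconcile(self, store):
        # Slow path: replace the local view with the canonical store,
        # catching any updates missed while disconnected.
        self.states = {cid: json.loads(raw) for cid, raw in store.items()}

# Node B changes state and writes it to the shared store.
shared_store["conv-1"] = json.dumps("running")
shared_store["conv-2"] = json.dumps("stopped")

# Node A starts with an empty cache and reconciles against the store.
cache = NodeStateCache()
cache.reconcile(shared_store)
print(cache.states["conv-1"])  # running
```

On startup a node has no pub/sub history, so reconciliation against the shared store is what bootstraps its local view; afterwards pub/sub keeps that view current.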
## Usage
Use distributed state management whenever:
- The system runs on more than one server node and conversations may be accessed from any node.
- WebSocket connections and agent loops may reside on different nodes.
- The system must survive node failures and redistribute conversation ownership.
## Theoretical Basis
The pattern combines publish-subscribe messaging with eventual consistency from distributed systems theory.
Pseudocode:

```
# Subscriber side (runs on every node)
function redis_subscribe():
    channel = "conversation_state_updates"
    subscription = redis.subscribe(channel)
    async for message in subscription:
        process_message(message)

function process_message(message):
    event_type = message["type"]
    conversation_id = message["conversation_id"]
    source_node = message["source_node"]
    if source_node == self_node_id:
        # Ignore messages from ourselves
        return
    if event_type == "status_changed":
        update_local_state(conversation_id, message["new_status"])
        notify_connected_clients(conversation_id, message["new_status"])
    elif event_type == "conversation_stopped":
        remove_from_local_state(conversation_id)
        disconnect_clients(conversation_id)
    elif event_type == "conversation_started":
        add_to_local_state(conversation_id, message["initial_status"])

# Publisher side (called when local state changes)
function update_state_in_redis(conversation_id, new_status):
    # Update the shared store
    redis.hset("conversation_states", conversation_id, serialize(new_status))
    # Publish notification
    redis.publish("conversation_state_updates", {
        "type": "status_changed",
        "conversation_id": conversation_id,
        "new_status": new_status,
        "source_node": self_node_id,
    })
```
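The publish/subscribe flow above can be exercised end to end with an in-memory bus standing in for Redis pub/sub. This is a self-contained sketch under that assumption: the class and method names are illustrative, and a real deployment would use a Redis client rather than direct callbacks.

```python
class InMemoryBus:
    """Toy pub/sub bus that delivers each message to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:
            handler(message)

class Node:
    """One server node with a local state cache (hypothetical names)."""

    def __init__(self, node_id, bus):
        self.node_id = node_id
        self.local_state = {}
        self.bus = bus
        bus.subscribe(self.process_message)

    def process_message(self, message):
        if message["source_node"] == self.node_id:
            return  # self-message filtering: avoid feedback loops
        if message["type"] == "status_changed":
            self.local_state[message["conversation_id"]] = message["new_status"]
        elif message["type"] == "conversation_stopped":
            self.local_state.pop(message["conversation_id"], None)

    def set_status(self, conversation_id, new_status):
        # Update local state, then notify every other node.
        self.local_state[conversation_id] = new_status
        self.bus.publish({
            "type": "status_changed",
            "conversation_id": conversation_id,
            "new_status": new_status,
            "source_node": self.node_id,
        })

bus = InMemoryBus()
node_a, node_b = Node("a", bus), Node("b", bus)
node_b.set_status("conv-1", "running")
print(node_a.local_state)  # {'conv-1': 'running'}
```

After `node_b.set_status(...)`, both nodes hold the same view: node B updated itself directly, node A learned of the change via the bus, and node B ignored its own published message.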
Key invariants:
- Self-message filtering -- Nodes must ignore pub/sub messages they published themselves to avoid feedback loops.
- Idempotent processing -- Message handlers must be idempotent because pub/sub does not guarantee exactly-once delivery.
- Reconciliation -- Periodic reconciliation with the shared store catches any messages lost during transient disconnections.
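The idempotency invariant can be checked by replaying a message against the handler. The snippet below is a minimal sketch with hypothetical names; it works because setting a key to a value is naturally idempotent, so redelivered messages leave the state unchanged.

```python
def handle_status_changed(local_state, message):
    # Assigning the status is idempotent: applying the same message
    # twice produces the same state as applying it once.
    local_state[message["conversation_id"]] = message["new_status"]
    return local_state

msg = {"conversation_id": "conv-1", "new_status": "running"}
state = handle_status_changed({}, msg)
state_after_replay = handle_status_changed(dict(state), msg)
print(state == state_after_replay)  # True
```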