Principle:OpenHands OpenHands Session Closure
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Conversation_Management |
| Last Updated | 2026-02-11 21:00 GMT |
Overview
Session closure is the process of gracefully terminating an agent conversation session with coordinated resource cleanup that propagates to all nodes in a distributed cluster.
Description
When a conversation ends -- whether by user request, agent completion, timeout, or error -- the system must perform a series of cleanup operations to release resources and maintain consistency. In a distributed deployment, this cleanup must happen not only on the node that owns the conversation but also on every node that holds a reference to it (e.g., nodes with connected WebSocket clients).
Session closure addresses several concerns:
- Local resource cleanup -- The owning node must stop the agent loop, destroy the runtime container, release the distributed lock, and flush the event stream to persistent storage.
- Cluster-wide notification -- Other nodes must be informed that the conversation has ended so they can remove it from their local state caches and disconnect any WebSocket clients still subscribed to it.
- Graceful degradation -- If the owning node crashes, other nodes must detect the orphaned conversation (via lock expiration or heartbeat failure) and perform cleanup on behalf of the failed node.
- Disconnected session handling -- Sessions that have been disconnected from their WebSocket clients but whose agent loops are still running must be detected and cleaned up after a grace period.
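The last two concerns both reduce to time-based checks. The following is a minimal sketch, assuming the owning node refreshes a heartbeat timestamp for each conversation and that the time of the last client disconnect is recorded. `SessionRecord`, `HEARTBEAT_TTL`, `GRACE_PERIOD`, and `sessions_to_close` are hypothetical names chosen for illustration, not part of OpenHands.

```python
from dataclasses import dataclass
from typing import List, Optional

HEARTBEAT_TTL = 30.0  # assumed: seconds without a heartbeat before the owner is presumed dead
GRACE_PERIOD = 60.0   # assumed: seconds a session may stay disconnected before closure


@dataclass
class SessionRecord:
    conversation_id: str
    last_heartbeat: float                    # last time the owner refreshed its heartbeat
    disconnected_at: Optional[float] = None  # None while at least one client is connected


def is_orphaned(rec: SessionRecord, now: float) -> bool:
    """The owner stopped refreshing its heartbeat, so it is presumed crashed."""
    return now - rec.last_heartbeat > HEARTBEAT_TTL


def grace_period_expired(rec: SessionRecord, now: float) -> bool:
    """No client has been connected for longer than the grace period."""
    return rec.disconnected_at is not None and now - rec.disconnected_at > GRACE_PERIOD


def sessions_to_close(records: List[SessionRecord], now: float) -> List[str]:
    """Return the conversation IDs that a housekeeping pass should close."""
    return [
        r.conversation_id
        for r in records
        if is_orphaned(r, now) or grace_period_expired(r, now)
    ]
```

Passing `now` explicitly keeps the checks deterministic and easy to test. A real deployment would more likely rely on key expiry in the shared store than on comparing timestamps by hand.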
Usage
Use session closure:
- When a user explicitly closes or navigates away from a conversation.
- When the agent loop completes (successfully or with an error).
- When a conversation times out due to inactivity.
- During server shutdown to cleanly terminate all owned conversations.
- During periodic housekeeping to clean up orphaned or disconnected sessions.
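For the server-shutdown case, one failing or slow conversation should not prevent the others from being closed. The sketch below shows one way to do that with `asyncio`. `close_all` and its injected `close_session` callable are hypothetical, and the 10-second default timeout is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


async def close_all(
    conversation_ids: Iterable[str],
    close_session: Callable[[str], Awaitable[None]],
    timeout: float = 10.0,
) -> Dict[str, str]:
    """Close every owned conversation concurrently and report per-conversation status."""
    ids = list(conversation_ids)
    # return_exceptions=True collects failures instead of cancelling the other closures.
    results = await asyncio.gather(
        *(asyncio.wait_for(close_session(cid), timeout) for cid in ids),
        return_exceptions=True,
    )
    return {
        cid: f"failed: {type(res).__name__}" if isinstance(res, BaseException) else "closed"
        for cid, res in zip(ids, results)
    }
```

A shutdown hook could then call `await close_all(owned_ids, close_session)` and log any entry that is not `"closed"` so that a later housekeeping pass can retry it.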
Theoretical Basis
Session closure follows a graceful-shutdown pattern with distributed notification: the owning node tears down its local resources and propagates the closure event to the rest of the cluster.
Pseudocode:
# On the owning node
function close_session(conversation_id):
    # Step 1: Stop the agent loop so no further events are produced
    stop_agent_loop(conversation_id)
    # Step 2: Publish closure event to cluster
    redis.publish("conversation_state_updates", {
        "type": "conversation_stopped",
        "conversation_id": conversation_id,
        "source_node": self_node_id,
    })
    # Step 3: Clean up local resources
    # (flush the event stream before releasing the lock to prevent data loss)
    destroy_runtime(conversation_id)
    flush_event_stream(conversation_id)
    release_lock(conversation_id)
    # Step 4: Remove from local state
    remove_from_local_state(conversation_id)

# On every other node (triggered by pub/sub message)
function handle_conversation_stopped(conversation_id):
    remove_from_local_state(conversation_id)
    disconnect_clients(conversation_id)

# Periodic housekeeping (runs on every node)
function cleanup_orphaned_sessions():
    # Iterate over a snapshot, because close_session mutates local_state
    for conversation_id in list(local_state):
        if is_disconnected(conversation_id) and grace_period_expired(conversation_id):
            close_session(conversation_id)
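The pseudocode can be exercised without a Redis server. The sketch below replaces Redis pub/sub with a hypothetical `InMemoryBus` and records teardown steps in a log instead of performing them. `Node`, `InMemoryBus`, and the `log` attribute are illustrative assumptions, and the step order follows the ordered-teardown invariant stated in this section (flush before lock release).

```python
import json
from typing import Callable, Dict, List, Set

CHANNEL = "conversation_state_updates"


class InMemoryBus:
    """Stand-in for Redis pub/sub: delivers each message to every subscriber of a channel."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callable[[str], None]]] = {}

    def subscribe(self, channel: str, callback: Callable[[str], None]) -> None:
        self._subscribers.setdefault(channel, []).append(callback)

    def publish(self, channel: str, payload: str) -> None:
        for callback in list(self._subscribers.get(channel, [])):
            callback(payload)


class Node:
    def __init__(self, node_id: str, bus: InMemoryBus) -> None:
        self.node_id = node_id
        self.bus = bus
        self.local_state: Set[str] = set()  # conversations this node knows about
        self.log: List[str] = []            # order in which steps ran (for illustration)
        bus.subscribe(CHANNEL, self.on_message)

    def close_session(self, conversation_id: str) -> None:
        self.log.append("stop_agent_loop")
        # Notify the cluster before destroying any local resources.
        self.bus.publish(CHANNEL, json.dumps({
            "type": "conversation_stopped",
            "conversation_id": conversation_id,
            "source_node": self.node_id,
        }))
        self.log.extend(["destroy_runtime", "flush_event_stream", "release_lock"])
        self.local_state.discard(conversation_id)

    def on_message(self, payload: str) -> None:
        message = json.loads(payload)
        if message["type"] != "conversation_stopped":
            return
        if message["source_node"] == self.node_id:
            return  # the owner does its own teardown in close_session
        self.local_state.discard(message["conversation_id"])
        self.log.append("disconnect_clients")
```

The `source_node` check matters because a publisher that is also subscribed to the channel receives its own message. Without the check, the owner would run the follower path as well as its own teardown.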
Key invariants:
- Idempotent cleanup -- Resource destruction operations (container teardown, lock release) must be idempotent, because cleanup may be attempted by more than one node (for example, the owning node and a node recovering an orphaned session).
- Notification before cleanup -- The closure event must be published before resources are destroyed, so other nodes begin their cleanup while the owning node is still performing its teardown.
- Grace periods -- Disconnected sessions are not immediately closed; a grace period allows for reconnection (e.g., after a temporary network interruption).
- Ordered teardown -- The agent loop must be stopped before the runtime is destroyed, and the event stream must be flushed before the lock is released, to prevent data loss.
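The requirement that destruction operations tolerate repeated attempts can be sketched as follows. `ResourceRegistry` is a hypothetical in-memory stand-in for the runtime and lock stores, not an OpenHands API. The owner check in `release_lock` is an added assumption, included so that a late caller cannot release a lock that another node has since acquired.

```python
from typing import Dict, Set


class ResourceRegistry:
    """Hypothetical in-memory stand-in for runtime containers and distributed locks."""

    def __init__(self) -> None:
        self.runtimes: Set[str] = set()
        self.locks: Dict[str, str] = {}  # conversation_id -> node_id holding the lock

    def destroy_runtime(self, conversation_id: str) -> bool:
        """Remove the runtime if present; a repeated call is a harmless no-op."""
        if conversation_id not in self.runtimes:
            return False
        self.runtimes.remove(conversation_id)
        return True

    def release_lock(self, conversation_id: str, node_id: str) -> bool:
        """Release the lock only if node_id currently holds it."""
        if self.locks.get(conversation_id) != node_id:
            return False
        del self.locks[conversation_id]
        return True
```

Each method returns whether it changed anything, so a second cleanup attempt from another node is a no-op rather than an error.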