Heuristic:OpenHands OpenHands Redis Distributed Locking
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Systems, Conversation_Management |
| Last Updated | 2026-02-11 21:00 GMT |
Overview
Use Redis `SET key value NX EX ttl` as a lightweight distributed lock, without a dedicated lock library, so that only one server instance manages a given conversation at a time.
Description
In the clustered conversation manager, OpenHands uses Redis's atomic set-if-not-exists operation as a lightweight distributed lock. Instead of a formal distributed lock algorithm such as Redlock, the system calls `redis.set(key, 1, nx=True, ex=timeout)`. Here `nx=True` means "set only if the key does not already exist" and `ex=timeout` sets an automatic expiry in seconds. The goal is at most one owner per conversation at a time, not exactly-once processing. The first server to claim a conversation wins, and every other server's `set` call fails (returns `None` in redis-py) because the key already exists. The expiry releases the lock automatically if the owning server crashes without cleaning up.
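A minimal in-memory model of these `SET ... NX EX` semantics can help when reasoning about claim and expiry without a running Redis. `InMemoryNxStore` and `FakeClock` are hypothetical stand-ins written for illustration; they are not OpenHands or redis-py classes. Only the `set(key, value, nx=..., ex=...)` call shape and its `True`/`None` return convention follow redis-py. The stored values are owner names rather than the `1` used in the source, so it is visible who holds the key.

```python
from typing import Callable, Dict, Optional, Tuple


class FakeClock:
    """Manually advanced clock, so expiry can be shown without sleeping."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class InMemoryNxStore:
    """Hypothetical stand-in for a Redis client that models only SET with NX/EX.

    Like redis-py, set() returns True when the key was written and None when
    NX blocked the write because a live key already exists.
    """

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[object, float]] = {}  # key -> (value, expires_at)

    def set(self, key: str, value: object, nx: bool = False,
            ex: Optional[float] = None) -> Optional[bool]:
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and entry[1] <= now:
            # The TTL has lapsed: treat the key as gone, as Redis would.
            del self._data[key]
            entry = None
        if nx and entry is not None:
            return None  # another owner still holds the key
        expires_at = now + ex if ex is not None else float("inf")
        self._data[key] = (value, expires_at)
        return True

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None or entry[1] <= self._clock():
            return None
        return entry[0]


clock = FakeClock()
store = InMemoryNxStore(clock)
key = "ohcnv:user-1:conv-1"
print(store.set(key, "server-a", nx=True, ex=15))  # True: server A claims ownership
print(store.set(key, "server-b", nx=True, ex=15))  # None: key exists, B backs off
clock.now = 16.0  # A crashed and never refreshed, so the 15 s TTL has lapsed
print(store.set(key, "server-b", nx=True, ex=15))  # True: B can now claim it
```

The crash case is the reason for `EX`. Without an expiry, server A's key would survive A's crash and block every other server indefinitely.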
Usage
Apply this pattern when you need distributed mutual exclusion across multiple OpenHands server instances. Key use cases include:
- Claiming ownership of a conversation (only one server should run the agent loop)
- Preventing duplicate webhook processing (idempotency)
- Rate limiting across multiple server instances
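For rate limiting, the same primitive gives the simplest possible limiter: at most one action per key per time window (a cooldown). It does not give a general N-requests-per-window limit. The sketch below assumes a client whose `set` follows redis-py's `nx`/`ex` conventions. `allow_once_per_window`, the `ratelimit:` key prefix, and the dict-backed `_CooldownStore` are hypothetical.

```python
from typing import Dict, Optional


class _CooldownStore:
    """Hypothetical in-memory stand-in for a Redis client (SET NX EX only).

    Time is set explicitly via `now` so the example is deterministic.
    """

    def __init__(self) -> None:
        self.now = 0.0
        self._expires_at: Dict[str, float] = {}

    def set(self, key: str, value: object, nx: bool = False,
            ex: Optional[float] = None) -> Optional[bool]:
        # Only the expiry matters for a cooldown, so the value is not stored.
        live = key in self._expires_at and self._expires_at[key] > self.now
        if nx and live:
            return None
        self._expires_at[key] = self.now + ex if ex is not None else float("inf")
        return True


def allow_once_per_window(client, action_key: str, window_seconds: int) -> bool:
    """Return True for the first caller in each window, across all servers.

    The first SET NX creates a marker key; later calls fail until the key
    expires, which opens the next window.
    """
    return bool(client.set(f"ratelimit:{action_key}", 1, nx=True, ex=window_seconds))


store = _CooldownStore()
print(allow_once_per_window(store, "notify:user-1", 60))  # True: first call in window
print(allow_once_per_window(store, "notify:user-1", 60))  # False: still cooling down
store.now = 61.0
print(allow_once_per_window(store, "notify:user-1", 60))  # True: new window
```

Each window starts at the first allowed call rather than at fixed clock boundaries, because the TTL begins when the key is created.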
The Insight (Rule of Thumb)
- Action: Use `redis.set(key, value, nx=True, ex=ttl)` instead of explicit distributed locks for simple mutual exclusion.
- Value: TTL of 15 seconds for conversation ownership (refreshed every 5 seconds); TTL of 60 seconds for webhook deduplication.
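- Trade-off: Simpler than Redlock, but it does not guard against split-brain. A server that stalls for longer than the TTL, or a Redis failover that loses the key, can briefly leave two servers believing they own the same conversation. This is acceptable here because conversation reassignment after such edge-case failures is handled by recovery logic.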
- Key Pattern: Use structured Redis keys like `ohcnv:{user_id}:{conversation_id}` for conversation ownership and `ohcnct:{user_id}:{conversation_id}:{connection_id}` for connection tracking.
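Small helper functions can build these key patterns so that every server agrees on the exact layout. Related keys can then be matched with a glob, the style of pattern accepted by Redis `SCAN ... MATCH`. This is a sketch only. The function names are hypothetical (the source builds conversation keys through its own `_get_redis_conversation_key`), and the code assumes IDs never contain `:` or glob metacharacters.

```python
from fnmatch import fnmatchcase

# Prefixes taken from the key patterns described above.
CONVERSATION_PREFIX = "ohcnv"
CONNECTION_PREFIX = "ohcnct"


def _check_part(part: str) -> str:
    # Assumption: IDs are plain tokens. A ':' or glob character would make
    # keys ambiguous or let a pattern match more than intended.
    if not part or any(ch in part for ch in ":*?[]"):
        raise ValueError(f"unsafe key component: {part!r}")
    return part


def conversation_key(user_id: str, conversation_id: str) -> str:
    """Ownership key: ohcnv:{user_id}:{conversation_id}."""
    return ":".join([CONVERSATION_PREFIX, _check_part(user_id), _check_part(conversation_id)])


def connection_key(user_id: str, conversation_id: str, connection_id: str) -> str:
    """Connection-tracking key: ohcnct:{user_id}:{conversation_id}:{connection_id}."""
    return ":".join([CONNECTION_PREFIX, _check_part(user_id),
                     _check_part(conversation_id), _check_part(connection_id)])


def connections_pattern(user_id: str, conversation_id: str) -> str:
    """Glob matching every connection key of one conversation."""
    return ":".join([CONNECTION_PREFIX, _check_part(user_id), _check_part(conversation_id), "*"])


print(conversation_key("user-1", "conv-1"))          # ohcnv:user-1:conv-1
print(connection_key("user-1", "conv-1", "conn-a"))  # ohcnct:user-1:conv-1:conn-a

# fnmatchcase approximates Redis glob matching for this simple '*' pattern.
pattern = connections_pattern("user-1", "conv-1")
print(fnmatchcase(connection_key("user-1", "conv-1", "conn-a"), pattern))  # True
print(fnmatchcase(connection_key("user-1", "conv-2", "conn-a"), pattern))  # False
```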
Reasoning
The `NX+EX` pattern is preferred over formal distributed lock implementations because:
- Simplicity: A single Redis command replaces complex lock acquisition/release logic.
- Automatic cleanup: The `EX` expiry ensures locks are released even if the owning server crashes, preventing deadlocks.
- No coordination overhead: Unlike Redlock (which requires multiple Redis instances), this works with a single Redis instance.
- Sufficient for this use case: Conversation management can tolerate brief periods where a conversation is unowned (between expiry and re-claim), because the recovery logic detects and handles this.
- Refresh pattern: The owner refreshes the key every 5 seconds against a 15-second TTL, providing a 10-second safety margin before the lock expires.
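A small timing model can check the refresh arithmetic. Each successful refresh pushes expiry to `refresh_time + ttl`, and ownership is lost if the next refresh arrives at or after the current expiry. This sketch models timing only, not the command OpenHands uses to refresh the key. `ownership_expiry` is a hypothetical helper, and it conservatively treats a refresh landing exactly at expiry as too late.

```python
from typing import Iterable, Tuple

ENTRY_TIMEOUT_SECONDS = 15   # TTL, same value as _REDIS_ENTRY_TIMEOUT_SECONDS
UPDATE_INTERVAL_SECONDS = 5  # refresh period, same value as _REDIS_UPDATE_INTERVAL_SECONDS


def ownership_expiry(refresh_times: Iterable[float],
                     ttl: float = ENTRY_TIMEOUT_SECONDS) -> Tuple[float, bool]:
    """Return (expiry_time, lost_early) for a claim at the earliest time given.

    lost_early is True when a later refresh came too late: the key had already
    expired, so another server could have claimed it in between.
    """
    times = sorted(refresh_times)
    expires_at = times[0] + ttl  # the initial claim
    for t in times[1:]:
        if t >= expires_at:
            return expires_at, True
        expires_at = t + ttl
    return expires_at, False


# Slack for one late refresh: it is due at +5 s, but the key lives until +15 s.
print(ENTRY_TIMEOUT_SECONDS - UPDATE_INTERVAL_SECONDS)  # 10

# Healthy owner refreshes on schedule, then stops (e.g. crashes) after t=20.
print(ownership_expiry([0, 5, 10, 15, 20]))  # (35, False)
# A refresh 9 s late (t=14 instead of t=5) still lands inside the TTL.
print(ownership_expiry([0, 14]))             # (29, False)
# A refresh at t=16 is too late: the key expired at t=15.
print(ownership_expiry([0, 16]))             # (15, True)
```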
Code evidence from `enterprise/server/clustered_conversation_manager.py:397-400`:
```python
# If we can set the key in redis then no other worker is running this conversation
redis = self._get_redis_client()
key = self._get_redis_conversation_key(user_id, sid)
created = await redis.set(key, 1, nx=True, ex=_REDIS_ENTRY_TIMEOUT_SECONDS)
```
Timing constants from `enterprise/server/clustered_conversation_manager.py:40-49`:
```python
# Time in seconds between cleanup operations for stale conversations
_CLEANUP_INTERVAL_SECONDS = 15
# Time in seconds before a Redis entry is considered expired if not refreshed
_REDIS_ENTRY_TIMEOUT_SECONDS = 15
# Time in seconds between updates to Redis entries
_REDIS_UPDATE_INTERVAL_SECONDS = 5
_REDIS_POLL_TIMEOUT = 0.15
```
Webhook deduplication from `enterprise/server/routes/integration/gitlab.py:95-104`:
```python
dedup_key = object_attributes.get('id')
if not dedup_key:
    dedup_json = json.dumps(payload_data, sort_keys=True)
    dedup_hash = hashlib.sha256(dedup_json.encode()).hexdigest()
    dedup_key = f'gitlab_msg: {dedup_hash}'
created = await redis.set(dedup_key, 1, nx=True, ex=60)
```
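Below is a self-contained sketch of the same deduplication flow. It assumes `object_attributes` is the `object_attributes` mapping from the GitLab webhook payload; the excerpt does not show where it comes from. `gitlab_dedup_key`, `is_first_delivery`, and the in-memory `_DedupStore` are hypothetical. The `sort_keys=True` serialization is what makes two deliveries of the same payload hash identically, even when their keys arrive in a different order.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def gitlab_dedup_key(payload_data: Dict[str, Any],
                     object_attributes: Dict[str, Any]) -> Any:
    """Same logic as the excerpt: prefer the event object's id, else hash the payload."""
    dedup_key = object_attributes.get("id")
    if not dedup_key:
        # sort_keys=True orders keys at every nesting level, so equal payloads
        # serialize to identical JSON and therefore to identical hashes.
        dedup_json = json.dumps(payload_data, sort_keys=True)
        dedup_hash = hashlib.sha256(dedup_json.encode()).hexdigest()
        dedup_key = f"gitlab_msg: {dedup_hash}"  # same prefix as the excerpt
    return dedup_key


class _DedupStore:
    """Hypothetical stand-in for Redis modelling SET NX EX with an explicit clock."""

    def __init__(self) -> None:
        self.now = 0.0
        self._expires_at: Dict[str, float] = {}

    def set(self, key: Any, value: Any, nx: bool = False,
            ex: Optional[float] = None) -> Optional[bool]:
        k = str(key)  # Redis keys are strings; normalize integer ids
        if nx and self._expires_at.get(k, float("-inf")) > self.now:
            return None
        self._expires_at[k] = self.now + ex if ex is not None else float("inf")
        return True


def is_first_delivery(store, dedup_key: Any, ttl_seconds: int = 60) -> bool:
    """True only for the first delivery of this key within ttl_seconds."""
    return bool(store.set(dedup_key, 1, nx=True, ex=ttl_seconds))


store = _DedupStore()
a = {"event_type": "note", "project": {"id": 7, "name": "demo"}}
b = {"project": {"name": "demo", "id": 7}, "event_type": "note"}  # same payload, keys reordered
key_a = gitlab_dedup_key(a, {})
key_b = gitlab_dedup_key(b, {})
print(key_a == key_b)                   # True
print(is_first_delivery(store, key_a))  # True: process this webhook
print(is_first_delivery(store, key_b))  # False: duplicate within 60 s, skip it
print(gitlab_dedup_key(a, {"id": 42}))  # 42: the object id takes precedence
```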