Heuristic:Openclaw Openclaw Conservative Error Retry Classification
| Knowledge Sources | |
|---|---|
| Domains | Reliability, Error_Handling |
| Last Updated | 2026-02-06 12:00 GMT |
Overview
An error classification strategy that retries only when the server's response explicitly signals that a retry is safe (HTTP 429, 408, or 5xx), preferring to drop a message rather than risk a duplicate after an ambiguous transport error.
Description
When sending messages via channel APIs, OpenClaw classifies errors into five categories: `auth` (401/403), `throttled` (429 with Retry-After), `transient` (408, 5xx), `permanent` (other 4xx), and `unknown` (no status code). Only `throttled` and `transient` errors trigger retries. Critically, ambiguous errors (network timeouts, connection resets) where delivery status is unknown are NOT retried, because the message may have already been delivered. This conservative approach prevents duplicate messages, which are more disruptive to users than a single lost message.
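The five-way split described above can be sketched as a small pure function. This is a minimal illustration, not the code in `extensions/msteams/src/errors.ts`: the names `SendErrorKind`, `Classification`, and `classifyByStatus` are hypothetical, and the sketch assumes the caller has already extracted an optional HTTP status from the failure.

```typescript
// Hypothetical names for illustration; not the actual OpenClaw implementation.
type SendErrorKind = "auth" | "throttled" | "transient" | "permanent" | "unknown";

interface Classification {
  kind: SendErrorKind;
  retryable: boolean;
}

function classifyByStatus(status: number | undefined): Classification {
  // No status code means a transport-level failure: delivery is ambiguous,
  // so the error is classified as unknown and never retried.
  if (status === undefined) return { kind: "unknown", retryable: false };
  if (status === 401 || status === 403) return { kind: "auth", retryable: false };
  if (status === 429) return { kind: "throttled", retryable: true };
  if (status === 408 || (status >= 500 && status <= 599)) {
    return { kind: "transient", retryable: true };
  }
  if (status >= 400 && status <= 499) return { kind: "permanent", retryable: false };
  // Assumption: any other status reaching this point is treated as unknown.
  return { kind: "unknown", retryable: false };
}
```

Only the `throttled` and `transient` branches set `retryable: true`; every other path, including the missing-status case, fails closed.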
Usage
Apply this heuristic when implementing message delivery to any external API. Classify errors by HTTP status code and only retry when the response explicitly indicates the message was not accepted. Never retry on ambiguous transport-level errors where the message may have been delivered.
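When applying the heuristic to a new channel, the retry decision can be kept separate from the sending code. The sketch below is one possible shape under stated assumptions: `FailedAttempt`, `RetryDecision`, `decideRetry`, the attempt limit of 3, and the 250 ms exponential backoff base are all illustrative choices, not values taken from OpenClaw.

```typescript
// Hypothetical sketch: type names, attempt limit, and backoff base are assumptions.
type RetryDecision =
  | { retry: true; delayMs: number }
  | { retry: false; reason: string };

interface FailedAttempt {
  status?: number;       // undefined for transport-level errors (timeout, reset)
  retryAfterMs?: number; // server-requested delay, if the caller parsed one
  attempt: number;       // 1-based index of the attempt that just failed
}

function decideRetry(f: FailedAttempt, maxAttempts = 3, baseDelayMs = 250): RetryDecision {
  // Ambiguous delivery: the message may already have been posted, so stop here.
  if (f.status === undefined) {
    return { retry: false, reason: "delivery status unknown" };
  }
  const retryable =
    f.status === 429 || f.status === 408 || (f.status >= 500 && f.status <= 599);
  if (!retryable) {
    return { retry: false, reason: `status ${f.status} is not retryable` };
  }
  if (f.attempt >= maxAttempts) {
    return { retry: false, reason: "retry budget exhausted" };
  }
  // Prefer the server's requested delay; otherwise back off exponentially.
  const backoff = baseDelayMs * 2 ** (f.attempt - 1);
  return { retry: true, delayMs: f.retryAfterMs ?? backoff };
}
```

A send loop would call `decideRetry` after each failure and give up as soon as it returns `retry: false`, which is always the outcome for a failure that carries no status code.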
The Insight (Rule of Thumb)
- Action: Classify errors by HTTP status code into 5 categories. Only retry `throttled` (429) and `transient` (5xx, 408).
- Value: Prefer message loss over message duplication for user-facing channels.
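- Trade-off: Some messages may be lost when a transport-level failure (timeout, connection reset) hides whether delivery succeeded, because those failures are never retried. This is acceptable because duplicates are more disruptive than gaps.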
- Hints: Encode troubleshooting advice in error classification (e.g., "check appId/appPassword" for auth errors).
Reasoning
Messaging APIs like Microsoft Teams, Telegram, and Discord process messages asynchronously. When a network error occurs mid-request, the server may have already accepted and delivered the message, so a retry would post it a second time. By retrying only when the server explicitly reports that the message was not accepted (429, 408, 5xx), OpenClaw avoids this class of bug. The error hint system encodes tribal knowledge about common failure modes into the error objects, reducing debugging time.
Code Evidence from `extensions/msteams/src/errors.ts:126-175`:
export type MSTeamsSendErrorKind = "auth" | "throttled" | "transient" | "permanent" | "unknown";
// Important: We only mark errors as retryable when we have an explicit HTTP
// status code that indicates the message was not accepted (e.g. 429, 5xx).
// For transport-level errors where delivery is ambiguous, we prefer to avoid
// retries to reduce the chance of duplicate posts.
Error hint generation from `extensions/msteams/src/errors.ts:177-190`:
// Auth: "check msteams appId/appPassword/tenantId (or env vars ...)"
// Throttled: "Teams throttled the bot; backing off may help"
// Transient: "transient Teams/Bot Framework error; retry may succeed"
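One way to attach such hints is a lookup keyed on the error kind. The sketch below is illustrative only: `hintFor` and `describeSendError` are hypothetical names, the auth hint omits the elided env-var detail from the comment above, and returning no hint for `permanent` and `unknown` is an assumption.

```typescript
// Hypothetical helper names; hint strings follow the comments quoted above.
type SendErrorKind = "auth" | "throttled" | "transient" | "permanent" | "unknown";

function hintFor(kind: SendErrorKind): string | undefined {
  switch (kind) {
    case "auth":
      return "check msteams appId/appPassword/tenantId";
    case "throttled":
      return "Teams throttled the bot; backing off may help";
    case "transient":
      return "transient Teams/Bot Framework error; retry may succeed";
    default:
      // Assumption: no canned advice for permanent or unknown errors.
      return undefined;
  }
}

// Appends the hint, when one exists, to the underlying error message.
function describeSendError(kind: SendErrorKind, message: string): string {
  const hint = hintFor(kind);
  return hint ? `${message} (${hint})` : message;
}
```

Keeping the hint next to the classification means the advice travels with the error wherever it is logged or surfaced.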
Robust error formatting from `extensions/msteams/src/errors.ts:1-28`:
export function formatUnknownError(err: unknown): string {
  if (err instanceof Error) return err.message;
  if (typeof err === "string") return err;
  if (err === null) return "null";
  // ... handle all types defensively
  try {
    return JSON.stringify(err) ?? "unknown error";
  } catch {
    return "unknown error";
  }
}