Heuristic:Promptfoo Promptfoo Transient Error Classification
| Knowledge Sources | |
|---|---|
| Domains | Error_Handling, Network |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
Error classification strategy that distinguishes transient connection errors (worth retrying) from permanent configuration errors (will never succeed) by checking `error.code` first, then message patterns.
Description
When an HTTP request fails, promptfoo must decide whether to retry or fail immediately. The `isTransientConnectionError()` function classifies errors into two categories: transient failures that may succeed on retry (stale connections, mid-stream resets) and permanent failures that indicate misconfiguration (wrong certificates, HTTPS-to-HTTP mismatch). This classification is critical because retrying permanent errors wastes time and obscures the root cause.
The key insight is to check `error.code` first (more robust across Node.js versions) rather than parsing error messages, since system errors always set `.code` consistently.
Usage
This heuristic applies to all network operations in promptfoo, including LLM API calls, webhook callbacks, and cache fetches. It is built into the retry policy and should not need manual configuration.
The Insight (Rule of Thumb)
- Action: Check the `error.code` property first for classification; fall back to message parsing only for errors without a `.code` property.
- Retryable (transient):
  - `ECONNRESET` - Connection reset by peer
  - `EPIPE` - Broken pipe (write to closed connection)
  - `EPROTO` - Protocol error (unless paired with TLS config phrases)
  - `socket hang up` - Connection dropped mid-transfer
  - `bad record mac` - TLS record corruption (transient)
- NOT retryable (permanent):
  - `self signed certificate` - Server cert not trusted
  - `unable to verify` / `unknown ca` - CA chain broken
  - `wrong version number` - HTTPS -> HTTP protocol mismatch
  - Any `EPROTO` paired with certificate-related messages
- Trade-off: Conservative classification means some borderline errors fail instead of retrying, but this prevents minutes of futile retries on misconfigured endpoints.
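The rule of thumb above can be sketched as a small, table-driven check. This is an illustrative restructuring rather than promptfoo's implementation: the names `ErrorWithCode`, `TRANSIENT_CODES`, `PERMANENT_PHRASES`, `TRANSIENT_PHRASES`, `looksTransient`, and `makeError` are hypothetical, and the phrase lists simply mirror the bullets in this section.

```typescript
// Hypothetical shape: Node.js system errors carry a string `code` property.
interface ErrorWithCode extends Error {
  code?: string;
}

// Codes treated as transient regardless of the message text.
const TRANSIENT_CODES = new Set<string>(['ECONNRESET', 'EPIPE']);

// Phrases that mark an EPROTO failure as a permanent TLS/config problem.
const PERMANENT_PHRASES = [
  'wrong version number',
  'self signed',
  'unable to verify',
  'unknown ca',
  'cert',
];

// Message fragments treated as transient when no permanent phrase applies.
const TRANSIENT_PHRASES = ['bad record mac', 'eproto', 'econnreset', 'socket hang up'];

function looksTransient(error: ErrorWithCode | undefined): boolean {
  // Assumption of this sketch: a missing error is never worth retrying.
  if (!error) {
    return false;
  }
  // Step 1: trust error.code when it is present.
  if (error.code !== undefined && TRANSIENT_CODES.has(error.code)) {
    return true;
  }
  // Step 2: fall back to case-insensitive message matching.
  const message = (error.message ?? '').toLowerCase();
  const isEproto = message.includes('eproto');
  if (isEproto && PERMANENT_PHRASES.some((phrase) => message.includes(phrase))) {
    return false; // EPROTO wrapping a certificate or protocol mismatch
  }
  return TRANSIENT_PHRASES.some((phrase) => message.includes(phrase));
}

// Helper for the examples below: builds an Error with an optional code.
function makeError(message: string, code?: string): ErrorWithCode {
  const error: ErrorWithCode = new Error(message);
  if (code !== undefined) {
    error.code = code;
  }
  return error;
}

// The message never mentions ECONNRESET, but the code does.
console.log(looksTransient(makeError('Client network socket disconnected', 'ECONNRESET'))); // true
// No code set, so the message fallback decides.
console.log(looksTransient(makeError('socket hang up'))); // true
// EPROTO paired with a permanent phrase is excluded.
console.log(looksTransient(makeError('write EPROTO: ssl routines: wrong version number'))); // false
// Certificate errors match none of the transient phrases.
console.log(looksTransient(makeError('self signed certificate in certificate chain'))); // false
```

The first example shows why the code check comes first: a message-only matcher would miss an error whose text happens not to contain the code name.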
Reasoning
From `src/util/fetch/errors.ts:16-47`:
```typescript
/**
 * Detect transient connection errors distinct from rate limits or permanent
 * certificate/config errors. Only matches errors that are likely to succeed
 * on retry (stale connections, mid-stream resets). Permanent failures like
 * "self signed certificate", "unable to verify", "unknown ca", or
 * "wrong version number" (HTTPS->HTTP mismatch) are intentionally excluded.
 */
export function isTransientConnectionError(error: Error | undefined): boolean {
  // Check error.code first: more robust across Node.js versions than
  // parsing error messages, since system errors always set .code.
  const code = (error as SystemError).code;
  if (code === 'ECONNRESET' || code === 'EPIPE') {
    return true;
  }
  const message = (error.message ?? '').toLowerCase();
  // EPROTO can wrap permanent TLS misconfigs. Exclude when paired with
  // known permanent error phrases to avoid futile retries.
  if (message.includes('eproto') &&
      (message.includes('wrong version number') ||
       message.includes('self signed') ||
       message.includes('unable to verify') ||
       message.includes('unknown ca') ||
       message.includes('cert'))) {
    return false;
  }
  return (
    message.includes('bad record mac') ||
    message.includes('eproto') ||
    message.includes('econnreset') ||
    message.includes('socket hang up')
  );
}
```
The retry policy in `src/scheduler/retryPolicy.ts:65-76` adds higher-level transient errors:
```typescript
return (
  isTransientConnectionError(error) ||
  message.includes('timeout') ||
  message.includes('econnrefused') ||
  message.includes('network') ||
  message.includes('503') ||
  message.includes('502') ||
  message.includes('504')
);
```
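To show how a predicate like this typically feeds a retry loop, here is a minimal sketch. It does not reproduce promptfoo's scheduler: `RetryOptions`, `decideRetry`, `withRetry`, `exampleShouldRetry`, and `exampleOptions` are hypothetical names, and the exponential backoff is an assumption chosen for illustration.

```typescript
// Hypothetical retry settings; none of these names come from promptfoo.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first one
  baseDelayMs: number; // delay before the second attempt
  shouldRetry: (error: Error) => boolean; // the classification predicate
}

// Pure decision step: retry only if attempts remain AND the error is
// classified as retryable. Permanent errors fail on the first attempt.
function decideRetry(
  error: Error,
  attempt: number,
  options: RetryOptions,
): { retry: boolean; delayMs: number } {
  const retry = attempt < options.maxAttempts && options.shouldRetry(error);
  // Assumed backoff: base, 2x base, 4x base, ...
  const delayMs = retry ? options.baseDelayMs * Math.pow(2, attempt - 1) : 0;
  return { retry, delayMs };
}

// Runs `operation`, retrying while decideRetry allows it.
async function withRetry<T>(operation: () => Promise<T>, options: RetryOptions): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (caught) {
      const error = caught instanceof Error ? caught : new Error(String(caught));
      const { retry, delayMs } = decideRetry(error, attempt, options);
      if (!retry) {
        throw error; // surface permanent errors immediately
      }
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Stand-in predicate for the examples; a real caller would pass the
// retry policy's own classification function instead.
const exampleShouldRetry = (error: Error): boolean =>
  /socket hang up|econnreset|timeout/i.test(error.message);

const exampleOptions: RetryOptions = {
  maxAttempts: 3,
  baseDelayMs: 100,
  shouldRetry: exampleShouldRetry,
};

console.log(decideRetry(new Error('socket hang up'), 1, exampleOptions));
// { retry: true, delayMs: 100 }
console.log(decideRetry(new Error('self signed certificate'), 1, exampleOptions));
// { retry: false, delayMs: 0 }
```

Keeping the decision in a pure function separate from the loop makes the policy easy to test without timers or real network calls.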