Heuristic:CrewAIInc CrewAI MCP Timeout And Retry Strategy

From Leeroopedia
Knowledge Sources
Domains: MCP_Integration, Resilience
Last Updated: 2026-02-11 17:00 GMT

Overview

Timeout and retry configuration for MCP (Model Context Protocol) server connections: 30-second timeouts, up to 3 attempts with exponential backoff between them, and error classification so that terminal failures are not retried.

Description

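CrewAI's MCP client defines a small set of timeout and retry constants. Connection, tool-execution, and discovery timeouts are all 30 seconds; source comments on the connection and discovery values note that they were "increased for slow servers". A failing operation is attempted up to 3 times in total, with an exponential backoff of 2^attempt seconds between attempts (1s after the first failure, 2s after the second). The source comment lists "1s, 2s, 4s", but with three attempts the loop never reaches the 4s wait. Before retrying, the client classifies the error: messages containing "authentication", "unauthorized", or "not found" are treated as terminal and raised immediately, while timeouts and all other errors are retried.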

Usage

Apply this heuristic when configuring or debugging MCP server connections in CrewAI agents. If MCP tools are timing out, consider whether the server is genuinely slow (increase timeout) or failing permanently (check error classification). The 5-minute cache TTL for MCP tool schemas also reduces redundant discovery calls.

The Insight (Rule of Thumb)

  • Action: Use 30-second timeouts for MCP connections, executions, and discovery
  • Value: `MCP_CONNECTION_TIMEOUT = 30`, `MCP_TOOL_EXECUTION_TIMEOUT = 30`, `MCP_DISCOVERY_TIMEOUT = 30`, `MCP_MAX_RETRIES = 3`
  • Trade-off: Longer timeouts mean slower failure detection; shorter timeouts risk premature abandonment of slow-but-valid requests
  • Cache TTL: MCP tool schemas are cached for 5 minutes (`_cache_ttl = 300`) to avoid redundant discovery
  • Error Classification:
    • Errors whose message contains "authentication", "unauthorized", or "not found" are terminal (raised immediately)
    • All other errors, including timeouts, are retried after an exponential backoff of 2^attempt seconds (1s, then 2s, given 3 attempts)
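
The classification rule can be sketched as a small helper. This is a minimal illustration, not CrewAI code: `classify_mcp_error` is a hypothetical name, and the case-insensitive substring match is an assumption based on the checks quoted under Code Evidence.

```python
def classify_mcp_error(exc: Exception) -> str:
    """Return "terminal" or "retryable" for an exception.

    Hypothetical helper that mirrors the substring checks described above.
    """
    # Assumption: matching is done on the lower-cased message text.
    message = str(exc).lower()
    if "authentication" in message or "unauthorized" in message:
        return "terminal"  # credentials will not fix themselves on retry
    if "not found" in message:
        return "terminal"  # the requested resource does not exist
    return "retryable"     # everything else gets another attempt
```

Substring matching is coarse: a transient error whose message happens to contain "not found" would also be treated as terminal, which is worth keeping in mind when debugging a call that fails without being retried.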

Reasoning

MCP servers can be external processes (stdio) or remote HTTP/SSE endpoints, both of which introduce variable latency. The 30-second connection and discovery timeouts are marked "Increased for slow servers" in the source code, which suggests that shorter values proved insufficient in practice. The exponential backoff spaces out retries so that a struggling server is not hit again immediately, and the 3-attempt limit bounds the total wait: in the worst case every attempt runs to its 30-second timeout, giving 3 × 30s of waiting plus 1s + 2s of backoff, about 93 seconds. No sleep follows the final attempt, so the 4s step mentioned in the source comment is never reached with 3 attempts. Terminal error classification avoids spending retries on authentication failures, which are unlikely to succeed on retry.
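
The worst-case wait can be checked with a few lines of arithmetic. The sketch below assumes the loop structure quoted under Code Evidence (a sleep only when another attempt follows) and uses the constant values quoted on this page; `backoff_schedule` and `worst_case_seconds` are hypothetical helper names.

```python
MCP_TOOL_EXECUTION_TIMEOUT = 30  # seconds, value quoted on this page
MCP_MAX_RETRIES = 3              # total number of attempts


def backoff_schedule(max_retries: int) -> list:
    """Sleep durations between attempts: 2**attempt, skipped after the last attempt."""
    return [2**attempt for attempt in range(max_retries - 1)]


def worst_case_seconds(timeout: int, max_retries: int) -> int:
    """Upper bound when every attempt runs into its timeout."""
    return timeout * max_retries + sum(backoff_schedule(max_retries))


print(backoff_schedule(MCP_MAX_RETRIES))                                # [1, 2]
print(worst_case_seconds(MCP_TOOL_EXECUTION_TIMEOUT, MCP_MAX_RETRIES))  # 93
```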

The 5-minute schema cache prevents re-discovering tool schemas on every tool call, which is especially important for MCP servers that are slow to enumerate their tools.

Code Evidence

Timeout constants from `lib/crewai/src/crewai/mcp/client.py:37-45`:

# MCP Connection timeout constants (in seconds)
MCP_CONNECTION_TIMEOUT = 30  # Increased for slow servers
MCP_TOOL_EXECUTION_TIMEOUT = 30
MCP_DISCOVERY_TIMEOUT = 30  # Increased for slow servers
MCP_MAX_RETRIES = 3

# Simple in-memory cache for MCP tool schemas (duration: 5 minutes)
_mcp_schema_cache: dict[str, tuple[dict[str, Any], float]] = {}
_cache_ttl = 300  # 5 minutes

Exponential backoff and error classification from `lib/crewai/src/crewai/mcp/client.py:663-707` (abridged: `error_str` is assigned in lines not shown):

for attempt in range(self.max_retries):
    try:
        if timeout:
            return await asyncio.wait_for(operation(), timeout=timeout)
        return await operation()
    except asyncio.TimeoutError:
        last_error = f"Operation timed out after {timeout} seconds"
        if attempt < self.max_retries - 1:
            wait_time = 2**attempt  # Exponential backoff: 1s, 2s, 4s
            await asyncio.sleep(wait_time)
    except Exception as e:
        if "authentication" in error_str or "unauthorized" in error_str:
            raise ConnectionError(f"Authentication failed: {e}") from e
        if "not found" in error_str:
            raise ValueError(f"Resource not found: {e}") from e
        if attempt < self.max_retries - 1:
            wait_time = 2**attempt
            await asyncio.sleep(wait_time)
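
For experimentation outside CrewAI, the excerpt above can be turned into a self-contained function. The sketch below is not CrewAI's implementation: `retry_with_backoff`, `make_flaky`, the `base_delay` parameter, and the `ConnectionError` raised after the last attempt are assumptions added so the example runs on its own. The `error_str = str(e).lower()` line is likewise assumed, since the excerpt does not show where `error_str` is set.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    timeout: Optional[float] = 30,
    max_retries: int = 3,
    base_delay: float = 1.0,  # hypothetical knob; the excerpt uses a fixed 1s base
) -> T:
    last_error = "no attempts made"
    for attempt in range(max_retries):
        try:
            if timeout:
                return await asyncio.wait_for(operation(), timeout=timeout)
            return await operation()
        except asyncio.TimeoutError:
            # Timeouts are retryable.
            last_error = f"Operation timed out after {timeout} seconds"
        except Exception as e:
            error_str = str(e).lower()  # assumed; not shown in the excerpt
            # Terminal errors: raise immediately without further attempts.
            if "authentication" in error_str or "unauthorized" in error_str:
                raise ConnectionError(f"Authentication failed: {e}") from e
            if "not found" in error_str:
                raise ValueError(f"Resource not found: {e}") from e
            last_error = str(e)
        # Sleep only if another attempt follows: base_delay * 1, then * 2.
        if attempt < max_retries - 1:
            await asyncio.sleep(base_delay * 2**attempt)
    raise ConnectionError(f"Operation failed after {max_retries} attempts: {last_error}")


def make_flaky(failures: int) -> Callable[[], Awaitable[str]]:
    """Hypothetical test operation: fails `failures` times, then returns "ok"."""
    remaining = {"n": failures}

    async def operation() -> str:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("connection reset")
        return "ok"

    return operation
```

Calling `asyncio.run(retry_with_backoff(make_flaky(2), base_delay=0.01))` succeeds on the third attempt, while `make_flaky(3)` exhausts all three attempts and ends in the final `ConnectionError`.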
