
Heuristic:CrewAIInc CrewAI Context Window Management

From Leeroopedia
Knowledge Sources
Domains LLM_Integration, Optimization
Last Updated 2026-02-11 17:00 GMT

Overview

Context window management strategy: automatic summarization when context is exceeded (enabled by default), LiteLLM noise filtering via stdout interception, and model-specific context size tracking.

Description

CrewAI manages LLM context windows through three mechanisms: (1) a `respect_context_window` flag on agents that automatically summarizes conversation history when it exceeds the model's context limit, (2) a `FilteredStream` that intercepts stdout/stderr to suppress noisy LiteLLM messages about context window limits, and (3) a hardcoded mapping of model names to context sizes ranging from 1,024 to 2,097,152 tokens. The summarization approach preserves semantic content rather than truncating, and the noise filter prevents LiteLLM's verbose warnings from cluttering terminal output.

Usage

Apply this heuristic when agents produce errors about context length or when terminal output is cluttered with LiteLLM messages. The `respect_context_window=True` default should be left enabled unless you explicitly want the LLM call to fail on context overflow (e.g., for testing). The FilteredStream is applied globally at module import time.
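The flag is set per agent at construction time. A minimal configuration sketch using the standard CrewAI `Agent` constructor (the role, goal, and backstory values here are illustrative, not from the source):

```python
from crewai import Agent

# respect_context_window=True is the default; shown explicitly for clarity.
researcher = Agent(
    role="Researcher",
    goal="Summarize long documents",
    backstory="An analyst who routinely works with large inputs.",
    respect_context_window=True,  # summarize history instead of failing on overflow
)
```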

The Insight (Rule of Thumb)

  • Action: Leave `respect_context_window=True` (the default) on all agents
  • Value: Agents automatically summarize conversation history instead of failing when context is exceeded
  • Trade-off: Summarization loses some detail but prevents total failure; disabling it causes hard errors on context overflow
  • Noise Filter: The `FilteredStream` suppresses `"litellm.info:"` messages and `"Consider using a smaller input"` warnings from terminal output
  • Context Bounds: `MIN_CONTEXT = 1024` tokens, `MAX_CONTEXT = 2,097,152` tokens (Gemini 1.5 Pro)

Reasoning

Context window overflow is one of the most common failure modes in multi-agent systems where conversation history accumulates across multiple task executions. Of the three options (summarize, truncate, or fail), automatic summarization is the only one that preserves semantic content while fitting within limits. Truncation discards messages wholesale, losing either the oldest context or, if the tail is cut, the most recent and often most relevant context; failing entirely would halt the workflow.
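The trade-off between the three strategies can be sketched in isolation. This is an illustrative stand-in, not CrewAI's actual code: `count_tokens` and `summarize` are crude placeholders for a real tokenizer and a real LLM summarization call.

```python
def count_tokens(messages: list[str]) -> int:
    # Crude stand-in for a real tokenizer: roughly 1 token per 4 characters.
    return sum(len(m) // 4 for m in messages)

def summarize(messages: list[str]) -> list[str]:
    # Stand-in for an LLM summarization call: collapse history into one note.
    return [f"[summary of {len(messages)} earlier messages]"]

def fit_to_context(messages: list[str], limit: int, strategy: str = "summarize") -> list[str]:
    if count_tokens(messages) <= limit:
        return messages
    if strategy == "summarize":
        # CrewAI's default: compress older history, keep the latest message verbatim.
        # (A real implementation would re-check the limit after summarizing.)
        return summarize(messages[:-1]) + messages[-1:]
    if strategy == "truncate":
        # Alternative: drop oldest messages until the history fits.
        while messages and count_tokens(messages) > limit:
            messages = messages[1:]
        return messages
    # respect_context_window=False behaves like this branch: hard failure.
    raise ValueError("context window exceeded")
```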

The LiteLLM noise filtering exists because LiteLLM produces excessive logging that confuses users. The CrewAI team couldn't disable LiteLLM's logging globally without affecting other functionality, so they implemented a thread-safe stream proxy that intercepts specific message patterns. This is defensive programming against a noisy-but-necessary dependency.

Code Evidence

Agent context window flag from `lib/crewai/src/crewai/agent/core.py:190-192`:

respect_context_window: bool = Field(
    default=True,
    description="Keep messages under the context window size by summarizing content.",
)

Context bounds from `lib/crewai/src/crewai/llm.py:185-186`:

MIN_CONTEXT: Final[int] = 1024
MAX_CONTEXT: Final[int] = 2097152  # Current max from gemini-1.5-pro
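A lookup against the model-to-context-size mapping would typically clamp its result into these bounds. A sketch of that pattern (the mapping entries below are an illustrative subset, not the full hardcoded table, and `get_context_size` is a hypothetical helper name):

```python
from typing import Final

MIN_CONTEXT: Final[int] = 1024
MAX_CONTEXT: Final[int] = 2097152  # Current max from gemini-1.5-pro

# Illustrative subset of a model-name -> context-size mapping.
CONTEXT_WINDOW_SIZES: Final[dict[str, int]] = {
    "gpt-4": 8192,
    "gpt-4o": 128000,
    "gemini-1.5-pro": 2097152,
}

def get_context_size(model: str, default: int = 8192) -> int:
    """Look up a model's context size, clamped to the supported bounds."""
    size = CONTEXT_WINDOW_SIZES.get(model, default)
    return max(MIN_CONTEXT, min(MAX_CONTEXT, size))
```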

FilteredStream noise suppression from `lib/crewai/src/crewai/llm.py:119-141`:

import io
import threading

class FilteredStream(io.TextIOBase):
    def __init__(self, original_stream):
        self._original_stream = original_stream
        self._lock = threading.Lock()

    def write(self, s: str) -> int:
        with self._lock:
            lower_s = s.lower()
            # Skip common noisy LiteLLM banners (patterns are lowercased
            # so they can match against lower_s)
            if (
                "litellm.info:" in lower_s
                or "consider using a smaller input or implementing a text splitting strategy"
                in lower_s
            ):
                return 0
            return self._original_stream.write(s)

# Applied globally at import time
if not isinstance(sys.stdout, FilteredStream):
    sys.stdout = FilteredStream(sys.stdout)
if not isinstance(sys.stderr, FilteredStream):
    sys.stderr = FilteredStream(sys.stderr)
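The stream-proxy idea can be demonstrated in isolation. The class below is a simplified, self-contained stand-in (not CrewAI's `FilteredStream` itself); it wraps an in-memory buffer instead of `sys.stdout` so the effect is easy to observe, and the patterns mirror the ones CrewAI filters:

```python
import io
import threading

NOISY_PATTERNS = (
    "litellm.info:",
    "consider using a smaller input or implementing a text splitting strategy",
)

class NoiseFilter(io.TextIOBase):
    """Wrap a stream and drop writes that match known-noisy patterns."""

    def __init__(self, original):
        self._original = original
        self._lock = threading.Lock()  # writes may come from multiple threads

    def write(self, s: str) -> int:
        with self._lock:
            if any(p in s.lower() for p in NOISY_PATTERNS):
                return 0  # swallow the noisy line
            return self._original.write(s)

buf = io.StringIO()
out = NoiseFilter(buf)
out.write("LiteLLM.Info: provider list updated\n")  # suppressed
out.write("task complete\n")                         # passes through
```

Only the second line reaches the underlying buffer; the case-insensitive match catches the LiteLLM banner regardless of its capitalization.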
