
Heuristic: anthropic-sdk-python Streaming for Long Requests

From Leeroopedia
Knowledge Sources
Domains: API_Client, Optimization
Last Updated: 2026-02-15 12:00 GMT

Overview

The SDK refuses non-streaming requests that are expected to run longer than 10 minutes, raising a ValueError whenever max_tokens implies such a long operation.

Description

The Anthropic Python SDK calculates the expected duration of a non-streaming request based on the max_tokens parameter and a reference throughput of 128,000 tokens per hour. If the estimated duration exceeds the default 10-minute timeout, or if max_tokens exceeds the model-specific non-streaming token limit, the SDK raises a ValueError before the request is even sent. This prevents silent failures caused by network infrastructure dropping idle connections during long API calls.

Usage

Apply this heuristic when making non-streaming API calls with high max_tokens values (especially above ~21,000 tokens). If you encounter the ValueError about streaming being required, switch to stream=True or use the client.messages.stream() context manager.
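As a sketch of the fallback described above (the model id and prompt are illustrative placeholders, not values taken from this page; the stream() context manager and text_stream iterator are part of the anthropic-sdk-python API):

```python
# Sketch: switch a high-max_tokens call to the SDK's streaming interface.
try:
    from anthropic import Anthropic
except ImportError:  # keep this sketch importable without the SDK installed
    Anthropic = None


def generate_long_output(prompt: str, max_tokens: int = 64_000) -> str:
    """Stream a long completion instead of risking the pre-flight ValueError."""
    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    chunks = []
    with client.messages.stream(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=max_tokens,             # well above the ~21,333 threshold
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        for text in stream.text_stream:    # incremental text deltas
            chunks.append(text)
    return "".join(chunks)
```

Because data flows continuously, the connection never sits idle long enough for intermediaries to drop it.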

The Insight (Rule of Thumb)

  • Action: Use stream=True or client.messages.stream() for any request that might generate large outputs.
  • Value: The threshold is approximately 21,333 tokens (128,000 * 10min / 60min). Above this, the SDK forces streaming.
  • Trade-off: Streaming requires handling incremental events instead of a single response object. Use get_final_message() on the stream if you only need the complete result.
  • Model-specific limits: Claude Opus 4 models have a hardcoded non-streaming limit of 8,192 tokens regardless of timing calculation.
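The get_final_message() pattern mentioned in the trade-off above can be sketched as follows (model id is a placeholder; stream() and get_final_message() are part of the SDK's streaming helpers):

```python
# Sketch: stream under the hood to keep the connection alive, but consume
# only the final assembled message, as a non-streaming call would return.
try:
    from anthropic import Anthropic
except ImportError:  # allow the sketch to load without the SDK installed
    Anthropic = None


def get_complete_message(prompt: str, max_tokens: int = 30_000):
    client = Anthropic()
    with client.messages.stream(
        model="claude-sonnet-4-20250514",  # placeholder model id
        max_tokens=max_tokens,
        messages=[{"role": "user", "content": prompt}],
    ) as stream:
        # Drains the stream and returns a regular Message object.
        return stream.get_final_message()
```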

Reasoning

Network infrastructure (load balancers, proxies, firewalls) commonly drops idle TCP connections after extended periods. A non-streaming HTTP request to the Anthropic API keeps the connection idle while the model generates output. The SDK configures TCP keep-alive (60-second intervals, 5 retries) to mitigate this, but it is not always sufficient for very long requests. Streaming solves this by sending incremental data over the connection, keeping it active.

The formula used is: expected_time = 3600 * max_tokens / 128,000. If this exceeds 600 seconds (10 minutes), or if max_tokens exceeds the model-specific non-streaming limit, the SDK raises immediately.

Code Evidence

Non-streaming timeout calculation from _base_client.py:726-739:

def _calculate_nonstreaming_timeout(self, max_tokens: int, max_nonstreaming_tokens: int | None) -> Timeout:
    maximum_time = 60 * 60
    default_time = 60 * 10

    expected_time = maximum_time * max_tokens / 128_000
    if expected_time > default_time or (max_nonstreaming_tokens and max_tokens > max_nonstreaming_tokens):
        raise ValueError(
            "Streaming is required for operations that may take longer than 10 minutes. "
            + "See https://github.com/anthropics/anthropic-sdk-python#long-requests for more details",
        )
    return Timeout(
        default_time,
        connect=5.0,
    )

Model-specific non-streaming token limits from _constants.py:20-29:

MODEL_NONSTREAMING_TOKENS = {
    "claude-opus-4-20250514": 8_192,
    "claude-opus-4-0": 8_192,
    "claude-4-opus-20250514": 8_192,
    "anthropic.claude-opus-4-20250514-v1:0": 8_192,
    "claude-opus-4@20250514": 8_192,
    "claude-opus-4-1-20250805": 8192,
    "anthropic.claude-opus-4-1-20250805-v1:0": 8192,
    "claude-opus-4-1@20250805": 8192,
}
