Heuristic:Togethercomputer Together python Retry Backoff Strategy

Knowledge Sources	Together Python SDK, MDN Retry-After
Domains	Networking, Reliability
Last Updated	2026-02-15 16:00 GMT

Overview

An exponential backoff strategy with jitter for handling API rate limits and transient failures; delays start at 0.5s and are capped at 8s.

Description

The Together SDK implements a retry strategy that combines server-guided delays (via `Retry-After` headers) with exponential backoff and randomized jitter. When the API returns a retryable error (such as a rate limit or a server error), the SDK waits for an increasing delay before each retry, for up to 5 retries. The jitter reduces the chance that many clients retry at the same instant (the thundering herd problem).
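The loop described above can be sketched as follows. This is a minimal illustration, not the SDK's actual code path: `RetryableError` and `call_with_retries` are hypothetical names, and the exception is assumed to carry an optional server-suggested delay already parsed from `Retry-After`. The `sleep` parameter is injectable so the behavior can be checked without real waiting.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical error for a retryable failure (e.g. rate limit or server error)."""

    def __init__(self, message: str, retry_after: Optional[float] = None) -> None:
        super().__init__(message)
        self.retry_after = retry_after  # seconds suggested by the server, if any


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 5,
    initial_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on RetryableError up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RetryableError as err:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            if err.retry_after is not None and 0 < err.retry_after <= 60:
                delay = err.retry_after  # server-guided delay takes priority
            else:
                base = min(initial_delay * 2.0 ** attempt, max_delay)
                delay = base * (1 - 0.25 * random.random())  # shorten by up to 25%
            sleep(delay)
    raise AssertionError("unreachable")
```

Passing `sleep=some_list.append` records the chosen delays instead of sleeping, which is handy in tests.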

Usage

This heuristic applies automatically to all API calls made through the `Together()` or `AsyncTogether()` client. This pattern is worth understanding when:

Debugging slow API calls that involve retries
Tuning `max_retries` or `timeout` parameters
Building applications that need predictable latency

The Insight (Rule of Thumb)

Action: The SDK automatically retries failed requests with exponential backoff + jitter.
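Value: 5 max retries; the delay starts at 0.5s, doubles on each retry, and is capped at 8s; each delay is then multiplied by a random jitter factor between 0.75 and 1.0 (up to 25% shorter).
Trade-off: More retries increase reliability but add latency. A fully exhausted retry sequence spends at most 15.5s waiting between attempts (0.5 + 1 + 2 + 4 + 8), or roughly 11.6s to 15.5s once jitter is applied, not counting the duration of the requests themselves.
Server Override: If the API returns a `Retry-After` header with a value greater than 0 and at most 60 seconds, the SDK waits exactly that long instead of calculating its own delay.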
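MDN documents two forms of `Retry-After`: a number of seconds (`Retry-After: 120`) or an HTTP date (`Retry-After: Wed, 21 Oct 2015 07:28:00 GMT`). The sketch below parses both and then applies the override rule stated above. It is an illustration under assumptions: `parse_retry_after` and `server_delay` are hypothetical helpers, and whether the SDK's own header parser accepts the HTTP-date form is not stated here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or unparseable."""
    if value is None:
        return None
    value = value.strip()
    try:
        return float(value)  # delay-seconds form, e.g. "3"
    except ValueError:
        pass
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:  # treat naive results as UTC (HTTP dates are GMT)
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (when - now).total_seconds()


def server_delay(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Override rule: honor the header only when 0 < delay <= 60 seconds."""
    delay = parse_retry_after(value, now)
    if delay is not None and 0 < delay <= 60:
        return delay
    return None  # caller falls back to computed exponential backoff
```

A `None` result from `server_delay` means "compute the backoff yourself", which keeps the cap logic in one place.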

Retry timing sequence (without server override; jitter can only shorten each nominal delay, never lengthen it):

Retry 1: ~0.5s (range: 0.375s - 0.5s)
Retry 2: ~1.0s (range: 0.75s - 1.0s)
Retry 3: ~2.0s (range: 1.5s - 2.0s)
Retry 4: ~4.0s (range: 3.0s - 4.0s)
Retry 5: ~8.0s (range: 6.0s - 8.0s)
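The ranges in the timing sequence above follow directly from the constants. A small check, assuming (as the sequence implies) that retry n uses a base delay of 0.5 × 2^(n-1) seconds capped at 8s, and that the jitter factor lies in (0.75, 1.0]. `backoff_window` is a hypothetical helper.

```python
from typing import List, Tuple

INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0
MAX_RETRIES = 5


def backoff_window(retry_number: int) -> Tuple[float, float]:
    """(lower, upper) delay for a 1-based retry number, ignoring Retry-After."""
    base = min(INITIAL_RETRY_DELAY * 2.0 ** (retry_number - 1), MAX_RETRY_DELAY)
    return 0.75 * base, base  # jitter factor lies in (0.75, 1.0]


windows: List[Tuple[float, float]] = [backoff_window(n) for n in range(1, MAX_RETRIES + 1)]
total_low = sum(low for low, _ in windows)
total_high = sum(high for _, high in windows)
print(windows)
print(f"total wait: {total_low:.3f}s to {total_high:.3f}s")
```

Summing the windows gives the 11.625s to 15.5s total wait for an exhausted sequence; the average jitter factor of 0.875 puts the expected total near 13.6s.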

Session management:

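HTTP sessions are thread-local. They are not refreshed on a timer: the first request made after a session is 180 seconds old closes it and creates a new one, which prevents connection staleness.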
Each session has 2 connection-level retries (urllib3 HTTPAdapter) in addition to the 5 application-level retries.
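The lazy, per-thread recycling described above can be captured in a small generic wrapper. This is a hypothetical sketch (`RecyclingThreadLocal` is not an SDK class): it works with any object that has a `close()` method, such as an HTTP session, and takes an injectable clock so the 180-second rule can be tested without waiting.

```python
import threading
import time
from typing import Callable, Generic, Protocol, TypeVar


class Closeable(Protocol):
    def close(self) -> None: ...


S = TypeVar("S", bound=Closeable)


class RecyclingThreadLocal(Generic[S]):
    """Per-thread object that is replaced once it is at least max_age seconds old."""

    def __init__(
        self,
        factory: Callable[[], S],
        max_age: float = 180.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._factory = factory
        self._max_age = max_age
        self._clock = clock
        self._local = threading.local()  # each thread sees its own attributes

    def get(self) -> S:
        local = self._local
        now = self._clock()
        if not hasattr(local, "obj"):
            # First use on this thread: create the object.
            local.obj, local.created = self._factory(), now
        elif now - local.created >= self._max_age:
            # Too old: close it (dropping pooled connections) and replace it.
            local.obj.close()
            local.obj, local.created = self._factory(), now
        return local.obj
```

Because the age check only runs inside `get()`, an idle thread keeps its old session until its next request, matching the "recycled on the next request" behavior rather than a background timer.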

Reasoning

Exponential backoff avoids overwhelming a rate-limited API. The jitter multiplies each delay by `1 - 0.25 * random()`, a factor between 0.75 and 1.0, so clients that hit a rate limit at the same moment spread out their retries instead of retrying in a synchronized storm. The server-guided delay (`Retry-After` header) takes priority because the server has the best knowledge of when capacity will be available. The SDK only honors `Retry-After` values greater than 0 and at most 60 seconds; any other value is ignored in favor of the computed backoff, so a malformed or extreme header cannot cause an excessively long wait.
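To make the storm-avoidance argument concrete, here is a small simulation: 1,000 clients reach the same retry (base delay 2.0s) at the same moment and each draws its own jitter. Without jitter they would all fire again at exactly 2.0s; with the `1 - 0.25 * random()` factor their retries spread across a 0.5s window. `jittered_delays` is a hypothetical helper, and the seeded `random.Random` is only there for reproducibility.

```python
import random
from typing import List


def jittered_delays(base: float, clients: int, seed: int = 0) -> List[float]:
    """Delay each client would choose for the same retry, using 1 - 0.25 * random()."""
    rng = random.Random(seed)
    return [base * (1 - 0.25 * rng.random()) for _ in range(clients)]


delays = jittered_delays(base=2.0, clients=1000)
print(f"min={min(delays):.3f}s max={max(delays):.3f}s distinct={len(set(delays))}")
```

Because the factor never exceeds 1.0, jitter here only shortens delays; the cap on any single wait stays at the nominal backoff value.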

Recycling each session once it is 3 minutes old limits problems with stale TCP connections in long-running processes, while keeping the benefit of connection reuse for short bursts of requests.

Constants from `src/together/constants.py:5-10`:

```python
TIMEOUT_SECS = 600               # 10-minute request timeout
MAX_SESSION_LIFETIME_SECS = 180  # 3-minute session lifetime
MAX_CONNECTION_RETRIES = 2       # urllib3-level retries
MAX_RETRIES = 5                  # Application-level retries
INITIAL_RETRY_DELAY = 0.5        # Starting backoff delay (seconds)
MAX_RETRY_DELAY = 8.0            # Maximum backoff delay (seconds)
```
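These constants can be combined into a rough latency budget, which matters when tuning `timeout` and `max_retries` for predictable latency. The sketch rests on pessimistic assumptions that the source does not confirm: every attempt runs to the full timeout and is then retried, no `Retry-After` header is honored, and connection-level retries add no extra time. Under those assumptions the budget is dominated by `TIMEOUT_SECS`, not by backoff. `worst_case_seconds` is a hypothetical helper.

```python
TIMEOUT_SECS = 600
MAX_RETRIES = 5
INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0


def worst_case_seconds(timeout: float = TIMEOUT_SECS, max_retries: int = MAX_RETRIES) -> float:
    """Upper bound if every attempt hangs for the full timeout and is then retried."""
    attempts = max_retries + 1  # initial try plus retries
    backoff = sum(
        min(INITIAL_RETRY_DELAY * 2.0 ** i, MAX_RETRY_DELAY) for i in range(max_retries)
    )
    return attempts * timeout + backoff


print(worst_case_seconds())                      # → 3615.5
print(worst_case_seconds(timeout=30, max_retries=2))  # → 91.5
```

With the defaults, backoff contributes only 15.5s of a roughly one-hour worst case, so lowering the per-request timeout is the more effective lever for bounding latency.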

Backoff calculation from `src/together/abstract/api_requestor.py:152-170`:

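```python
# Excerpt of a method. `random()` is presumably `random.random`, and the
# constants come from together.constants (imports not shown in the excerpt).
def _calculate_retry_timeout(
    self, remaining_retries, response_headers=None,
) -> float:
    # A server-supplied Retry-After in (0, 60] seconds wins outright.
    retry_after = self._parse_retry_after_header(response_headers)
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after

    # Otherwise: exponential backoff from INITIAL_RETRY_DELAY, capped at MAX_RETRY_DELAY.
    nb_retries = self.retries - remaining_retries
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    # Jitter shortens the delay by up to 25%.
    jitter = 1 - 0.25 * random()
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0
```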
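For experimentation outside the SDK, the method above can be ported to a free function. This is a sketch under assumptions: `calculate_retry_timeout` is a hypothetical name, it takes an already-parsed `retry_after` value instead of raw headers (the SDK's header parser is not shown here), and the random source is injectable for deterministic tests. Which `remaining_retries` value the SDK passes on the first retry is also not shown; with `remaining_retries == max_retries` the first delay is 0.5s, matching the timing sequence.

```python
import random
from typing import Callable, Optional

INITIAL_RETRY_DELAY = 0.5
MAX_RETRY_DELAY = 8.0


def calculate_retry_timeout(
    max_retries: int,
    remaining_retries: int,
    retry_after: Optional[float] = None,
    rand: Callable[[], float] = random.random,
) -> float:
    """Free-function port of _calculate_retry_timeout with an injectable random source."""
    if retry_after is not None and 0 < retry_after <= 60:
        return retry_after  # server-guided delay takes priority
    nb_retries = max_retries - remaining_retries
    sleep_seconds = min(INITIAL_RETRY_DELAY * pow(2.0, nb_retries), MAX_RETRY_DELAY)
    jitter = 1 - 0.25 * rand()  # factor in (0.75, 1.0] for rand() in [0, 1)
    timeout = sleep_seconds * jitter
    return timeout if timeout >= 0 else 0.0
```

Passing `rand=lambda: 0.0` pins the jitter factor at 1.0, which makes the doubling-and-cap behavior easy to inspect.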

Session recycling from `src/together/abstract/api_requestor.py:478-487`:

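```python
# Excerpt; `_thread_context` holds per-thread state for the requestor.
if not hasattr(_thread_context, "session"):
    # First request on this thread: create its session.
    _thread_context.session = _make_session(MAX_CONNECTION_RETRIES)
    _thread_context.session_create_time = time.time()
elif (
    time.time() - getattr(_thread_context, "session_create_time", 0)
    >= MAX_SESSION_LIFETIME_SECS
):
    # Session has reached its maximum lifetime: close it and start a fresh one.
    _thread_context.session.close()
    _thread_context.session = _make_session(MAX_CONNECTION_RETRIES)
    _thread_context.session_create_time = time.time()
```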
