Heuristic: BerriAI LiteLLM Connection Pooling and Memory Management
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Optimization |
| Last Updated | 2026-02-15 16:00 GMT |
Overview
Memory management heuristic using bounded connection pools (300 total, 50 per host), aggressive queue limits, and Python-version-specific SSL leak mitigation.
Description
LiteLLM's proxy server handles high volumes of outbound HTTP connections to LLM providers. Without connection pooling limits, the aiohttp connector can grow unboundedly, causing memory leaks. This heuristic defines the connection pool bounds, keep-alive timeouts, DNS cache TTL, and a version-specific workaround for a Python SSL memory leak. It also covers in-memory cache bounding and async queue size limits to prevent runaway memory usage.
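The queue-bounding idea can be illustrated with a stdlib sketch (hypothetical values and names, not LiteLLM's internals): a bounded `asyncio.Queue` rejects new work once full instead of growing without limit.

```python
import asyncio

# Hypothetical illustration: a bounded queue refuses new work instead of
# growing without limit (the same idea behind MAX_SIZE_IN_MEMORY_QUEUE).
queue: "asyncio.Queue[str]" = asyncio.Queue(maxsize=2)

queue.put_nowait("req-1")
queue.put_nowait("req-2")

try:
    queue.put_nowait("req-3")  # queue is full: raises asyncio.QueueFull
    overflowed = False
except asyncio.QueueFull:
    overflowed = True

print(overflowed)  # True: the third item was rejected, memory stays bounded
```

A caller that sees `QueueFull` can shed load or apply backpressure, which is what keeps memory usage flat under bursts.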
Usage
Apply this heuristic when deploying the LiteLLM proxy in production with high traffic volumes. The defaults are tuned for medium-to-large deployments. For very high traffic (>10K RPM), consider increasing `AIOHTTP_CONNECTOR_LIMIT` but monitor memory usage carefully.
The Insight (Rule of Thumb)
- Connection Pool Limits:
- `AIOHTTP_CONNECTOR_LIMIT=300` (total connections across all hosts)
- `AIOHTTP_CONNECTOR_LIMIT_PER_HOST=50` (max connections to a single provider)
- `AIOHTTP_KEEPALIVE_TIMEOUT=120` (2-minute keep-alive)
- `AIOHTTP_TTL_DNS_CACHE=300` (5-minute DNS cache)
- Memory Cache Bounds:
- Max 200 items in in-memory cache
- Max 1MB per cached item
- 10-minute default TTL (600 seconds)
- Queue Limits:
- `MAX_SIZE_IN_MEMORY_QUEUE=2000`
- `LITELLM_ASYNCIO_QUEUE_MAXSIZE=1000`
- `MAX_CALLBACKS=100` (prevent exponential callback growth)
- Python SSL Fix: `enable_cleanup_closed` is needed on Python < 3.12.7, and on 3.13.0 (fixed in 3.13.1)
- Trade-off: Lower limits use less memory but may throttle throughput under high concurrency; higher limits risk memory growth.
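Assuming these environment variables feed aiohttp's `TCPConnector` (whose `limit`, `limit_per_host`, `keepalive_timeout`, and `ttl_dns_cache` parameters are real), a hypothetical helper could assemble the arguments from the environment with the defaults above:

```python
import os

def connector_kwargs() -> dict:
    """Build keyword arguments for aiohttp.TCPConnector from env vars.

    Hypothetical helper: the env var names mirror the LiteLLM constants,
    and each key is a real aiohttp.TCPConnector parameter.
    """
    return {
        "limit": int(os.getenv("AIOHTTP_CONNECTOR_LIMIT", 300)),
        "limit_per_host": int(os.getenv("AIOHTTP_CONNECTOR_LIMIT_PER_HOST", 50)),
        "keepalive_timeout": int(os.getenv("AIOHTTP_KEEPALIVE_TIMEOUT", 120)),
        "ttl_dns_cache": int(os.getenv("AIOHTTP_TTL_DNS_CACHE", 300)),
    }

print(connector_kwargs()["limit"])  # 300 unless overridden in the environment
```

The resulting dict would be passed as `aiohttp.TCPConnector(**connector_kwargs())`; centralizing the env lookups keeps the overrides discoverable in one place.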
Reasoning
Connection pooling bounds prevent a common production issue where each LLM request opens a new TCP connection that is never properly closed, leading to file descriptor exhaustion and memory growth. The 300-connection limit was chosen to handle typical multi-provider proxy traffic without hitting OS-level socket limits. The 50 per-host limit prevents a single slow provider from consuming the entire pool. The Python SSL leak fix addresses a specific CPython bug where SSL objects are not properly garbage collected, causing steady memory growth under sustained HTTPS traffic.
Code Evidence
Connection pooling constants from `litellm/constants.py:165-180`:
```python
# Aiohttp connection pooling - prevents memory leaks from unbounded connection growth
AIOHTTP_CONNECTOR_LIMIT = int(os.getenv("AIOHTTP_CONNECTOR_LIMIT", 300))
AIOHTTP_CONNECTOR_LIMIT_PER_HOST = int(
    os.getenv("AIOHTTP_CONNECTOR_LIMIT_PER_HOST", 50)
)
AIOHTTP_KEEPALIVE_TIMEOUT = int(os.getenv("AIOHTTP_KEEPALIVE_TIMEOUT", 120))
AIOHTTP_TTL_DNS_CACHE = int(os.getenv("AIOHTTP_TTL_DNS_CACHE", 300))
# enable_cleanup_closed is only needed for Python versions with the SSL leak bug
# Fixed in Python 3.12.7+ and 3.13.1+ (see https://github.com/python/cpython/pull/118960)
AIOHTTP_NEEDS_CLEANUP_CLOSED = (3, 13, 0) <= sys.version_info < (
    3, 13, 1,
) or sys.version_info < (3, 12, 7)
```
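The version predicate can be exercised against a few interpreter versions; the function below is a hedged restatement of the condition for illustration, not LiteLLM's API:

```python
def needs_cleanup_closed(version: tuple) -> bool:
    # Mirrors the constant: the SSL leak workaround is needed on interpreters
    # older than 3.12.7, and on 3.13.0 exactly (fixed in 3.13.1).
    return version < (3, 12, 7) or (3, 13, 0) <= version < (3, 13, 1)

assert needs_cleanup_closed((3, 11, 9)) is True   # pre-fix 3.11: leak present
assert needs_cleanup_closed((3, 12, 6)) is True   # fix landed only in 3.12.7
assert needs_cleanup_closed((3, 12, 7)) is False  # backport applied
assert needs_cleanup_closed((3, 13, 0)) is True   # 3.13.0 shipped before the fix
assert needs_cleanup_closed((3, 13, 1)) is False  # fixed again
```

In production code the tuple would come from `sys.version_info`, as the constant above does.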
In-memory cache bounds from `litellm/caching/in_memory_cache.py:31-45`:
```python
default_ttl: int = 600  # 10 minutes
# At maximum litellm rate limiting logic requires objects to be in memory for 1 minute
max_size_in_memory: int = 200  # upper bound to prevent memory leaks
```
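A minimal sketch of the size-and-TTL bounding idea, under the same 200-item / 600-second defaults (this is an illustrative class, not LiteLLM's implementation):

```python
import time
from collections import OrderedDict

class BoundedTTLCache:
    """Sketch of a size- and TTL-bounded cache (not LiteLLM's class).

    Evicts the oldest entry once max_size is reached, and drops
    entries older than ttl seconds on read.
    """

    def __init__(self, max_size: int = 200, ttl: float = 600.0) -> None:
        self.max_size = max_size
        self.ttl = ttl
        self._store: "OrderedDict[str, tuple[float, object]]" = OrderedDict()

    def set(self, key: str, value: object) -> None:
        if key in self._store:
            self._store.pop(key)
        elif len(self._store) >= self.max_size:
            self._store.popitem(last=False)  # evict oldest to stay bounded
        self._store[key] = (time.monotonic(), value)

    def get(self, key: str):
        item = self._store.get(key)
        if item is None:
            return None
        inserted_at, value = item
        if time.monotonic() - inserted_at > self.ttl:
            del self._store[key]  # expired: free the memory
            return None
        return value

cache = BoundedTTLCache(max_size=2, ttl=600.0)
cache.set("a", 1)
cache.set("b", 2)
cache.set("c", 3)      # "a" is evicted: size never exceeds 2
print(cache.get("a"))  # None
print(cache.get("c"))  # 3
```

Because eviction happens at insert time, the cache can never exceed `max_size` entries regardless of traffic volume.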
Callback limit from `litellm/constants.py:129-132`:
```python
# Maximum number of callbacks that can be registered
# This prevents callbacks from exponentially growing and consuming CPU resources
# Override with LITELLM_MAX_CALLBACKS env var for large deployments
MAX_CALLBACKS = get_env_int("LITELLM_MAX_CALLBACKS", 100)
```
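The guard itself is simple: refuse registrations past the cap. A hypothetical sketch of how such a limit could be enforced (the helper name and return convention are illustrative, not LiteLLM's actual code):

```python
import os

MAX_CALLBACKS = int(os.getenv("LITELLM_MAX_CALLBACKS", 100))

def register_callback(callbacks: list, cb) -> bool:
    """Append cb only while under the cap; return False when rejected.

    Hypothetical helper illustrating the guard, not LiteLLM's code.
    """
    if len(callbacks) >= MAX_CALLBACKS:
        return False  # refuse registration instead of growing unbounded
    callbacks.append(cb)
    return True

callbacks: list = []
accepted = sum(register_callback(callbacks, (lambda i=i: i)) for i in range(150))
print(accepted)        # 100: registrations beyond the cap are rejected
print(len(callbacks))  # 100
```

Returning `False` (rather than raising) lets the caller log and continue, which suits a proxy that must keep serving traffic even when a plugin misbehaves.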