# Principle: Rate Limiting Strategy (dagster-io/dagster)
| Field | Value |
|---|---|
| Principle Name | Rate Limiting Strategy |
| Category | Resilience |
| Domains | Data_Engineering, API_Integration, Resilience |
| Repository | dagster-io/dagster |
## Overview
Strategy for managing API rate limits in data pipelines through retry mechanisms, concurrency controls, and backoff policies.
## Description
Rate limiting management combines multiple techniques to prevent API throttling:
- Tenacity-based retry with fixed/exponential backoff for individual API calls that encounter rate limit errors (HTTP 429 or similar).
- Dagster concurrency keys to limit parallel asset executions, ensuring only one asset with a given concurrency key runs at a time.
- QueuedRunCoordinator to serialize runs globally, preventing multiple concurrent runs from overwhelming rate-limited APIs.
This multi-layered approach handles both per-request rate limits (individual API call throttling) and global rate limits (sessions per time window, concurrent connection limits).
## Usage
Use when ingesting data from rate-limited APIs (social media, cloud services). Essential for any pipeline making high-volume API calls where rate limit violations could block data collection. Specific scenarios include:
- Social media APIs -- Bluesky, Twitter/X, Reddit, and similar platforms with strict rate limits
- Cloud service APIs -- AWS, GCP, Azure management APIs with per-second or per-minute quotas
- SaaS APIs -- Salesforce, HubSpot, Stripe, and other services with tiered rate limits
- Partitioned assets -- When many partitions of the same asset each make API calls, concurrency controls prevent all partitions from firing simultaneously
## Theoretical Basis
Rate limiting management combines the retry pattern with cooperative concurrency control. The tenacity library provides configurable retry logic (stop conditions, wait strategies, retry predicates). Dagster's concurrency keys implement cooperative scheduling, ensuring only one asset with a given key executes at a time. The QueuedRunCoordinator provides global concurrency control at the run level. Together, these form a hierarchical rate limiting strategy.
The hierarchy operates at three levels:
- Request level -- Tenacity retries individual API calls with backoff, handling transient rate limit errors transparently.
- Asset level -- Dagster concurrency keys (`dagster/concurrency_key`) ensure that only one asset with a given key executes at a time within a run, preventing parallel asset executions from exceeding limits.
- Run level -- The `QueuedRunCoordinator` with `default_op_concurrency_limit` controls how many operations can run concurrently across all runs, providing a global ceiling.
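The run-level controls live in the Dagster instance configuration. A hedged sketch of a `dagster.yaml` fragment follows; the key names are taken from Dagster's instance-configuration documentation, the limit values are illustrative, and exact layout may vary between Dagster versions.

```yaml
# dagster.yaml (instance configuration) -- illustrative values
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 1      # serialize runs globally

concurrency:
  default_op_concurrency_limit: 1  # global ceiling on concurrent ops
```

With both settings at 1, at most one run dequeues at a time and at most one op executes within it, giving a hard upper bound on concurrent API pressure regardless of how many runs are requested.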