
Principle: dagster-io/dagster Rate Limiting Strategy

From Leeroopedia


Principle Name: Rate Limiting Strategy
Category: Resilience
Domains: Data_Engineering, API_Integration, Resilience
Repository: dagster-io/dagster

Overview

Strategy for managing API rate limits in data pipelines through retry mechanisms, concurrency controls, and backoff policies.

Description

Rate limiting management combines multiple techniques to prevent API throttling:

  • Tenacity-based retry with fixed/exponential backoff for individual API calls that encounter rate limit errors (HTTP 429 or similar).
  • Dagster concurrency keys to limit parallel asset executions, ensuring only one asset with a given concurrency key runs at a time.
  • QueuedRunCoordinator to serialize runs globally, preventing multiple concurrent runs from overwhelming rate-limited APIs.

This multi-layered approach handles both per-request rate limits (individual API call throttling) and global rate limits (sessions per time window, concurrent connection limits).

Usage

Use when ingesting data from rate-limited APIs (social media, cloud services). Essential for any pipeline making high-volume API calls where rate limit violations could block data collection. Specific scenarios include:

  • Social media APIs -- Bluesky, Twitter/X, Reddit, and similar platforms with strict rate limits
  • Cloud service APIs -- AWS, GCP, Azure management APIs with per-second or per-minute quotas
  • SaaS APIs -- Salesforce, HubSpot, Stripe, and other services with tiered rate limits
  • Partitioned assets -- When many partitions of the same asset each make API calls, concurrency controls prevent all partitions from firing simultaneously

Theoretical Basis

Rate limiting management combines the retry pattern with concurrency throttling. The tenacity library provides configurable retry logic (stop conditions, wait strategies, retry predicates). Dagster's concurrency keys implement cooperative scheduling, ensuring only one asset with a given key executes at a time. The QueuedRunCoordinator provides global concurrency control at the run level. Together, these form a hierarchical rate limiting strategy.

The hierarchy operates at three levels:

  • Request level -- Tenacity retries individual API calls with backoff, handling transient rate limit errors transparently.
  • Asset level -- Dagster concurrency keys (dagster/concurrency_key) ensure that only one asset with a given key executes at a time within a run, preventing parallel asset executions from exceeding limits.
  • Run level -- The QueuedRunCoordinator queues runs so they do not all start at once, while the instance-level default_op_concurrency_limit sets a default ceiling on how many ops sharing a concurrency key may execute concurrently across all runs.
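The run-level controls live in the Dagster instance configuration. A sketch of the relevant dagster.yaml fragment, assuming a recent Dagster version (the limit values are illustrative):

```yaml
# dagster.yaml (instance configuration) -- illustrative values
run_coordinator:
  module: dagster.core.run_coordinator
  class: QueuedRunCoordinator
  config:
    max_concurrent_runs: 1          # serialize runs globally

concurrency:
  default_op_concurrency_limit: 1   # default ceiling for ops sharing a concurrency key
```

With max_concurrent_runs set to 1, runs execute strictly one after another; higher values allow limited parallelism while the op-level limit still caps keyed ops across all in-flight runs.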

Related Pages

Uses Heuristic
