Heuristic: PrefectHQ Prefect Concurrency Limit Scoping
| Knowledge Sources | |
|---|---|
| Domains | Concurrency, Resource_Management |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
Scope Global Concurrency Limits (GCL) per worker using the `{resource}:{worker_id}` naming pattern to prevent resource contention across distributed workers.
Description
Prefect's Global Concurrency Limits (GCL) are identified by name strings. When multiple workers share a physical resource (e.g., GPU memory, database connections, file locks), each worker needs its own concurrency limit. The pattern `{resource}:{worker_id}` (e.g., `gpu:gpu-1`) creates worker-scoped limits that prevent tasks on one worker from consuming another worker's resource slots. The worker identity comes from the `WORKER_ID` environment variable. V1 concurrency limits use a ~100-year TTL for backward compatibility because old clients cannot maintain leases.
Usage
Apply this heuristic when deploying distributed task execution with shared physical resources. Relevant when multiple Prefect workers process GPU-bound, memory-bound, or I/O-bound tasks that need per-node resource throttling.
The Insight (Rule of Thumb)
- Action: Create GCL names using `{resource}:{worker_id}` pattern. Set `WORKER_ID` environment variable on each worker. Use `concurrency(name, occupy=1)` context manager around resource-intensive operations.
- Value: For example, `gpu:gpu-1` with `--limit 2` allows two concurrent GPU tasks on that worker.
- Trade-off: Requires pre-creating GCL entries via CLI (`prefect gcl create gpu:gpu-1 --limit 2`) and setting `WORKER_ID` on each worker process.
- V1 Compatibility: Legacy V1 concurrency limits use a ~100-year TTL (`timedelta(days=100 * 365)`) because old clients cannot refresh leases.
Reasoning
Without per-worker scoping, a global concurrency limit of 2 would allow only 2 GPU tasks across all workers combined, severely underutilizing a multi-GPU cluster. By scoping limits to individual workers, each worker gets its own resource budget while the total cluster capacity scales linearly with worker count.
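The scaling claim can be sketched with a quick back-of-the-envelope calculation (the fleet size and limits here are illustrative, not taken from the source):

```python
workers = ["gpu-1", "gpu-2", "gpu-3"]  # hypothetical three-worker fleet
per_worker_limit = 2

# A single shared, unscoped limit caps the entire cluster:
global_capacity = per_worker_limit  # 2 tasks total, regardless of fleet size

# Worker-scoped limits (gpu:gpu-1, gpu:gpu-2, ...) scale linearly:
scoped_capacity = per_worker_limit * len(workers)  # 6 tasks total
```

Adding a fourth worker (and its `gpu:gpu-4` limit) raises cluster capacity to 8 without touching the other workers' limits.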
The `WORKER_ID` environment variable pattern avoids hardcoding worker identity and works naturally with container orchestrators (Kubernetes, Docker Compose) where each pod/container sets its own identity.
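A minimal sketch of that identity lookup, using hypothetical `resolve_worker_id` and `gcl_name` helpers (not part of Prefect's API) that fall back to the hostname, which orchestrators typically set per pod/container:

```python
import os
import socket

def resolve_worker_id(default: str = "default") -> str:
    # Prefer an explicit WORKER_ID; otherwise use the hostname,
    # which Kubernetes sets to the pod name.
    return os.getenv("WORKER_ID") or socket.gethostname() or default

def gcl_name(resource: str, worker_id: str) -> str:
    # Build a worker-scoped limit name, e.g. "gpu:gpu-1"
    return f"{resource}:{worker_id}"
```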
The 100-year TTL for V1 limits is a deliberate compatibility hack:
```python
# From src/prefect/server/models/concurrency_limits.py:17-18
# Clients creating V1 limits can't maintain leases, so we use a long TTL
V1_LEASE_TTL = timedelta(days=100 * 365)  # ~100 years
```
Usage pattern from `examples/per_worker_task_concurrency.py`:
```python
import os
import time

from prefect import task, get_run_logger
from prefect.concurrency.sync import concurrency

WORKER_ID = os.getenv("WORKER_ID", "default")

@task
def run_ml_model(image_url: str, worker_id: str) -> dict:
    # Occupy one slot of this worker's scoped limit for the duration
    with concurrency(f"gpu:{worker_id}", occupy=1):
        get_run_logger().info(f"Running ML model on {image_url}")
        time.sleep(3)  # Simulate inference
    return {"image_url": image_url, "result": "classified"}
```
Deployment commands:
```shell
# Create per-worker GPU limits
prefect gcl create gpu:gpu-1 --limit 2
prefect gcl create gpu:gpu-2 --limit 2

# Start workers with identity
WORKER_ID=gpu-1 prefect worker start --pool ml-pool --limit 10
WORKER_ID=gpu-2 prefect worker start --pool ml-pool --limit 10
```