Heuristic: PrefectHQ Prefect Concurrency Limit Scoping
| Knowledge Sources | |
|---|---|
| Domains | Concurrency, Resource_Management |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
Scope Global Concurrency Limits (GCL) per worker using the `{resource}:{worker_id}` naming pattern to prevent resource contention across distributed workers.
Description
Prefect's Global Concurrency Limits (GCL) are identified by name strings. When multiple workers share a physical resource (e.g., GPU memory, database connections, file locks), each worker needs its own concurrency limit. The pattern `{resource}:{worker_id}` (e.g., `gpu:gpu-1`) creates worker-scoped limits that prevent tasks on one worker from consuming another worker's resource slots. The worker identity comes from the `WORKER_ID` environment variable. V1 concurrency limits use a ~100-year TTL for backward compatibility because old clients cannot maintain leases.
Usage
Apply this heuristic when deploying distributed task execution with shared physical resources. Relevant when multiple Prefect workers process GPU-bound, memory-bound, or I/O-bound tasks that need per-node resource throttling.
The Insight (Rule of Thumb)
- Action: Create GCL names using `{resource}:{worker_id}` pattern. Set `WORKER_ID` environment variable on each worker. Use `concurrency(name, occupy=1)` context manager around resource-intensive operations.
- Value: For example, `gpu:gpu-1` with `--limit 2` allows two concurrent GPU tasks on that worker.
- Trade-off: Requires pre-creating GCL entries via CLI (`prefect gcl create gpu:gpu-1 --limit 2`) and setting `WORKER_ID` on each worker process.
- V1 Compatibility: Legacy V1 concurrency limits use a ~100-year TTL (`timedelta(days=100 * 365)`) because old clients cannot refresh leases.
Reasoning
Without per-worker scoping, a global concurrency limit of 2 would allow only 2 GPU tasks across all workers combined, severely underutilizing a multi-GPU cluster. By scoping limits to individual workers, each worker gets its own resource budget while the total cluster capacity scales linearly with worker count.
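The scaling claim can be sketched with a quick back-of-the-envelope calculation (the fleet size and limits here are illustrative, not taken from the source):

```python
workers = ["gpu-1", "gpu-2", "gpu-3"]  # hypothetical three-worker fleet
per_worker_limit = 2

# A single shared, unscoped limit caps the entire cluster:
global_capacity = per_worker_limit  # 2 tasks total, regardless of fleet size

# Worker-scoped limits (gpu:gpu-1, gpu:gpu-2, ...) scale linearly:
scoped_capacity = per_worker_limit * len(workers)  # 6 tasks total
```

Adding a fourth worker (and its `gpu:gpu-4` limit) raises cluster capacity to 8 without touching the other workers' limits.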
The `WORKER_ID` environment variable pattern avoids hardcoding worker identity and works naturally with container orchestrators (Kubernetes, Docker Compose) where each pod/container sets its own identity.
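A minimal sketch of that identity lookup, using hypothetical `resolve_worker_id` and `gcl_name` helpers (not part of Prefect's API) that fall back to the hostname, which orchestrators typically set per pod/container:

```python
import os
import socket

def resolve_worker_id(default: str = "default") -> str:
    # Prefer an explicit WORKER_ID; otherwise use the hostname,
    # which Kubernetes sets to the pod name.
    return os.getenv("WORKER_ID") or socket.gethostname() or default

def gcl_name(resource: str, worker_id: str) -> str:
    # Build a worker-scoped limit name, e.g. "gpu:gpu-1"
    return f"{resource}:{worker_id}"
```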
The 100-year TTL for V1 limits is a deliberate compatibility hack:
```python
# From src/prefect/server/models/concurrency_limits.py:17-18
# Clients creating V1 limits can't maintain leases, so we use a long TTL
V1_LEASE_TTL = timedelta(days=100 * 365)  # ~100 years
```
Usage pattern from `examples/per_worker_task_concurrency.py`:
```python
import os
import time

from prefect import task, get_run_logger
from prefect.concurrency.sync import concurrency

WORKER_ID = os.getenv("WORKER_ID", "default")

@task
def run_ml_model(image_url: str, worker_id: str) -> dict:
    # Occupy one slot of this worker's scoped limit for the duration
    with concurrency(f"gpu:{worker_id}", occupy=1):
        get_run_logger().info(f"Running ML model on {image_url}")
        time.sleep(3)  # Simulate inference
    return {"image_url": image_url, "result": "classified"}
```
Deployment commands:
```shell
# Create per-worker GPU limits
prefect gcl create gpu:gpu-1 --limit 2
prefect gcl create gpu:gpu-2 --limit 2

# Start workers with identity
WORKER_ID=gpu-1 prefect worker start --pool ml-pool --limit 10
WORKER_ID=gpu-2 prefect worker start --pool ml-pool --limit 10
```