Workflow: PrefectHQ Prefect Per-Worker Task Concurrency
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Concurrency_Control, ML_Ops |
| Last Updated | 2026-02-09 22:00 GMT |
Overview
End-to-end process for using Prefect Global Concurrency Limits scoped per worker to control how many tasks can simultaneously consume a shared local resource (such as a GPU), while allowing non-resource-bound tasks to run freely in parallel.
Description
This workflow addresses the problem of resource contention when a worker runs multiple flow runs concurrently. Rather than limiting entire flow runs to run sequentially (which wastes throughput), it applies fine-grained concurrency limits only to the specific tasks that consume scarce resources. Global Concurrency Limits are coordinated by the Prefect server and work across the separate subprocesses that each flow run executes in. By including a worker identifier in the limit name, each machine maintains independent limits.
Key outputs:
- Processed results from a multi-step pipeline where the resource-intensive step is rate-limited
- Maximum throughput for non-resource-bound steps while protecting scarce resources
Scope:
- From work pool and worker configuration through task-level concurrency control
- Applicable to GPU memory, software licenses, local services, or any shared resource
Usage
Execute this workflow pattern when you have a multi-step pipeline where only certain tasks need concurrency limits (e.g., GPU inference, licensed software, memory-intensive processing) and you want to maximize throughput for all other tasks. It is suitable for ML inference pipelines, image processing, and any scenario where a scarce local resource must be shared across concurrent flow runs on the same machine.
Execution Steps
Step 1: Create Global Concurrency Limits
Create a Global Concurrency Limit (GCL) for each worker machine using the Prefect CLI. The limit name includes the worker identity (e.g., gpu:gpu-1), and the limit value controls how many tasks can hold the resource simultaneously.
Key considerations:
- Each worker machine gets its own independently-managed limit
- The limit name convention ({resource}:{worker_id}) ensures isolation
- Limit values should match the machine's resource capacity
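Assuming two worker machines with different GPU capacities, the limits might be created with the `prefect gcl` CLI like this (names and limit values are illustrative):

```shell
# One Global Concurrency Limit per worker machine, named {resource}:{worker_id}.
# --limit is the number of slots, sized to each machine's resource capacity.
prefect gcl create gpu:gpu-1 --limit 2   # gpu-1 fits 2 concurrent inferences
prefect gcl create gpu:gpu-2 --limit 4   # gpu-2 has more GPU memory

# List the configured limits to verify.
prefect gcl ls
```

These commands run against the Prefect server (or Cloud workspace) the workers will connect to, so they can be issued from any machine with the right API configuration.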
Step 2: Configure Work Pool and Deployment
Create a work pool (e.g., of the process type) and deploy the flow to it. The work pool's concurrency limit caps how many flow runs can execute at once, while the GCL gates only the resource-bound step within each run.
Key considerations:
- The work pool limit (e.g., 10 concurrent flow runs) is separate from the task-level GCL
- Deploy the flow using prefect deploy to make it available for scheduling
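A possible CLI sequence for this step; the pool name, entrypoint path, and deployment name are illustrative assumptions:

```shell
# Create a process-type work pool that local workers will poll.
prefect work-pool create gpu-pool --type process

# Optionally cap concurrent flow runs dispatched from this pool
# (separate from, and coarser than, the task-level GCL).
prefect work-pool set-concurrency-limit gpu-pool 10

# Deploy the flow so it can be scheduled onto the pool.
prefect deploy ./pipeline.py:pipeline --name gpu-pipeline --pool gpu-pool
```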
Step 3: Start Workers with Identity
Start workers with a unique WORKER_ID environment variable that matches the GCL name. This identity links the runtime worker process to its corresponding concurrency limit.
Key considerations:
- The WORKER_ID environment variable is read at runtime by the task
- Each worker must have a GCL created with a matching name
- Workers can handle many concurrent flow runs via the --limit flag
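On each machine, the worker can be started with its identity exported; WORKER_ID is this workflow's convention (read by the task code at runtime), not a Prefect built-in flag:

```shell
# The WORKER_ID suffix must match a GCL created in Step 1 (gpu:gpu-1, gpu:gpu-2).
# --limit caps concurrent flow runs on this worker; the GCL still gates the
# resource-bound task within those runs.
WORKER_ID=gpu-1 prefect worker start --pool gpu-pool --limit 10   # on machine 1
WORKER_ID=gpu-2 prefect worker start --pool gpu-pool --limit 10   # on machine 2
```

Because each flow run is launched as a subprocess of the worker, it inherits WORKER_ID from the worker's environment.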
Step 4: Execute Non-Limited Tasks Freely
Tasks that do not consume the scarce resource (e.g., downloading data, saving results) run without any concurrency gate. They execute in parallel across all concurrent flow runs on the worker.
Key considerations:
- Network-bound and I/O-bound tasks should not acquire concurrency limits
- These tasks overlap freely to maximize throughput
Step 5: Acquire Concurrency Limit for Resource-Bound Task
The resource-intensive task (e.g., ML model inference) acquires the per-worker Global Concurrency Limit using the concurrency context manager before executing. The server coordinates slot acquisition across all subprocesses on the same worker.
Key considerations:
- The concurrency context manager blocks until a slot is available
- The slot is released when the context manager exits (success or failure)
- The occupy parameter controls how many slots each invocation consumes
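A minimal sketch of the gated task, assuming a GCL named gpu:{WORKER_ID} already exists (Step 1) and WORKER_ID is set in the worker's environment (Step 3); the task name and inference body are stand-ins:

```python
import os

from prefect import task
from prefect.concurrency.sync import concurrency

# WORKER_ID and the "gpu:" prefix follow this workflow's naming convention;
# they are assumptions, not Prefect built-ins.
WORKER_ID = os.environ.get("WORKER_ID", "gpu-1")

@task
def run_inference(batch: list[float]) -> list[float]:
    # Blocks until a slot on this worker's limit (e.g. "gpu:gpu-1") is free;
    # the slot is released when the block exits, on success or failure.
    with concurrency(f"gpu:{WORKER_ID}", occupy=1):
        return [x * 2.0 for x in batch]  # stand-in for GPU model inference
```

Raising occupy above 1 lets a single invocation reserve a larger share of the resource, e.g. a task that loads two models at once.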
Step 6: Return Results
Once the resource-bound task completes and releases its concurrency slot, downstream tasks (e.g., saving results) run immediately. The next queued flow run's resource-bound task can then acquire the freed slot.
Key considerations:
- Results flow through the pipeline without additional coordination
- The pattern maximizes overall throughput while protecting the scarce resource
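The full pattern from Steps 4 through 6 can be sketched as one flow. All names are illustrative, and a GCL named gpu:{WORKER_ID} is assumed to exist; only the middle step acquires it:

```python
import os

from prefect import flow, task
from prefect.concurrency.sync import concurrency

WORKER_ID = os.environ.get("WORKER_ID", "gpu-1")  # assumed convention

@task
def download(item_id: int) -> list[float]:
    return [float(item_id)] * 4             # stand-in: network fetch, no gate

@task
def infer(batch: list[float]) -> list[float]:
    with concurrency(f"gpu:{WORKER_ID}", occupy=1):  # per-worker GCL
        return [x * 2.0 for x in batch]     # stand-in: GPU inference

@task
def save(result: list[float]) -> None:
    print(result)                           # stand-in: persist results, no gate

@flow
def pipeline(item_id: int) -> None:
    data = download(item_id)
    result = infer(data)                    # only this step is rate-limited
    save(result)                            # runs as soon as the slot frees
```

When many runs of this flow execute concurrently on one worker, download and save calls overlap freely, while infer calls queue on the worker's slot count.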