Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Heuristic:Langgenius Dify Celery Queue Separation

From Leeroopedia
Knowledge Sources
Domains Optimization, Infrastructure
Last Updated 2026-02-12 08:00 GMT

Overview

Task queue architecture tip: separate Celery queues by task priority and type (dataset, workflow, mail, etc.) to prevent low-priority tasks from starving critical operations.

Description

Dify's Celery worker processes 20+ distinct task queues covering API token management, dataset indexing, pipeline processing, mail delivery, workflow execution, plugin operations, and more. The Community edition (self-hosted) uses a single set of queues, while the Cloud edition splits workflow queues further (professional, team, sandbox tiers) for multi-tenant isolation.

For Kubernetes deployments, dedicated workers can be assigned to specific queues using the `CELERY_WORKER_QUEUES` environment variable, enabling horizontal scaling of bottleneck queues independently.

Usage

Apply this heuristic when deploying Dify at scale, diagnosing task processing delays, or setting up Kubernetes-based deployments where different task types need independent scaling.

The Insight (Rule of Thumb)

  • Action: Use `CELERY_WORKER_QUEUES` to assign dedicated workers to specific queues in Kubernetes deployments. Use `CELERY_AUTO_SCALE=true` with `CELERY_MAX_WORKERS` and `CELERY_MIN_WORKERS` for dynamic scaling.
  • Value: Community edition default queues: `api_token,dataset,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention,workflow_based_app_execution`
  • Trade-off: More workers consume more memory. Each dedicated queue worker needs at least 1 process. Over-splitting reduces throughput for bursty workloads.

Key configuration variables:

  • `CELERY_WORKER_QUEUES`: Comma-separated list of queues (overrides default)
  • `CELERY_WORKER_CONCURRENCY`: Number of worker processes per container
  • `CELERY_WORKER_POOL`: Pool implementation (gevent, prefork, solo)
  • `CELERY_PREFETCH_MULTIPLIER`: Tasks prefetched per worker (default: 1)
  • `MAX_TASKS_PER_CHILD`: Tasks before worker restart (default: 50, prevents memory leaks)

Reasoning

Without queue separation, a burst of dataset indexing tasks (which are CPU and I/O intensive) could block mail delivery or workflow execution tasks from being processed. Priority queues (`priority_dataset`, `priority_pipeline`) ensure user-initiated operations take precedence over background batch operations.

The `MAX_TASKS_PER_CHILD=50` setting restarts worker processes after 50 tasks to prevent memory leaks from long-running ML operations (embedding generation, document parsing).

Code evidence from `api/docker/entrypoint.sh:34-45`:

# Configure queues based on edition if not explicitly set
if [[ -z "${CELERY_QUEUES}" ]]; then
  if [[ "${EDITION}" == "CLOUD" ]]; then
    DEFAULT_QUEUES="api_token,dataset,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow_professional,workflow_team,workflow_sandbox,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention,workflow_based_app_execution"
  else
    DEFAULT_QUEUES="api_token,dataset,priority_dataset,priority_pipeline,pipeline,mail,ops_trace,app_deletion,plugin,workflow_storage,conversation,workflow,schedule_poller,schedule_executor,triggered_workflow_dispatcher,trigger_refresh_executor,retention,workflow_based_app_execution"
  fi
fi

Kubernetes-specific queue override from `api/docker/entrypoint.sh:47-56`:

# Support for Kubernetes deployment with specific queue workers
# Environment variables that can be set:
# - CELERY_WORKER_QUEUES: Comma-separated list of queues (overrides CELERY_QUEUES)
# - CELERY_WORKER_CONCURRENCY: Number of worker processes (overrides CELERY_WORKER_AMOUNT)
# - CELERY_WORKER_POOL: Pool implementation (overrides CELERY_WORKER_CLASS)

if [[ -n "${CELERY_WORKER_QUEUES}" ]]; then
  DEFAULT_QUEUES="${CELERY_WORKER_QUEUES}"
fi

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment