Principle:Spotify Luigi Scheduler Daemon
Overview
A Scheduler Daemon is a long-running central service that coordinates task execution across distributed workers by maintaining a global view of the task dependency graph, assigning work, and persisting state across restarts.
Description
In production pipeline orchestration, tasks are submitted by multiple workers running on different machines. Without a central coordinator, each worker would have an incomplete picture of the overall pipeline state, leading to duplicated work, missed dependencies, and inconsistent failure handling.
A Scheduler Daemon solves this by running as a persistent background process (daemon) that:
- Maintains a global task graph: The scheduler holds the complete directed acyclic graph (DAG) of all registered tasks and their dependencies, enabling it to determine which tasks are ready to execute.
- Assigns work to workers: When a worker requests a task, the scheduler selects the highest-priority task whose dependencies have all been satisfied and whose required resources are available.
- Tracks task lifecycle: Every task progresses through a defined set of states (PENDING, RUNNING, DONE, FAILED, DISABLED), and the scheduler records each transition.
- Persists state to disk: The scheduler periodically serializes its in-memory state to a pickle file, allowing it to recover the task graph after a process restart without losing knowledge of completed or in-progress tasks.
- Prunes stale entries: A periodic pruning loop removes completed tasks that have exceeded their retention period, disconnected workers that have stopped sending heartbeats, and expired failure records.
- Exposes an HTTP API: The scheduler listens on a network port and accepts JSON-encoded RPC calls over HTTP, enabling language-agnostic communication with workers and monitoring tools.
This pattern separates the decision-making (what to run next) from the execution (actually running the task), enabling horizontal scaling of workers while keeping scheduling logic centralized and consistent.
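The selection rule described above (highest priority first, dependencies satisfied, resources available) can be sketched in a few lines of Python. This is a minimal illustration, not Luigi's actual implementation: the Task dataclass, the pick_next_task function, and the default capacity of 1 for undeclared resources are all assumptions made for this example.

```python
from dataclasses import dataclass, field

PENDING, RUNNING, DONE = "PENDING", "RUNNING", "DONE"


@dataclass
class Task:
    """Hypothetical, simplified task record held by the scheduler."""
    task_id: str
    deps: tuple = ()
    priority: int = 0
    resources: dict = field(default_factory=dict)
    status: str = PENDING


def pick_next_task(tasks, total_resources):
    """Return the highest-priority runnable task and mark it RUNNING.

    `tasks` maps task_id -> Task; `total_resources` maps a resource
    name to its capacity. Returns None when nothing can run.
    """
    # Sum the resources held by tasks that are currently running.
    used = {}
    for t in tasks.values():
        if t.status == RUNNING:
            for name, amount in t.resources.items():
                used[name] = used.get(name, 0) + amount

    def deps_done(t):
        # A dependency the scheduler has never seen counts as not done.
        return all(d in tasks and tasks[d].status == DONE for d in t.deps)

    def fits(t):
        # Assumption for this sketch: an undeclared resource has capacity 1.
        return all(
            used.get(name, 0) + amount <= total_resources.get(name, 1)
            for name, amount in t.resources.items()
        )

    candidates = [t for t in tasks.values()
                  if t.status == PENDING and deps_done(t)]
    # Highest priority first; skip any candidate that does not fit.
    for t in sorted(candidates, key=lambda t: -t.priority):
        if fits(t):
            t.status = RUNNING
            return t
    return None
```

A worker loop would call pick_next_task each time it asks for work, execute whatever comes back, and report the result so the scheduler can mark the task DONE or FAILED. The worker never inspects the graph itself, which is what keeps scheduling decisions centralized.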
Usage
Deploy a Scheduler Daemon when:
- Pipelines run across multiple machines or processes and need a single source of truth for task state.
- Tasks have complex dependency chains that require global coordination to resolve correctly.
- Production reliability demands that scheduler state survives process restarts and system reboots.
- Operations teams need visibility into pipeline state through a web interface or API.
- Resource constraints (e.g., limited database connections) require centralized allocation across workers.
Theoretical Basis
The Scheduler Daemon implements a centralized, pull-based scheduler (workers request work from the scheduler rather than having tasks pushed to them) with the following algorithmic properties:
- Task registration: Workers submit tasks to the scheduler via add_task RPC calls. Each task declaration includes the task identifier, its dependencies, priority, resource requirements, and the declaring worker. The scheduler inserts or updates the task in its internal graph.
- Dependency resolution: When a worker calls get_work, the scheduler traverses the task graph to find tasks in PENDING state whose dependencies are all in DONE state. Among eligible tasks, it selects the one with the highest priority.
- Resource gating: Before assigning a task, the scheduler checks that the task's resource requirements can be satisfied given currently running tasks. If resources are insufficient, the task is skipped and the next eligible candidate is considered.
- Failure management: Tasks that fail are tracked against a configurable retry policy defined by three parameters: retry_count (maximum failures before disabling), disable_window (time window for counting failures), and disable_hard_timeout (maximum wall-clock time before forced disable). When a task exceeds its retry budget, the scheduler moves it to DISABLED state.
- State persistence: The scheduler's in-memory state (all tasks, workers, and their relationships) is serialized to a pickle file at state_path. On startup, the scheduler loads this file to restore the previous state. On shutdown (via SIGINT, SIGTERM, or SIGQUIT), the state is dumped before the process exits.
- Periodic pruning: A timer (default every 60 seconds) triggers a pruning cycle that removes workers that have not sent a heartbeat within worker_disconnect_delay seconds, tasks that have been completed for longer than remove_delay seconds, and expired batch email records.
- Daemonization: For production deployments, the scheduler process can detach from the terminal using the python-daemon library, redirecting stdout/stderr to log files and writing a PID file for process management.
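The failure-management rule above can be illustrated with a small sliding-window counter. This is a sketch under stated assumptions rather than the scheduler's real code: the RetryPolicy and FailureTracker classes and the record_failure method are hypothetical names, timestamps are passed in explicitly so the behavior is easy to follow, and the hard timeout is interpreted here as "a failure occurring this long after the first recorded failure forces a disable".

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryPolicy:
    """The three parameters named in the text (field names reused as-is)."""
    retry_count: int                      # failures allowed inside the window
    disable_window: float                 # window length in seconds
    disable_hard_timeout: Optional[float] = None  # seconds, or None to turn off


@dataclass
class FailureTracker:
    """Hypothetical per-task failure bookkeeping."""
    policy: RetryPolicy
    failures: deque = field(default_factory=deque)
    first_failure_time: Optional[float] = None

    def record_failure(self, now: float) -> bool:
        """Record a failure at time `now`; return True if the task
        should move to DISABLED."""
        if self.first_failure_time is None:
            self.first_failure_time = now
        self.failures.append(now)

        # Forget failures that have fallen out of the counting window.
        cutoff = now - self.policy.disable_window
        while self.failures and self.failures[0] < cutoff:
            self.failures.popleft()

        # Hard timeout: still failing long after the first failure.
        hard = self.policy.disable_hard_timeout
        if hard is not None and now >= self.first_failure_time + hard:
            return True

        # Retry budget: too many failures inside the window.
        return len(self.failures) >= self.policy.retry_count
```

With retry_count=3 and disable_window=100, two failures at t=0 and t=10 followed by a third at t=150 do not disable the task, because the first two have already left the window. Three failures within any 100-second span do.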