Workflow: Spotify Luigi Central Scheduler Deployment
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Infrastructure, Pipeline_Orchestration |
| Last Updated | 2026-02-10 12:00 GMT |
Overview
End-to-end process for deploying and operating Luigi's central scheduler daemon (luigid) for production pipeline orchestration, task locking, and web-based visualization.
Description
This workflow covers the transition from local-scheduler development mode to a production-grade centralized scheduler. The central scheduler (luigid) is a Tornado-based HTTP server that provides distributed task locking (preventing duplicate task execution across workers), a web-based visualiser for monitoring pipeline status and dependency graphs, task history tracking, and a REST API for programmatic interaction. Multiple worker processes connect to the single scheduler to coordinate work.
Usage
Execute this workflow when transitioning Luigi pipelines from development to production. The central scheduler is required when multiple workers or cron jobs may run the same pipeline concurrently, when you need visibility into pipeline status via the web UI, or when task history and execution tracking are needed for operational monitoring.
Execution Steps
Step 1: Install and Configure Luigi
Install Luigi with its dependencies (Tornado for the web server). Configure the [scheduler] section in luigi.cfg (or luigi.toml), setting parameters such as the state persistence path (for crash recovery), task history retention, worker disconnect timeouts, and resource limits. Optionally configure database-backed task history via DbTaskHistory for long-term storage.
Key considerations:
- Tornado is required for the central scheduler web server
- state_path configures where the scheduler persists its in-memory state to disk
- record_task_history enables storing task execution events in a database
- retry_delay and retry_count control automatic task retries on failure
- disable_persist controls how long (in seconds) a failing task stays disabled before it is re-enabled
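A minimal [scheduler] section illustrating these options might look like the following sketch. Paths and values are placeholders, and option names have shifted between Luigi releases (older versions used disable-* spellings), so check them against your installed version:

```ini
# luigi.cfg -- scheduler section (illustrative values, not defaults)
[scheduler]
# Persist in-memory state so the scheduler can recover after a restart
state_path = /var/lib/luigi-server/state.pickle
# Record task execution events to the task-history database
record_task_history = true
# Retry failing tasks: wait 600s between retries, disable after 3 failures
retry_delay = 600
retry_count = 3
# Keep a failing task disabled for one hour before re-enabling it
disable_persist = 3600
# Forget workers that have not sent a heartbeat for 10 minutes
worker_disconnect_delay = 600
```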
Step 2: Start the Scheduler Daemon
Launch the central scheduler using the luigid command. The daemon starts a Tornado HTTP server on the configured port (default 8082) and serves both the REST API and the web visualiser. The scheduler can run as a foreground process, a background daemon (--background), or be managed by a process supervisor (systemd, supervisord).
Key considerations:
- luigid starts the scheduler on port 8082 by default
- --background daemonizes the process; pair it with --pidfile and --logdir to control the PID file and log location
- --port overrides the default port; --address binds to a specific interface
- Unix socket mode is available for local-only deployments via --unix-socket
- The scheduler should be monitored by a process supervisor in production
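Under a process supervisor the daemon is normally run in the foreground rather than with --background, since the supervisor handles daemonization and restarts. A sketch of a systemd unit; the user, paths, and flag values are assumptions for illustration:

```ini
# /etc/systemd/system/luigid.service (illustrative)
[Unit]
Description=Luigi central scheduler daemon
After=network.target

[Service]
User=luigi
# Foreground mode: systemd manages daemonization, restarts, and logging
ExecStart=/usr/local/bin/luigid --port 8082 --address 0.0.0.0 \
    --state-path /var/lib/luigi-server/state.pickle --logdir /var/log/luigi
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now luigid` and verify the web UI responds on the configured port.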
Step 3: Configure Workers to Connect
Update Luigi pipeline scripts to connect to the central scheduler by removing the --local-scheduler flag. Configure the [core] section with scheduler_host and scheduler_port pointing to the luigid instance. Workers communicate with the scheduler via JSON-RPC over HTTP, registering tasks and polling for work.
Key considerations:
- Remove --local-scheduler from CLI invocations to use the central scheduler
- [core] scheduler_host and scheduler_port configure the connection
- Luigi has no built-in worker authentication; restrict access at the network layer or behind a reverse proxy
- Multiple workers can connect concurrently; the scheduler provides task-level locking
- Workers send heartbeats; the scheduler removes stale workers after timeout
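The connection can be configured once per worker host instead of per invocation. A sketch of the [core] section (the hostname is a placeholder; depending on your Luigi version the keys may be spelled default-scheduler-host/default-scheduler-port or scheduler_host/scheduler_port):

```ini
# luigi.cfg on each worker host (illustrative)
[core]
# Connect to the central luigid instance instead of an in-process scheduler
default-scheduler-host = scheduler.internal.example.com
default-scheduler-port = 8082
```

With this in place, plain `luigi --module mymodule MyTask` invocations (without --local-scheduler) register with the central scheduler.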
Step 4: Monitor via Web Visualiser
Access the web visualiser at http://scheduler_host:8082 to view running tasks, dependency graphs, worker status, and execution history. The visualiser shows task tables with filtering, SVG dependency graph rendering via D3.js/dagre-d3, resource utilization, and worker management. Tasks are color-coded by status (green = complete, yellow = pending, red = failed).
Key considerations:
- The visualiser is served from the same port as the REST API
- Dependency graphs render SVG using D3.js and dagre-d3
- Task tables support filtering by status, name pattern, and time range
- Resource usage is visualized to identify bottlenecks
- CORS can be configured for cross-origin API access
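The endpoints that back the visualiser can also be queried programmatically. Luigi's RPC convention wraps method arguments in a single JSON-encoded `data` query parameter; a stdlib-only sketch that builds such a request for the task_list method (host name and status filter are placeholder values):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def task_list_url(host: str, port: int, status: str = "") -> str:
    """Build a URL for the scheduler's task_list API method.

    Method arguments travel as one JSON-encoded 'data' query parameter,
    per Luigi's JSON-over-HTTP RPC convention.
    """
    params = {"data": json.dumps({"status": status, "search": ""})}
    return f"http://{host}:{port}/api/task_list?{urlencode(params)}"


def fetch_failed_tasks(host: str, port: int = 8082) -> dict:
    """Fetch the scheduler's view of FAILED tasks (requires a live luigid)."""
    with urlopen(task_list_url(host, port, status="FAILED")) as resp:
        return json.loads(resp.read())["response"]


# Build (but do not send) a request URL for all FAILED tasks:
url = task_list_url("scheduler.internal.example.com", 8082, "FAILED")
```

The same pattern applies to other API methods; responses come back as JSON with the payload under a top-level "response" key.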
Step 5: Configure Persistence and History
Enable state persistence so the scheduler can recover its task graph after a restart. Optionally enable database-backed task history (DbTaskHistory with SQLAlchemy) for long-term execution tracking, auditing, and analytics. Configure log rotation and retention policies for the scheduler logs.
Key considerations:
- state_path enables pickle-based state persistence to disk
- DbTaskHistory stores events in a relational database (SQLite, PostgreSQL, etc.)
- Task history records scheduled, started, completed, and failed events with timestamps
- Scheduler state is periodically checkpointed; recovery loads the last checkpoint
- Log files should be rotated to prevent disk space exhaustion
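Wiring up DbTaskHistory takes two pieces of configuration: the record flag in [scheduler] and a SQLAlchemy connection string in [task_history]. A sketch; the SQLite path is a placeholder, and a PostgreSQL URL works the same way:

```ini
# luigi.cfg -- enable database-backed task history (illustrative)
[scheduler]
record_task_history = true
state_path = /var/lib/luigi-server/state.pickle

[task_history]
# Any SQLAlchemy connection string, e.g. postgresql://user:pass@dbhost/luigi
db_connection = sqlite:////var/lib/luigi-server/task-history.db
```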