Principle:Spotify Luigi Web Visualiser Monitoring
Overview
Web Visualiser Monitoring is the practice of providing real-time visual monitoring of pipeline execution state through a browser-based dashboard backed by a REST API that exposes task status, dependency graphs, and execution history.
Description
When orchestrating complex data pipelines with hundreds or thousands of tasks, operators need immediate visibility into what is running, what has failed, and what is blocked. Terminal-based log output is insufficient for understanding the state of a large DAG at a glance.
A Web Visualiser addresses this by providing:
- Task status dashboard: A tabular view of all tasks organized by status (PENDING, RUNNING, DONE, FAILED, DISABLED, BATCH_RUNNING), with counts for each category and the ability to filter, search, and sort.
- Dependency graph visualization: An interactive directed acyclic graph (DAG) rendering that shows tasks as color-coded nodes connected by dependency edges. This makes it possible to trace the causal chain from a failed task back through its upstream dependencies.
- Task detail inspection: Clicking on a task reveals its parameters, execution time, error traces (for failed tasks), tracking URLs, status messages, and progress percentages.
- Operational actions: The dashboard can expose actions such as re-enabling disabled tasks, marking tasks as done, and forgiving failures, allowing operators to intervene without command-line access.
- REST API for integration: All data shown in the dashboard is sourced from a JSON-over-HTTP API, enabling external monitoring tools (Nagios, Grafana, PagerDuty) to query pipeline state programmatically.
- Task execution history: Historical views show how task running times have evolved across runs, enabling performance regression detection and capacity planning.
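The causal-chain tracing mentioned under dependency graph visualization can be sketched as an upstream graph walk. This minimal example assumes a dependency graph shaped as a dictionary mapping each task id to a record with a "status" string and a "deps" list of upstream task ids. That is a simplified, hypothetical stand-in for what a dependency-graph endpoint might return. root_causes and TaskRecord are illustrative names, not part of Luigi.

```python
from collections import deque
from typing import Dict, List, Set, TypedDict


class TaskRecord(TypedDict):
    """Assumed, simplified shape of one node in a dependency-graph response."""
    status: str
    deps: List[str]


def _is_failed(graph: Dict[str, TaskRecord], task_id: str) -> bool:
    record = graph.get(task_id)
    return record is not None and record["status"] == "FAILED"


def root_causes(graph: Dict[str, TaskRecord], start: str) -> List[str]:
    """Return FAILED tasks at or upstream of `start` that have no FAILED deps.

    These are the earliest failures on the causal chain; downstream tasks
    stay blocked until they are fixed.
    """
    seen: Set[str] = set()
    queue = deque([start])
    causes: List[str] = []
    while queue:
        task_id = queue.popleft()
        if task_id in seen or task_id not in graph:
            continue  # skip revisits and ids missing from the subgraph
        seen.add(task_id)
        deps = graph[task_id]["deps"]
        if _is_failed(graph, task_id) and not any(_is_failed(graph, d) for d in deps):
            causes.append(task_id)
        queue.extend(deps)  # keep walking upstream through every dependency
    return sorted(causes)
```

For a graph where Report waits on Aggregate, which in turn waits on a failed Extract, root_causes(graph, "Report") returns ["Extract"]: the one task an operator actually needs to fix. The seen set keeps the walk safe even if the input unexpectedly contains a cycle.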
This pattern follows the observability pillar of production systems: the ability to understand the internal state of a system from its external outputs. By making pipeline state visible and actionable, operators can detect and resolve issues before they cascade through the dependency graph.
Usage
Use web-based pipeline monitoring when:
- Pipelines involve complex dependency chains where failure propagation must be understood visually.
- Multiple operators need simultaneous visibility into pipeline state without SSH access to scheduler machines.
- Operational teams need to perform corrective actions (re-enable tasks, mark as done) through a web interface.
- External monitoring systems need to query pipeline health via API.
- Historical execution trends need to be tracked for capacity planning and SLA monitoring.
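For the external-monitoring case, a health check can reduce a task-list snapshot to per-status counts and map them onto the exit codes that Nagios-compatible check plugins use (0 OK, 1 WARNING, 2 CRITICAL). This sketch assumes the response has already been fetched and decoded into a dictionary of task id to a record with a "status" field. The thresholds and the function names status_counts and check_pipeline are hypothetical.

```python
from collections import Counter
from typing import Dict, Tuple

# Exit codes conventionally used by Nagios-compatible check plugins.
OK, WARNING, CRITICAL = 0, 1, 2


def status_counts(tasks: Dict[str, dict]) -> Counter:
    """Count tasks per status, like the summary row of a dashboard."""
    return Counter(record["status"] for record in tasks.values())


def check_pipeline(tasks: Dict[str, dict],
                   warn_failed: int = 1,
                   crit_failed: int = 5) -> Tuple[int, str]:
    """Return (exit_code, message) based on how many tasks are FAILED."""
    counts = status_counts(tasks)
    failed = counts["FAILED"]  # Counter returns 0 for missing keys
    summary = ", ".join(f"{status}={n}" for status, n in sorted(counts.items()))
    if failed >= crit_failed:
        return CRITICAL, f"CRITICAL: {summary}"
    if failed >= warn_failed:
        return WARNING, f"WARNING: {summary}"
    return OK, f"OK: {summary}"
```

A cron job or monitoring agent could call check_pipeline on each run, print the message, and exit with the returned code; the thresholds are placeholders to tune per pipeline.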
Theoretical Basis
A Web Visualiser Monitoring system operates on three layers:
- Data layer (REST API): The scheduler exposes its internal state through a set of JSON endpoints, each of which maps to a scheduler method: task_list returns tasks filtered by status, dep_graph returns the dependency subgraph rooted at a given task, fetch_error returns the error trace for a failed task, and so on. This layer decouples the data from the presentation, enabling both human-readable dashboards and machine-readable integrations.
- Presentation layer (Web UI): A single-page JavaScript application fetches data from the REST API and renders it using:
  - Tabular views: DataTables-based filterable, sortable tables for task lists with real-time status counts.
  - Graph views: SVG-based or D3-rendered DAG visualizations where node color encodes task status (red for FAILED, green for DONE, blue for RUNNING, yellow for PENDING, gray for DISABLED).
  - Detail panels: Modal or inline panels showing task parameters, error traces, timing, and progress.
- Interaction layer (Actions): The UI can POST operational commands back to the scheduler API, such as re_enable_task, mark_as_done, and forgive_failures. These mutations immediately update the scheduler's in-memory state and are reflected in the next UI poll cycle.
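The data and interaction layers can share one request convention: a scheduler method name plus JSON-encoded arguments. The sketch below builds such requests with the Python standard library only. It assumes a URL of the form /api/<method>, arguments JSON-encoded in a form field named data, and results wrapped under a "response" key, which is how Luigi's central scheduler is commonly queried; treat these shapes as assumptions to verify against your scheduler version. BASE_URL (8082 is luigid's default port), build_request, unwrap and the task id values are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumed scheduler address; 8082 is luigid's default port.
BASE_URL = "http://localhost:8082"

# Mutating actions named in the text; any other method is treated as a read.
ACTIONS = {"re_enable_task", "mark_as_done", "forgive_failures"}


def build_request(method: str, **arguments) -> Request:
    """Build a GET for reads or a POST for actions against /api/<method>."""
    payload = urlencode({"data": json.dumps(arguments)})
    url = f"{BASE_URL}/api/{method}"
    if method in ACTIONS:
        # Actions send the form-encoded payload in the request body.
        return Request(url, data=payload.encode("utf-8"), method="POST")
    # Reads carry the same payload in the query string.
    return Request(f"{url}?{payload}", method="GET")


def unwrap(body: bytes):
    """Decode a JSON reply and return the value stored under "response"."""
    return json.loads(body)["response"]
```

Passing a request to urllib.request.urlopen and feeding the reply body to unwrap would complete the round trip. Keeping reads on GET and actions on POST mirrors the split between the presentation and interaction layers.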
The polling model (periodic AJAX requests rather than WebSocket push) is simpler to implement and sufficient for pipeline monitoring where sub-second latency is not required. Typical poll intervals of 1-10 seconds provide adequate real-time feedback for human operators.
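A polling client in this style needs little more than a loop, a sleep, and change detection so the UI only re-renders when something has moved. The sketch below injects the fetch function so it can run without a live scheduler. The names poll, diff, fetch_snapshot and on_change are hypothetical, and the 5-second default is simply a value inside the 1-10 second range mentioned above.

```python
import time
from typing import Callable, Dict, Optional

Snapshot = Dict[str, str]  # task id -> status


def diff(old: Snapshot, new: Snapshot) -> Dict[str, str]:
    """Return tasks whose status is new or changed since the last poll."""
    return {tid: status for tid, status in new.items() if old.get(tid) != status}


def poll(fetch_snapshot: Callable[[], Snapshot],
         on_change: Callable[[Dict[str, str]], None],
         interval: float = 5.0,
         max_polls: Optional[int] = None) -> None:
    """Fetch state every `interval` seconds and report status changes."""
    previous: Snapshot = {}
    polls = 0
    while max_polls is None or polls < max_polls:
        current = fetch_snapshot()
        changes = diff(previous, current)
        if changes:
            on_change(changes)  # e.g. re-render only the affected rows
        previous = current
        polls += 1
        if max_polls is None or polls < max_polls:
            time.sleep(interval)  # no sleep after the final poll
```

Setting max_polls makes the loop finite for tests; a long-running dashboard would leave it unset.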