Principle:Spotify Luigi Metrics Collection

Knowledge Sources	Spotify_Luigi Luigi Docs
Domains	Monitoring, Observability
Last Updated	2026-02-10 08:00 GMT

Overview

Collecting and exporting task execution metrics to monitoring systems for operational visibility and performance tracking.

Description

Metrics collection is the practice of instrumenting a pipeline orchestrator to emit quantitative measurements about task execution to external monitoring and alerting systems. As pipelines grow in complexity and criticality, operators need real-time insight into how tasks are performing: how long they take, how often they fail, how many are queued, and what resources they consume. Metrics collection bridges the gap between the pipeline's internal state and external observability platforms (such as Datadog, Prometheus, or Graphite, the last often fed through a StatsD aggregation daemon) that provide dashboards, alerting, and historical trend analysis. By exporting structured metrics, the pipeline becomes a first-class citizen in the organization's monitoring infrastructure.

Usage

Use metrics collection when pipelines run in production and require operational monitoring, when SLAs must be tracked and alerted on, when performance trends need to be analyzed over time, or when pipeline health must be visible alongside other infrastructure metrics in a unified monitoring dashboard.

Theoretical Basis

Metrics collection follows the observer pattern applied to pipeline execution events. The theoretical framework encompasses metric types, collection points, dimensional labeling, export mechanisms, and aggregation:

1. Metric Types -- Pipeline metrics fall into standard categories:
   * Counters -- Monotonically increasing values tracking cumulative occurrences (tasks started, tasks completed, tasks failed)
   * Gauges -- Point-in-time values representing current state (tasks currently running, queue depth)
   * Timers/Histograms -- Duration measurements capturing the distribution of execution times (task duration, scheduling latency)
2. Collection Points -- Metrics are captured at key lifecycle events:
   * Task scheduled (counter increment, gauge update)
   * Task started (timer start, gauge update)
   * Task completed successfully (counter increment, timer stop, gauge update)
   * Task failed (failure counter increment, gauge update)
   * Task disabled after repeated failures (counter increment)
3. Dimensional Labeling -- Each metric is annotated with dimensions (also called tags or labels) that enable filtering and grouping:
   * Task family (the type of task)
   * Task status (success, failure, pending)
   * Worker identifier
   * Host name
4. Export Mechanism -- Metrics are transmitted to the monitoring backend through one of two models:
   * Push model -- The pipeline sends metrics to a collection endpoint, either as events occur or in periodic batches (StatsD, Datadog agent)
   * Pull model -- The pipeline exposes a metrics endpoint that the monitoring system scrapes at regular intervals (Prometheus)
5. Aggregation -- The monitoring backend aggregates individual metric data points over configurable time windows, enabling queries such as "average task duration over the last hour" or "95th percentile execution time over the last day."
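
The metric types, collection points, and dimensional labels above can be sketched as a small in-process observer. This is a minimal illustration under stated assumptions, not Luigi's own metrics API: the classes `MetricsRegistry` and `TaskMetricsObserver`, the `on_*` hook names, the metric names (such as `tasks_finished` and `task_duration_seconds`), and the injectable `clock` are all hypothetical. Completions and failures share one `tasks_finished` counter distinguished by a `status` label, which is one way to apply the status dimension; following the collection points listed above, only successful runs stop the timer.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# A series is a metric name plus its sorted (label, value) pairs, so label
# order never creates two separate series for the same dimensions.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(name: str, labels: Dict[str, str]) -> SeriesKey:
    return (name, tuple(sorted(labels.items())))


class MetricsRegistry:
    """In-memory store for the three metric types (hypothetical, not a Luigi API)."""

    def __init__(self) -> None:
        self.counters: Dict[SeriesKey, int] = defaultdict(int)
        self.gauges: Dict[SeriesKey, float] = defaultdict(float)
        self.timings: Dict[SeriesKey, List[float]] = defaultdict(list)

    def increment(self, name: str, labels: Dict[str, str]) -> None:
        # Counter: only ever increases.
        self.counters[series_key(name, labels)] += 1

    def adjust_gauge(self, name: str, labels: Dict[str, str], delta: float) -> None:
        # Gauge: moves up or down to reflect current state.
        self.gauges[series_key(name, labels)] += delta

    def observe(self, name: str, labels: Dict[str, str], seconds: float) -> None:
        # Timer: keeps each duration sample so a distribution can be computed.
        self.timings[series_key(name, labels)].append(seconds)


class TaskMetricsObserver:
    """Receives task lifecycle events (hypothetical hooks) and records metrics."""

    def __init__(self, registry: MetricsRegistry, worker: str, host: str,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.registry = registry
        self.base_labels = {"worker": worker, "host": host}
        self.clock = clock
        self._started_at: Dict[str, float] = {}

    def _labels(self, family: str, **extra: str) -> Dict[str, str]:
        # Every metric carries worker, host and task family dimensions.
        return {**self.base_labels, "family": family, **extra}

    def on_scheduled(self, task_id: str, family: str) -> None:
        self.registry.increment("tasks_scheduled", self._labels(family))
        self.registry.adjust_gauge("tasks_pending", self._labels(family), +1)

    def on_started(self, task_id: str, family: str) -> None:
        self._started_at[task_id] = self.clock()  # timer start
        self.registry.increment("tasks_started", self._labels(family))
        self.registry.adjust_gauge("tasks_pending", self._labels(family), -1)
        self.registry.adjust_gauge("tasks_running", self._labels(family), +1)

    def on_done(self, task_id: str, family: str) -> None:
        self.registry.increment("tasks_finished", self._labels(family, status="success"))
        self.registry.adjust_gauge("tasks_running", self._labels(family), -1)
        started = self._started_at.pop(task_id, None)
        if started is not None:  # timer stop
            self.registry.observe("task_duration_seconds", self._labels(family),
                                  self.clock() - started)

    def on_failed(self, task_id: str, family: str) -> None:
        self.registry.increment("tasks_finished", self._labels(family, status="failure"))
        self.registry.adjust_gauge("tasks_running", self._labels(family), -1)
        self._started_at.pop(task_id, None)  # discard start time; no duration recorded

    def on_disabled(self, task_id: str, family: str) -> None:
        self.registry.increment("tasks_disabled", self._labels(family))


# Usage with a fake clock so the recorded duration is deterministic.
ticks = iter([100.0, 142.5])
registry = MetricsRegistry()
observer = TaskMetricsObserver(registry, worker="worker-1", host="etl-01",
                               clock=lambda: next(ticks))
observer.on_scheduled("Ingest_2026_02_10", "Ingest")
observer.on_started("Ingest_2026_02_10", "Ingest")
observer.on_done("Ingest_2026_02_10", "Ingest")
```

After these three calls the pending and running gauges are back at zero, the success series of `tasks_finished` is 1, and one 42.5-second sample sits in `task_duration_seconds`. Keeping the observer free of any export logic means the same hooks could feed either a push or a pull exporter.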

The fundamental principle is separation of concerns: the pipeline emits raw metric events, and the monitoring system handles storage, aggregation, visualization, and alerting.
