Implementation:Apache Airflow Metrics Template

From Leeroopedia


Knowledge Sources
Domains: Observability, Metrics
Last Updated: 2026-02-08 21:00 GMT

Overview

A comprehensive YAML catalog of all Apache Airflow metric definitions, serving as the single source of truth for approximately 80 metrics spanning counters, gauges, and timers used throughout the platform's observability infrastructure.

Description

The metrics_template.yaml file defines the complete set of metrics that Airflow can emit. Each metric entry is a structured YAML object containing:

  • name -- The canonical metric name, optionally containing {variable} placeholders for dynamic tags.
  • description -- Human-readable explanation of what the metric tracks.
  • type -- One of counter, gauge, or timer.
  • legacy_name -- The older StatsD-style metric name for backward compatibility ("-" if no legacy equivalent exists).
  • name_variables -- A list of tag/label names that can be attached to the metric for dimensional querying.
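As a minimal sketch of the entry shape described above, one metric entry could be modelled in Python as a small dataclass. The class name `MetricDefinition`, its `from_dict` helper, and the `has_legacy_name` property are hypothetical illustrations and are not taken from the Airflow code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical in-memory model of one entry in metrics_template.yaml."""

    name: str
    description: str
    type: str  # expected to be "counter", "gauge", or "timer"
    legacy_name: str = "-"  # "-" means there is no legacy equivalent
    name_variables: tuple[str, ...] = ()

    @classmethod
    def from_dict(cls, entry: dict) -> "MetricDefinition":
        # The YAML list becomes a tuple so the frozen instance stays hashable.
        return cls(
            name=entry["name"],
            description=entry["description"],
            type=entry["type"],
            legacy_name=entry["legacy_name"],
            name_variables=tuple(entry["name_variables"]),
        )

    @property
    def has_legacy_name(self) -> bool:
        return self.legacy_name != "-"


# One entry written as the dict a YAML parser would produce for it.
entry = {
    "name": "pool.open_slots",
    "description": "Number of open slots in the pool.",
    "type": "gauge",
    "legacy_name": "pool.open_slots.{pool_name}",
    "name_variables": ["pool_name"],
}
metric = MetricDefinition.from_dict(entry)
```

Here `metric.has_legacy_name` is true because the entry carries a StatsD-style name rather than the "-" marker.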

The catalog contains approximately:

| Type | Count | Purpose |
|---|---|---|
| counter | ~37 | Monotonically increasing values (e.g., task starts, failures, heartbeats) |
| gauge | ~28 | Point-in-time readings (e.g., pool slots, running tasks, queue sizes) |
| timer | ~15 | Duration measurements in milliseconds (e.g., task duration, scheduling delay) |
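The per-type counts can be recomputed from the parsed catalog. The sketch below assumes `catalog` has the shape a YAML parser would return for the file (a dict with a `metrics` list); the four abbreviated entries are stand-ins for the real catalog, and `count_by_type` is a hypothetical helper.

```python
from collections import Counter

# Abbreviated stand-in for the parsed catalog; real entries carry more fields.
catalog = {
    "metrics": [
        {"name": "ti_failures", "type": "counter"},
        {"name": "scheduler_heartbeat", "type": "counter"},
        {"name": "dagbag_size", "type": "gauge"},
        {"name": "task.duration", "type": "timer"},
    ]
}


def count_by_type(catalog: dict) -> Counter:
    """Tally how many metric entries exist for each metric type."""
    return Counter(entry["type"] for entry in catalog["metrics"])


counts = count_by_type(catalog)
```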

Usage

This YAML file is consumed at startup to initialize the metric registry. It is not imported directly in Python code but is loaded by the observability subsystem to register all known metrics and their metadata. Downstream components such as SafeOtelLogger and StatsD-based loggers use the metric names defined here when emitting telemetry data.
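The source does not show how the observability subsystem builds its registry. As one possible sketch, a registry could be a dict keyed by canonical metric name, built from the parsed file. The function name `build_registry` and the duplicate-name check are assumptions for illustration only.

```python
def build_registry(catalog: dict) -> dict[str, dict]:
    """Index parsed metric entries by canonical name (hypothetical helper)."""
    registry: dict[str, dict] = {}
    for entry in catalog["metrics"]:
        name = entry["name"]
        if name in registry:
            # A catalog meant to be a single source of truth should not
            # define the same metric twice.
            raise ValueError(f"duplicate metric name: {name}")
        registry[name] = entry
    return registry


# Stand-in for the parsed file, using two entries from the template.
catalog = {
    "metrics": [
        {"name": "dagbag_size", "type": "gauge", "name_variables": []},
        {"name": "pool.open_slots", "type": "gauge", "name_variables": ["pool_name"]},
    ]
}
registry = build_registry(catalog)
```

A lookup such as `registry["pool.open_slots"]` then returns the metadata for that metric.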

Code Reference

Source Location

  • Repository: Apache_Airflow
  • File: shared/observability/src/airflow_shared/observability/metrics/metrics_template.yaml

Structure

The file is a flat list of metric entries under a top-level metrics key. An abbreviated excerpt with two entries per type:

---
metrics:
  # ==========
  # Counters
  # ==========
  - name: "{job_name}_start"
    description: "Number of started ``{job_name}`` job, ex. ``SchedulerJob``, ``LocalTaskJob``"
    type: "counter"
    legacy_name: "-"
    name_variables: ["job_name"]

  - name: "operator_failures"
    description: "Operator ``{operator_name}`` failures."
    type: "counter"
    legacy_name: "operator_failures_{operator_name}"
    name_variables: ["operator_name"]

  # ==========
  # Gauges
  # ==========
  - name: "dagbag_size"
    description: "Number of Dags found when the scheduler ran a scan based on its configuration"
    type: "gauge"
    legacy_name: "-"
    name_variables: []

  - name: "pool.open_slots"
    description: "Number of open slots in the pool."
    type: "gauge"
    legacy_name: "pool.open_slots.{pool_name}"
    name_variables: ["pool_name"]

  # ==========
  # Timers
  # ==========
  - name: "task.duration"
    description: "Milliseconds taken to run a task"
    type: "timer"
    legacy_name: "dag.{dag_id}.{task_id}.duration"
    name_variables: ["dag_id", "task_id"]

  - name: "dagrun.schedule_delay"
    description: "Milliseconds of delay between the scheduled DagRun
    start date and the actual DagRun start date"
    type: "timer"
    legacy_name: "dagrun.schedule_delay.{dag_id}"
    name_variables: ["dag_id"]
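The `{variable}` placeholders in `legacy_name` happen to match Python's `str.format` syntax, so a legacy name can be rendered from a set of tag values. This is a sketch under that assumption; `render_legacy_name` is a hypothetical helper, and the source does not state how Airflow itself expands these names.

```python
from typing import Optional


def render_legacy_name(legacy_name: str, tags: dict[str, str]) -> Optional[str]:
    """Fill the placeholders of a legacy name, or return None if there is none."""
    if legacy_name == "-":
        return None  # "-" marks a metric without a legacy equivalent
    # str.format raises KeyError if a placeholder has no matching tag.
    return legacy_name.format(**tags)


rendered = render_legacy_name(
    "dag.{dag_id}.{task_id}.duration",
    {"dag_id": "my_dag", "task_id": "my_task"},
)
no_legacy = render_legacy_name("-", {})
```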

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| metrics | list[dict] | Yes | A YAML list of metric definition objects |
| metrics[].name | string | Yes | Canonical metric name; may contain {variable} placeholders |
| metrics[].description | string | Yes | Human-readable description of the metric |
| metrics[].type | string | Yes | One of: counter, gauge, timer |
| metrics[].legacy_name | string | Yes | StatsD-era metric name or "-" if none exists |
| metrics[].name_variables | list[string] | Yes | Tag/label variable names for dimensional metrics |
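The input contract above can be checked mechanically. The following is a minimal sketch of such a check, not the validation Airflow performs; `REQUIRED_FIELDS`, `ALLOWED_TYPES`, and `validate_entry` are hypothetical names.

```python
# Required fields and their expected Python types after YAML parsing.
REQUIRED_FIELDS = {
    "name": str,
    "description": str,
    "type": str,
    "legacy_name": str,
    "name_variables": list,
}
ALLOWED_TYPES = {"counter", "gauge", "timer"}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one metric entry (empty if valid)."""
    problems = []
    for field_name, expected in REQUIRED_FIELDS.items():
        if field_name not in entry:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(entry[field_name], expected):
            problems.append(f"{field_name} should be {expected.__name__}")
    if isinstance(entry.get("type"), str) and entry["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type: {entry['type']}")
    return problems


good = {
    "name": "dagbag_size",
    "description": "Number of Dags found when the scheduler ran a scan",
    "type": "gauge",
    "legacy_name": "-",
    "name_variables": [],
}
# Deliberately broken entry: no legacy_name and an unsupported type.
bad = {"name": "x", "description": "d", "type": "histogram", "name_variables": []}

good_problems = validate_entry(good)
bad_problems = validate_entry(bad)
```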

Outputs

| Name | Type | Description |
|---|---|---|
| Metric Registry | Internal data structure | Initialized set of metric definitions used by SafeOtelLogger and StatsD loggers |
| Validation metadata | Per-metric metadata | Name, type, and tag information used to validate metric emissions at runtime |
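How emissions are validated at runtime is not described in the source. One plausible check, sketched here as an assumption, compares the tags supplied with an emission against the metric's `name_variables`. The helper name `check_tags` is hypothetical.

```python
def check_tags(name_variables: list[str], tags: dict[str, str]) -> tuple[set[str], set[str]]:
    """Return (missing, unexpected) tag names for one metric emission."""
    expected = set(name_variables)
    provided = set(tags)
    return expected - provided, provided - expected


# task.duration declares dag_id and task_id; this emission omits task_id
# and supplies a queue tag that the template does not list.
missing, unexpected = check_tags(
    ["dag_id", "task_id"],
    {"dag_id": "my_dag", "queue": "default"},
)
```

Whether a missing or extra tag should be rejected, logged, or ignored is a policy decision that this sketch leaves open.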

Metric Categories

Counters (Selected)

| Metric Name | Tags | Description |
|---|---|---|
| {job_name}_start | job_name | Number of started jobs |
| operator_failures | operator_name | Operator failures |
| ti_failures | -- | Overall task instance failures |
| scheduler_heartbeat | -- | Scheduler heartbeats |
| dag_processing.processes | -- | Relative number of running DAG parsing processes (UpDownCounter) |
| triggers.succeeded | -- | Number of triggers that fired at least one event |
| asset.updates | -- | Number of updated assets |

Gauges (Selected)

| Metric Name | Tags | Description |
|---|---|---|
| dagbag_size | -- | Number of DAGs found during scheduler scan |
| scheduler.tasks.starving | -- | Tasks that cannot be scheduled due to no open pool slots |
| pool.open_slots | pool_name | Open slots in a pool |
| triggers.running | hostname | Running triggers per triggerer |
| ti.running | queue, dag_id, task_id | Running task instances |

Timers (Selected)

| Metric Name | Tags | Description |
|---|---|---|
| task.duration | dag_id, task_id | Milliseconds to run a task |
| dagrun.duration.success | dag_id | Milliseconds for DagRun to reach success |
| dagrun.schedule_delay | dag_id | Scheduling delay in milliseconds |
| scheduler.scheduler_loop_duration | -- | Milliseconds per scheduler loop |

Usage Examples

Referencing a Metric Name in OTel Logger

# Metric names from the YAML template are used when emitting metrics.
from airflow_shared.observability.metrics.otel_logger import SafeOtelLogger

# `logger` is assumed to be an already-configured SafeOtelLogger instance;
# how it is constructed is not shown here.

# Increment a counter defined in the template
logger.incr("operator_failures", tags={"operator_name": "BashOperator"})

# Set a gauge defined in the template
logger.gauge("pool.open_slots", value=5, tags={"pool_name": "default_pool"})

# Record a timer defined in the template (duration in milliseconds)
logger.timing("task.duration", dt=1234.5, tags={"dag_id": "my_dag", "task_id": "my_task"})

Loading the YAML Template

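from pathlib import Path

import yaml

# Assumes this script sits in the same directory as metrics_template.yaml.
template_path = Path(__file__).parent / "metrics_template.yaml"
with open(template_path, encoding="utf-8") as f:
    metrics_catalog = yaml.safe_load(f)

# Print each metric's type (right-aligned to 8 characters) next to its name.
for metric in metrics_catalog["metrics"]:
    print(f"{metric['type']:>8s} | {metric['name']}")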
