Overview
metrics_template.yaml is the YAML catalog of Apache Airflow's metric definitions. It is the single source of truth for approximately 80 metrics (counters, gauges, and timers) used throughout the platform's observability infrastructure.
Description
The metrics_template.yaml file defines the complete set of metrics that Airflow can emit. Each metric entry is a structured YAML object containing:
- name -- The canonical metric name, optionally containing {variable} placeholders for dynamic tags.
- description -- Human-readable explanation of what the metric tracks.
- type -- One of counter, gauge, or timer.
- legacy_name -- The older StatsD-style metric name, kept for backward compatibility ("-" if no legacy equivalent exists).
- name_variables -- A list of tag/label names that can be attached to the metric for dimensional querying.
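The five fields above are enough to sanity-check an entry before it is used. The following is a minimal sketch, not Airflow's own loader; `validate_entry`, `REQUIRED_FIELDS`, and `ALLOWED_TYPES` are hypothetical names chosen for illustration.

```python
from typing import Any

# Field names and allowed types as described in the list above.
REQUIRED_FIELDS = ("name", "description", "type", "legacy_name", "name_variables")
ALLOWED_TYPES = {"counter", "gauge", "timer"}


def validate_entry(entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one metric entry (empty if valid)."""
    problems: list[str] = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            problems.append(f"missing field: {field}")
    if entry.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {entry.get('type')!r}")
    variables = entry.get("name_variables", [])
    if not isinstance(variables, list) or not all(isinstance(v, str) for v in variables):
        problems.append("name_variables must be a list of strings")
    return problems


entry = {
    "name": "pool.open_slots",
    "description": "Number of open slots in the pool.",
    "type": "gauge",
    "legacy_name": "pool.open_slots.{pool_name}",
    "name_variables": ["pool_name"],
}
print(validate_entry(entry))  # → []
```

Collecting problems into a list, rather than raising on the first one, lets a check over the whole file report every bad entry in a single pass.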
Approximate breakdown of the catalog by type:

| Type | Approx. count | Purpose |
|---|---|---|
| counter | ~37 | Counts that usually only increase (e.g., task starts, failures, heartbeats); a few, such as dag_processing.processes, are up-down counters |
| gauge | ~28 | Point-in-time readings (e.g., pool slots, running tasks, queue sizes) |
| timer | ~15 | Duration measurements in milliseconds (e.g., task duration, scheduling delay) |
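The counts in this table can be checked by tallying the type field of every entry. This sketch runs on a small hypothetical subset written inline as a Python dict so it needs no YAML parser; run against the full parsed file, the same tally should give the approximate figures above.

```python
from collections import Counter

# A tiny hypothetical subset standing in for the parsed template.
catalog = {
    "metrics": [
        {"name": "{job_name}_start", "type": "counter"},
        {"name": "operator_failures", "type": "counter"},
        {"name": "dagbag_size", "type": "gauge"},
        {"name": "pool.open_slots", "type": "gauge"},
        {"name": "task.duration", "type": "timer"},
        {"name": "dagrun.schedule_delay", "type": "timer"},
    ]
}

# Tally entries per metric type.
counts = Counter(metric["type"] for metric in catalog["metrics"])
for metric_type in ("counter", "gauge", "timer"):
    print(f"{metric_type:>8s}: {counts[metric_type]}")
```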
Usage
This YAML file is consumed at startup to initialize the metric registry. It is not imported directly in Python code but is loaded by the observability subsystem to register all known metrics and their metadata. Downstream components such as SafeOtelLogger and StatsD-based loggers use the metric names defined here when emitting telemetry data.
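This page does not document the registry's internal format. As a rough sketch only, assuming the registry is a mapping keyed by canonical name and that names are unique, the parsed YAML could be turned into typed records like this. `MetricDefinition` and `build_registry` are hypothetical names, not Airflow APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical in-memory form of one template entry."""

    name: str
    description: str
    type: str
    legacy_name: Optional[str]  # None when the template uses "-"
    name_variables: tuple[str, ...]


def build_registry(catalog: dict) -> dict[str, MetricDefinition]:
    """Index template entries by canonical name, rejecting duplicates."""
    registry: dict[str, MetricDefinition] = {}
    for entry in catalog["metrics"]:
        if entry["name"] in registry:
            raise ValueError(f"duplicate metric name: {entry['name']}")
        legacy = entry["legacy_name"]
        registry[entry["name"]] = MetricDefinition(
            name=entry["name"],
            description=entry["description"],
            type=entry["type"],
            legacy_name=None if legacy == "-" else legacy,
            name_variables=tuple(entry["name_variables"]),
        )
    return registry


catalog = {
    "metrics": [
        {
            "name": "dagbag_size",
            "description": "Number of Dags found when the scheduler ran a scan based on its configuration",
            "type": "gauge",
            "legacy_name": "-",
            "name_variables": [],
        },
        {
            "name": "task.duration",
            "description": "Milliseconds taken to run a task",
            "type": "timer",
            "legacy_name": "dag.{dag_id}.{task_id}.duration",
            "name_variables": ["dag_id", "task_id"],
        },
    ]
}

registry = build_registry(catalog)
print(registry["task.duration"].legacy_name)  # → dag.{dag_id}.{task_id}.duration
```

Mapping "-" to None at load time means emitters can test `legacy_name is None` instead of comparing against a sentinel string.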
Code Reference
Source Location
- Repository: Apache_Airflow
- File: shared/observability/src/airflow_shared/observability/metrics/metrics_template.yaml
Structure
The file is a flat list of entries under a top-level metrics key. The excerpt below shows two entries of each type:
```yaml
---
metrics:
  # ==========
  # Counters
  # ==========
  - name: "{job_name}_start"
    description: "Number of started ``{job_name}`` job, ex. ``SchedulerJob``, ``LocalTaskJob``"
    type: "counter"
    legacy_name: "-"
    name_variables: ["job_name"]

  - name: "operator_failures"
    description: "Operator ``{operator_name}`` failures."
    type: "counter"
    legacy_name: "operator_failures_{operator_name}"
    name_variables: ["operator_name"]

  # ==========
  # Gauges
  # ==========
  - name: "dagbag_size"
    description: "Number of Dags found when the scheduler ran a scan based on its configuration"
    type: "gauge"
    legacy_name: "-"
    name_variables: []

  - name: "pool.open_slots"
    description: "Number of open slots in the pool."
    type: "gauge"
    legacy_name: "pool.open_slots.{pool_name}"
    name_variables: ["pool_name"]

  # ==========
  # Timers
  # ==========
  - name: "task.duration"
    description: "Milliseconds taken to run a task"
    type: "timer"
    legacy_name: "dag.{dag_id}.{task_id}.duration"
    name_variables: ["dag_id", "task_id"]

  - name: "dagrun.schedule_delay"
    description: "Milliseconds of delay between the scheduled DagRun start date and the actual DagRun start date"
    type: "timer"
    legacy_name: "dagrun.schedule_delay.{dag_id}"
    name_variables: ["dag_id"]
```
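The {variable} placeholders in name and legacy_name can be filled from tag values with ordinary Python string formatting. This is a sketch under the assumption that placeholders behave as plain `str.format` fields; `render` and `legacy_stat_name` are hypothetical helpers, and Airflow's own emitters may build these names differently.

```python
from typing import Optional


def render(template: str, tags: dict[str, str]) -> str:
    """Fill {variable} placeholders from tag values; raises KeyError if one is missing."""
    return template.format(**tags)


def legacy_stat_name(entry: dict, tags: dict[str, str]) -> Optional[str]:
    """Return the expanded StatsD-era name, or None when legacy_name is "-"."""
    if entry["legacy_name"] == "-":
        return None
    return render(entry["legacy_name"], tags)


# Entries taken from the structure excerpt above (only the fields used here).
job_start = {"name": "{job_name}_start", "legacy_name": "-"}
open_slots = {"name": "pool.open_slots", "legacy_name": "pool.open_slots.{pool_name}"}
task_duration = {"name": "task.duration", "legacy_name": "dag.{dag_id}.{task_id}.duration"}

print(render(job_start["name"], {"job_name": "SchedulerJob"}))  # → SchedulerJob_start
print(legacy_stat_name(job_start, {"job_name": "SchedulerJob"}))  # → None
print(legacy_stat_name(open_slots, {"pool_name": "default_pool"}))  # → pool.open_slots.default_pool
print(legacy_stat_name(task_duration, {"dag_id": "my_dag", "task_id": "my_task"}))  # → dag.my_dag.my_task.duration
```

`str.format` ignores unused keyword arguments, so the full tag dict can be passed for any entry; a missing tag raises KeyError instead of producing a half-rendered name.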
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| metrics | list[dict] | Yes | A YAML list of metric definition objects |
| metrics[].name | string | Yes | Canonical metric name; may contain {variable} placeholders |
| metrics[].description | string | Yes | Human-readable description of the metric |
| metrics[].type | string | Yes | One of: counter, gauge, timer |
| metrics[].legacy_name | string | Yes | StatsD-era metric name, or "-" if none exists |
| metrics[].name_variables | list[string] | Yes | Tag/label variable names for dimensional metrics |
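The table does not say whether placeholders must be declared in name_variables, but every entry in the structure excerpt declares each placeholder it uses. Assuming that invariant is intended, it could be checked with the standard library's `string.Formatter.parse`; `placeholders` and `undeclared_placeholders` are hypothetical helpers.

```python
from string import Formatter


def placeholders(template: str) -> set[str]:
    """Collect {field} names from a format string, ignoring literal text."""
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def undeclared_placeholders(entry: dict) -> set[str]:
    """Placeholders used in name or legacy_name but missing from name_variables."""
    used = placeholders(entry["name"]) | placeholders(entry["legacy_name"])
    return used - set(entry["name_variables"])


task_duration = {
    "name": "task.duration",
    "legacy_name": "dag.{dag_id}.{task_id}.duration",
    "name_variables": ["dag_id", "task_id"],
}
# Deliberately broken copy of pool.open_slots with its tag list emptied.
bad_open_slots = {
    "name": "pool.open_slots",
    "legacy_name": "pool.open_slots.{pool_name}",
    "name_variables": [],
}

print(undeclared_placeholders(task_duration))   # → set()
print(undeclared_placeholders(bad_open_slots))  # → {'pool_name'}
```

A legacy_name of "-" contains no fields, so entries without a legacy equivalent need no special case.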
Outputs
| Name | Type | Description |
|---|---|---|
| Metric Registry | Internal data structure | Initialized set of metric definitions used by SafeOtelLogger and StatsD loggers |
| Validation metadata | Per-metric metadata | Name, type, and tag information used to validate metric emissions at runtime |
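This page does not describe how the runtime validation is carried out. The following hypothetical illustration shows how name, type, and tag metadata could gate an emission. `REGISTRY` and `check_emission` are invented for this sketch, and the registry is hard-coded from three entries in the excerpt.

```python
from collections.abc import Mapping
from typing import Optional

# Hypothetical registry: canonical name -> (type, allowed tag names).
REGISTRY = {
    "operator_failures": ("counter", {"operator_name"}),
    "pool.open_slots": ("gauge", {"pool_name"}),
    "task.duration": ("timer", {"dag_id", "task_id"}),
}


def check_emission(name: str, metric_type: str, tags: Optional[Mapping[str, str]] = None) -> None:
    """Raise ValueError if an emission does not match the template metadata."""
    if name not in REGISTRY:
        raise ValueError(f"unknown metric: {name}")
    expected_type, allowed_tags = REGISTRY[name]
    if metric_type != expected_type:
        raise ValueError(f"{name} is a {expected_type}, not a {metric_type}")
    unexpected = set(tags or {}) - allowed_tags
    if unexpected:
        raise ValueError(f"{name} got undeclared tags: {sorted(unexpected)}")


# Passes silently: correct type and only declared tags.
check_emission("task.duration", "timer", {"dag_id": "my_dag", "task_id": "my_task"})
```

Calling `check_emission("pool.open_slots", "counter")` would raise, because the template declares pool.open_slots as a gauge.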
Metric Categories
Counters (Selected)
| Metric Name | Tags | Description |
|---|---|---|
| {job_name}_start | job_name | Number of started jobs |
| operator_failures | operator_name | Operator failures |
| ti_failures | -- | Overall task instance failures |
| scheduler_heartbeat | -- | Scheduler heartbeats |
| dag_processing.processes | -- | Relative number of running DAG parsing processes (UpDownCounter) |
| triggers.succeeded | -- | Number of triggers that fired at least one event |
| asset.updates | -- | Number of updated assets |
Gauges (Selected)
| Metric Name | Tags | Description |
|---|---|---|
| dagbag_size | -- | Number of DAGs found during scheduler scan |
| scheduler.tasks.starving | -- | Tasks that cannot be scheduled due to no open pool slots |
| pool.open_slots | pool_name | Open slots in a pool |
| triggers.running | hostname | Running triggers per triggerer |
| ti.running | queue, dag_id, task_id | Running task instances |
Timers (Selected)
| Metric Name | Tags | Description |
|---|---|---|
| task.duration | dag_id, task_id | Milliseconds to run a task |
| dagrun.duration.success | dag_id | Milliseconds for DagRun to reach success |
| dagrun.schedule_delay | dag_id | Scheduling delay in milliseconds |
| scheduler.scheduler_loop_duration | -- | Milliseconds per scheduler loop |
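Because tags are declared per metric, finding every metric that can be sliced by a given dimension is a simple filter. This sketch hard-codes a subset of rows from the selected-metrics tables above; `SELECTED` and `metrics_with_tag` are hypothetical names.

```python
# Subset of rows from the selected-metrics tables above: name -> tags.
SELECTED = {
    "operator_failures": ["operator_name"],
    "pool.open_slots": ["pool_name"],
    "ti.running": ["queue", "dag_id", "task_id"],
    "task.duration": ["dag_id", "task_id"],
    "dagrun.duration.success": ["dag_id"],
    "dagrun.schedule_delay": ["dag_id"],
    "scheduler.scheduler_loop_duration": [],
}


def metrics_with_tag(tag: str) -> list[str]:
    """Return, sorted, the metric names whose tag list includes `tag`."""
    return sorted(name for name, tags in SELECTED.items() if tag in tags)


print(metrics_with_tag("dag_id"))
# → ['dagrun.duration.success', 'dagrun.schedule_delay', 'task.duration', 'ti.running']
```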
Usage Examples
Referencing a Metric Name in OTel Logger
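The original snippet used logger without defining it. Constructing a SafeOtelLogger depends on the configured OpenTelemetry provider, so this version takes an existing instance as a parameter.

```python
# The metric names from the YAML template are used when emitting metrics.
from airflow_shared.observability.metrics.otel_logger import SafeOtelLogger


def emit_examples(logger: SafeOtelLogger) -> None:
    """Emit one metric of each type; `logger` is an already-configured SafeOtelLogger."""
    # Increment a counter defined in the template
    logger.incr("operator_failures", tags={"operator_name": "BashOperator"})
    # Set a gauge defined in the template
    logger.gauge("pool.open_slots", value=5, tags={"pool_name": "default_pool"})
    # Record a timer defined in the template (duration in milliseconds)
    logger.timing("task.duration", dt=1234.5, tags={"dag_id": "my_dag", "task_id": "my_task"})
```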
Loading the YAML Template
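```python
from pathlib import Path

import yaml  # PyYAML

# Assumes this script sits next to metrics_template.yaml; adjust the path otherwise.
template_path = Path(__file__).parent / "metrics_template.yaml"
with template_path.open(encoding="utf-8") as f:
    metrics_catalog = yaml.safe_load(f)

# Print each metric's type (right-aligned) and canonical name
for metric in metrics_catalog["metrics"]:
    print(f"{metric['type']:>8s} | {metric['name']}")
```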