Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Astronomer Astronomer cosmos Telemetry And Observability

From Leeroopedia


Knowledge Sources
Domains Telemetry, Observability
Last Updated 2026-02-07 17:00 GMT

Overview

Collecting anonymous usage metrics and runtime diagnostics to guide project development decisions and enable operators to debug performance issues in production.

Description

What it is: Telemetry and Observability is a dual-purpose principle combining (a) anonymous, opt-out usage telemetry that reports aggregate adoption metrics to the Cosmos maintainers, and (b) a debug-mode memory tracking facility that records per-task resource consumption and surfaces it through Airflow's XCom mechanism. Together, these subsystems answer two distinct audiences: project maintainers who need to understand how Cosmos is used in the wild, and platform operators who need to diagnose memory pressure, slow tasks, and resource contention in their Airflow deployments.

What problem it solves: Open-source projects suffer from an observability gap: maintainers cannot see how their software is deployed, which execution modes are popular, which database backends dominate, or how often DAG runs succeed versus fail. Without this data, roadmap prioritisation is guesswork. Separately, Airflow operators (the human kind) frequently encounter memory-related failures in dbt workloads -- especially when running dbt in-process or in virtualenvs -- but Airflow provides no built-in mechanism to track peak memory consumption of a task and its child processes. The debug-mode MemoryTracker fills this gap.

Where it fits: Telemetry hooks into Airflow's listener framework at two levels: DAG run listeners fire on DAG success or failure and emit aggregate DAG-level metrics, while task instance listeners fire on individual task success or failure and emit granular task-level metrics (operator class, execution mode, invocation mode, database type, dbt command, duration). The memory tracker is orthogonal to telemetry; it is wired into Cosmos operators as execution and completion callbacks when debug mode is enabled. All telemetry emission is gated by three independent opt-out flags (enable_telemetry, DO_NOT_TRACK, SCARF_NO_ANALYTICS).

Usage

Telemetry (for project maintainers):

  • Telemetry is enabled by default. No action is required from users.
  • Users who wish to opt out can set the Airflow config [cosmos] enable_telemetry = False, or the environment variables DO_NOT_TRACK=1 or SCARF_NO_ANALYTICS=1.
  • Metrics are sent as query-string parameters in an HTTPS GET request to Astronomer's Scarf gateway. No personally identifiable information, project code, SQL, or credentials are transmitted.

Debug mode (for platform operators):

  • Enable via [cosmos] enable_debug_mode = True in airflow.cfg.
  • Optionally tune the sampling interval with [cosmos] debug_memory_poll_interval_seconds (default: 0.5 seconds).
  • After task completion, inspect the XCom key cosmos_debug_max_memory_mb on the task instance to see peak RSS memory (in megabytes) including all child processes.

Theoretical Basis

The principle rests on four mechanisms:

1. Airflow hookimpl listeners. Cosmos registers two listener modules with Airflow's plugin/listener system using the @hookimpl decorator. The dag_run_listener implements on_dag_run_success and on_dag_run_failed, which are called by Airflow after every DAG run completes. The listener first checks whether the DAG contains any Cosmos tasks (by inspecting each task's _task_module for a cosmos. prefix). If so, it collects a telemetry payload including the DAG hash, task counts, Cosmos task counts, execution modes, and any compressed metadata embedded at DAG parse time. The task_instance_listener implements on_task_instance_success and on_task_instance_failed, collecting per-task metrics: operator class name, execution mode (derived from module path), invocation mode, profile strategy, profile mapping class, target database, dbt command, duration, and whether the task is mapped or uses callbacks.

2. Anonymous metric emission. The telemetry module collects standard metrics (Cosmos version, Airflow version, Python version, platform) and merges them with event-specific metrics. The combined payload is URL-encoded and sent as an HTTP GET to a versioned Scarf gateway endpoint (TELEMETRY_URL at version v3). The request uses httpx with a short timeout, and failures are logged at warning level but never raise exceptions -- telemetry must never interfere with DAG execution.

3. Compressed telemetry metadata in DAG params. At DAG construction time, the Cosmos converter can embed additional metadata (such as rendering mode, node counts, and configuration flags) into dag.params["__cosmos_telemetry_metadata__"]. Because DAG params are serialised into the metadata database, the metadata is compressed using zlib level 9 and base64-encoded to minimise storage overhead. The dag_run_listener decompresses this metadata at event time and includes it in the telemetry payload.

4. Sampling-based memory tracking (MemoryTracker). When debug mode is enabled, Cosmos attaches start_memory_tracking as an on_execute_callback and stop_memory_tracking as both on_success_callback and on_failure_callback on each operator. The start callback spawns a daemon thread running a MemoryTracker that repeatedly (at the configured poll interval) reads the RSS of the current process and all its children via psutil, maintaining a running maximum. On task completion, the stop callback terminates the thread, converts the peak RSS from bytes to megabytes, and pushes the value to XCom under the key cosmos_debug_max_memory_mb. Trackers are stored in a global dictionary keyed by {dag_id}.{task_id}.{run_id} to handle concurrent task instances safely.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment