

Principle:Apache Airflow Secret Redaction

From Leeroopedia


Knowledge Sources
Domains Security, Logging
Last Updated 2026-02-08 21:00 GMT

Overview

Pattern-based secret detection and masking that replaces sensitive values with *** in logs, stdout, and data structures to prevent accidental credential exposure.

Description

Airflow implements a comprehensive secret redaction system that automatically detects and masks sensitive values before they appear in logs, the web UI, API responses, or any other output channel. The system operates at multiple levels:

  • Field-name detection: Keys and attribute names that match sensitive patterns (e.g., "password", "secret", "token", "api_key", "authorization") trigger automatic masking of their corresponding values.
  • Value-pattern matching: Known credential patterns (e.g., connection URIs containing passwords, bearer tokens) are detected and masked regardless of the field name.
  • Recursive traversal: The masker recursively walks complex data structures -- dicts, lists, tuples, sets, and custom objects -- to find and redact secrets at any depth.
  • Log integration: The SecretsMasker is installed as a logging filter that processes every log record before it reaches any handler, ensuring no sensitive data escapes through logging.
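The log-integration level can be illustrated with the standard library alone. The following is a minimal sketch, not Airflow's SecretsMasker: the class name RedactingFilter, the demo function, and the logger name are hypothetical, and the filter only masks values it was given explicitly.

```python
import io
import logging


class RedactingFilter(logging.Filter):
    """Hypothetical stand-in for a masking filter; not Airflow's SecretsMasker."""

    def __init__(self, secrets):
        super().__init__()
        # Longest first, so a secret that contains another is replaced whole.
        self._secrets = sorted((s for s in secrets if s), key=len, reverse=True)

    def filter(self, record):
        # Render the message with its arguments, then redact the final text.
        message = record.getMessage()
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg, record.args = message, None
        return True  # rewrite the record, never drop it


def demo():
    stream = io.StringIO()
    logger = logging.getLogger("redaction_demo")  # assumed logger name
    logger.handlers = [logging.StreamHandler(stream)]
    logger.filters = [RedactingFilter(["s3cret"])]
    logger.propagate = False
    logger.setLevel(logging.INFO)
    logger.info("connecting with password %s", "s3cret")
    return stream.getvalue()
```

Because the filter is attached to the logger rather than to one handler, the record is rewritten before any handler formats it, which mirrors the ordering described in the list above.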

Usage

Secret redaction is enabled by default in Airflow. It applies automatically to:

  • All log output from every Airflow component
  • Connection objects displayed in the UI
  • Variable values in the UI (when the variable's key matches a sensitive name pattern)
  • API responses containing connection or variable data
  • Stdout/stderr capture from task execution

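No explicit configuration is needed to enable redaction. Additional sensitive field names can be supplied through the [core] sensitive_var_conn_names setting, and individual secret values can be registered programmatically with mask_secret.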

Theoretical Basis

Core Mechanism -- Regex-Compiled Pattern Matching:

The redaction system compiles a set of sensitive field-name patterns into a single regex. When processing data:

  1. Compile patterns: Field names known to contain secrets (password, secret, token, api_key, private_key, authorization, etc.) are compiled into a case-insensitive regex pattern.
  2. Recursive traversal: The masker walks the data structure depth-first. For each key-value pair, it tests the key against the compiled pattern.
  3. Value replacement: If the key matches a sensitive pattern, the entire value is replaced with the redaction string (default: ***).
  4. Connection URI handling: For connection strings, only the password portion is redacted, preserving the URI structure for debugging.
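Steps 1 to 3 above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not Airflow's code: the name list SENSITIVE_NAMES is taken from the examples on this page and may not match the real default set, and the redact function is hypothetical.

```python
import re

# Assumed field names for illustration; the real default set may differ.
SENSITIVE_NAMES = ["password", "secret", "token", "api_key", "private_key", "authorization"]

# Step 1: compile all names into one case-insensitive pattern.
SENSITIVE_KEY = re.compile("|".join(re.escape(n) for n in SENSITIVE_NAMES), re.IGNORECASE)
REDACTED = "***"


def redact(value, key=None):
    """Return a redacted copy of value; key is the dict key it was stored under."""
    # Step 3: a sensitive key hides the entire value, whatever its type.
    if isinstance(key, str) and SENSITIVE_KEY.search(key):
        return REDACTED
    # Step 2: depth-first walk of containers, passing dict keys down.
    if isinstance(value, dict):
        return {k: redact(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, tuple):
        return tuple(redact(v) for v in value)
    return value


nested = {"config": {"db": {"password": "x", "host": "h"}}, "items": [{"API_KEY": "k"}, "plain"]}
cleaned = redact(nested)
```

The function builds new containers instead of mutating its input, so the original structure stays usable by the caller. A production masker would also need a recursion limit and handling for sets and custom objects, which this sketch omits.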

Detection Strategies:

Strategy | Trigger | Example
Field name match | Key matches sensitive regex pattern | {"password": "s3cret"} becomes {"password": "***"}
Connection URI | Password embedded in the URI's user-info part | postgres://user:s3cret@host becomes postgres://user:***@host
Registered secrets | Values matching known secret values | Any occurrence of a known secret value is replaced
Nested structure | Recursive traversal of dicts/lists/objects | {"config": {"db": {"password": "x"}}} is fully traversed
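The connection URI strategy can be approximated with urllib.parse from the standard library. The helper name redact_uri_password is hypothetical, and this sketch only shows the idea of hiding the password while keeping the rest of the URI readable; it is not how Airflow itself parses connections.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri_password(uri):
    """Replace only the password in a URI's user-info part with ***."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # no password present, leave the URI untouched
    # netloc looks like "user:password@host:port"; split at the last "@".
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    user, _, _ = userinfo.partition(":")
    return urlunsplit(parts._replace(netloc=f"{user}:***@{hostinfo}"))


masked = redact_uri_password("postgres://user:s3cret@host:5432/db")
```

Scheme, user, host, port, and path survive unchanged, so the masked URI still shows which database a task tried to reach.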

Performance Considerations:

  • Patterns are pre-compiled to minimize per-record regex overhead.
  • The masker uses short-circuit evaluation: if no sensitive patterns are registered, the record passes through unmodified.
  • Large data structures are traversed lazily where possible to avoid excessive memory allocation during redaction.
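The first two points can be made concrete with a small sketch. The class ValueMasker and its methods are hypothetical names chosen for illustration: the pattern is recompiled only when a new secret is registered, and redact returns its input untouched when nothing has been registered.

```python
import re


class ValueMasker:
    """Hypothetical helper; illustrates pattern caching and short-circuiting only."""

    def __init__(self):
        self._secrets = set()
        self._pattern = None  # compiled once per change, not once per record

    def add_secret(self, secret):
        if secret and secret not in self._secrets:
            self._secrets.add(secret)
            # Longest first, so a secret containing another is replaced whole.
            ordered = sorted(self._secrets, key=len, reverse=True)
            self._pattern = re.compile("|".join(re.escape(s) for s in ordered))

    def redact(self, text):
        if self._pattern is None:
            return text  # short-circuit: nothing registered, nothing to scan
        return self._pattern.sub("***", text)


masker = ValueMasker()
untouched = masker.redact("hello")
masker.add_secret("abc")
masker.add_secret("abcdef")
masked_text = masker.redact("x abcdef y abc")
```

Moving the compile step into add_secret makes registration slightly more expensive, which is usually a good trade because secrets are registered rarely while records are redacted constantly.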

Security Boundary:

Redaction is a defense-in-depth measure. It does not replace proper secret management (using secrets backends, encrypted metastore). Its purpose is to prevent accidental exposure through logs and UI output, not to serve as a primary security control.

Related Pages

Implemented By
