Principle:Apache Airflow Secret Redaction
| Knowledge Sources | |
|---|---|
| Domains | Security, Logging |
| Last Updated | 2026-02-08 21:00 GMT |
Overview
Pattern-based secret detection and masking that replaces sensitive values with *** in logs, stdout, and data structures to prevent accidental credential exposure.
Description
Airflow implements a secret redaction system that automatically masks sensitive values before they appear in logs, the web UI, API responses, or task output. The system operates at multiple levels:
- Field-name detection: Keys and attribute names that match sensitive patterns (e.g., "password", "secret", "token", "api_key", "authorization") trigger automatic masking of their corresponding values.
- Value matching: Secret values registered with the masker (for example, connection passwords loaded by Airflow) are masked wherever they appear in a string, regardless of the field name.
- Recursive traversal: The masker recursively walks nested data structures (dicts, lists, tuples, and sets) to find and redact secrets, up to a fixed maximum depth.
- Log integration: The SecretsMasker is installed as a logging filter that processes every log record before it reaches any handler, ensuring no sensitive data escapes through logging.
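A minimal sketch of the log-integration idea, assuming a masker that only knows a set of literal secret values: a standard-library logging.Filter rewrites each record's message before the handler emits it. The class name ValueMaskingFilter is hypothetical and this is not Airflow's SecretsMasker; it only illustrates where such a filter sits in the logging pipeline.

```python
import io
import logging
import re


class ValueMaskingFilter(logging.Filter):
    """Hypothetical filter: replaces known secret values in each log record."""

    def __init__(self, secrets: set, redaction: str = "***") -> None:
        super().__init__()
        self.redaction = redaction
        # Longest secrets first, so a secret that is a prefix of another
        # does not leave a partial match behind.
        ordered = sorted((s for s in secrets if s), key=len, reverse=True)
        self.pattern = re.compile("|".join(re.escape(s) for s in ordered)) if ordered else None

    def filter(self, record: logging.LogRecord) -> bool:
        if self.pattern is None:
            return True  # nothing registered: pass the record through unchanged
        # Merge msg and args first so secrets passed as %-style arguments are caught.
        record.msg = self.pattern.sub(self.redaction, record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(ValueMaskingFilter({"s3cret"}))
logger = logging.getLogger("redaction-demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("connecting with password %s", "s3cret")
print(stream.getvalue().strip())  # connecting with password ***
```

Because the filter returns True in every branch, it only ever rewrites records and never suppresses them, so redaction cannot silently hide log lines.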
Usage
Secret redaction is enabled by default in Airflow. It applies automatically to:
- All log output from every Airflow component
- Connection objects displayed in the UI
- Variable values in the UI (when the variable key matches a sensitive field name)
- API responses containing connection or variable data
- Stdout/stderr capture from task execution
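No explicit configuration is needed to enable redaction. Additional sensitive field names can be added through configuration (the [core] sensitive_var_conn_names option), and additional secret values can be registered programmatically with mask_secret.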
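As a sketch of how additional sensitive names might extend a default set, assume the extra names arrive as one comma-separated string, in the style of a configuration option. DEFAULT_NAMES, sensitive_names, and should_hide are hypothetical names, the default list is the abridged one from this page, and the matching rule (case-insensitive substring test) is the sketch's choice.

```python
DEFAULT_NAMES = frozenset({"password", "secret", "token", "api_key", "private_key", "authorization"})


def sensitive_names(extra: str = "") -> frozenset:
    """Combine the defaults with extra comma-separated names (blank entries ignored)."""
    added = {name.strip().lower() for name in extra.split(",") if name.strip()}
    return DEFAULT_NAMES | added


def should_hide(key: str, names: frozenset) -> bool:
    """True if the lowercased key contains any sensitive name as a substring."""
    lowered = key.strip().lower()
    return any(name in lowered for name in names)


names = sensitive_names("client_cert, webhook_url")
print(should_hide("DB_Password", names))  # True
print(should_hide("webhook_url", names))  # True
print(should_hide("hostname", names))     # False
```

A substring test deliberately errs toward over-masking: a key such as password_hint is hidden too, which is the safer failure mode for a redaction filter.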
Theoretical Basis
Core Mechanism -- Regex-Compiled Pattern Matching:
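The redaction system combines a case-insensitive check of field names with a single compiled regex of known secret values. When processing data:
- Sensitive names: A key is treated as sensitive if its lowercased form contains one of the known sensitive names (password, secret, token, api_key, private_key, authorization, etc.), so db_password matches as well as password.
- Compile patterns: Each secret value registered with the masker is regex-escaped and joined into one alternation pattern, which is recompiled whenever a new secret is added.
- Recursive traversal: The masker walks the data structure depth-first. For each key-value pair, it tests the key against the sensitive names.
- Value replacement: If the key is sensitive, the entire value is replaced with the redaction string (default: ***). Other strings are scanned with the compiled pattern, and every occurrence of a registered secret is replaced.
- Connection URI handling: Because a connection's password is registered as a secret value, only the password portion of a connection URI is redacted, preserving the URI structure for debugging.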
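The steps above can be sketched in plain Python as follows. This is a simplified illustration, not Airflow's implementation: the names is_sensitive, compile_secrets, redact, and MAX_DEPTH (including its value of 5) are assumptions, and the sensitive-name list is the abridged one from this page.

```python
import re
from typing import Any, Optional

SENSITIVE_NAMES = ("password", "secret", "token", "api_key", "private_key", "authorization")
REDACTED = "***"
MAX_DEPTH = 5  # assumed bound; deeper values are returned unchanged


def is_sensitive(key: Any) -> bool:
    """Case-insensitive substring test of a key against the sensitive names."""
    return isinstance(key, str) and any(name in key.lower() for name in SENSITIVE_NAMES)


def compile_secrets(secrets: set) -> Optional[re.Pattern]:
    """Escape every registered secret and join them into one alternation regex."""
    ordered = sorted((s for s in secrets if s), key=len, reverse=True)
    return re.compile("|".join(re.escape(s) for s in ordered)) if ordered else None


def redact(item: Any, replacer: Optional[re.Pattern], key: Any = None, depth: int = 0) -> Any:
    """Depth-first redaction: key check first, then recurse, then value regex."""
    if depth > MAX_DEPTH:
        return item
    if is_sensitive(key):
        return REDACTED  # the whole value is hidden, whatever its type
    if isinstance(item, dict):
        return {k: redact(v, replacer, k, depth + 1) for k, v in item.items()}
    if isinstance(item, list):
        return [redact(v, replacer, None, depth + 1) for v in item]
    if isinstance(item, tuple):
        return tuple(redact(v, replacer, None, depth + 1) for v in item)
    if isinstance(item, set):
        return {redact(v, replacer, None, depth + 1) for v in item}
    if isinstance(item, str) and replacer is not None:
        return replacer.sub(REDACTED, item)  # mask registered values inside any string
    return item  # numbers, None, and other objects pass through


replacer = compile_secrets({"hunter2"})
data = {
    "config": {"db": {"password": "x", "host": "db.local"}},
    "log_line": "login with hunter2 failed",
    "hosts": ["a", "hunter2"],
}
print(redact(data, replacer))
# {'config': {'db': {'password': '***', 'host': 'db.local'}}, 'log_line': 'login with *** failed', 'hosts': ['a', '***']}
```

Checking the key before recursing means an entire subtree under a sensitive key (for example a dict stored under api_key) is hidden in one step rather than inspected.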
Detection Strategies:
| Strategy | Trigger | Example |
|---|---|---|
| Field name match | Key contains a sensitive name | {"password": "s3cret"} becomes {"password": "***"} |
| Connection URI | Password embedded in a connection URI | postgres://user:s3cret@host becomes postgres://user:***@host |
| Registered secrets | Value matches a secret registered with the masker | Any occurrence of a registered secret value is replaced |
| Nested structure | Recursive traversal of dicts, lists, tuples, and sets | {"config": {"db": {"password": "x"}}} becomes {"config": {"db": {"password": "***"}}} |
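A sketch of why only the password portion of a URI disappears, assuming the password is known to the masker as a registered secret value: the URI is never parsed, and the password text is simply replaced wherever it occurs. Registering the percent-encoded variant alongside the raw password is an assumption of this sketch, and build_replacer is a hypothetical helper.

```python
import re
from urllib.parse import quote


def build_replacer(password: str) -> re.Pattern:
    """Compile a regex matching the raw password and its percent-encoded form."""
    # A URI usually carries the encoded version ("@" becomes "%40", "/" becomes "%2F").
    variants = {password, quote(password, safe="")}
    ordered = sorted(variants, key=len, reverse=True)
    return re.compile("|".join(re.escape(v) for v in ordered))


replacer = build_replacer("p@ss/word")
uri = "postgres://user:" + quote("p@ss/word", safe="") + "@db.example.com:5432/airflow"
print(uri)                       # postgres://user:p%40ss%2Fword@db.example.com:5432/airflow
print(replacer.sub("***", uri))  # postgres://user:***@db.example.com:5432/airflow
```

Scheme, user, host, port, and database survive untouched, which is what keeps a redacted URI useful for debugging.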
Performance Considerations:
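- Registered secret values are compiled into a single regex that is rebuilt only when a new secret is added, so each string costs one regex pass rather than one search per secret.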
- The masker uses short-circuit evaluation: if no sensitive patterns are registered, the record passes through unmodified.
- Recursive traversal stops at a fixed maximum depth, which bounds the work spent on deeply nested structures.
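The compile-once and short-circuit behavior described above can be sketched as a small registry. SecretRegistry and its compile_count counter are hypothetical, added only to make the number of compilations observable; this is an illustration under those assumptions, not Airflow's code.

```python
import re
from typing import Optional


class SecretRegistry:
    """Hypothetical registry: one regex, rebuilt only when a new secret arrives."""

    def __init__(self) -> None:
        self._patterns = set()
        self.replacer: Optional[re.Pattern] = None
        self.compile_count = 0  # instrumentation for the demo only

    def add(self, secret: str) -> None:
        pattern = re.escape(secret)
        if not secret or pattern in self._patterns:
            return  # empty or already registered: keep the existing compiled regex
        self._patterns.add(pattern)
        # Longest alternatives first, so a secret that is a prefix of another
        # is not matched in place of the longer one.
        ordered = sorted(self._patterns, key=len, reverse=True)
        self.replacer = re.compile("|".join(ordered))
        self.compile_count += 1

    def redact(self, text: str) -> str:
        if self.replacer is None:
            return text  # short-circuit: nothing registered, no regex work at all
        return self.replacer.sub("***", text)


registry = SecretRegistry()
print(registry.redact("token=abc123"))  # token=abc123
registry.add("abc123")
registry.add("abc123")                  # duplicate: no recompilation
registry.add("abc123xyz")
print(registry.redact("token=abc123xyz and abc123"))  # token=*** and ***
print(registry.compile_count)           # 2
```

Without the longest-first ordering, the pattern abc123|abc123xyz would turn abc123xyz into ***xyz and leak part of the longer secret.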
Security Boundary:
Redaction is a defense-in-depth measure. It does not replace proper secret management (such as secrets backends or an encrypted metadata database). Its purpose is to prevent accidental exposure through logs and UI output, not to serve as a primary security control.