
Heuristic:Datahub project Datahub Secret Handling And Deprecation Patterns

From Leeroopedia



Knowledge Sources
Domains: Security, Configuration, Backwards_Compatibility
Last Updated: 2026-02-10 00:00 GMT

Overview

Patterns for handling secrets in configuration (using `SecretStr` and log masking), and for managing field deprecations and renames with pydantic helper functions that preserve backwards compatibility.

Description

DataHub's ingestion framework has established patterns for two critical concerns: (1) protecting sensitive values in configuration and logs using pydantic's `SecretStr` type and a circuit-breaker-based masking filter, and (2) evolving configuration schemas over time using `pydantic_field_deprecated()` and `pydantic_renamed_field()` helpers that emit warnings while maintaining backwards compatibility. These patterns are essential for production safety and smooth upgrades.

Usage

Use this heuristic when adding new configuration fields that contain passwords, tokens, or API keys, or when renaming or deprecating existing configuration fields. Also relevant when working with the logging/masking subsystem.

The Insight (Rule of Thumb)

  • For secrets:
    • Use `SecretStr` type for all password/token/API key fields in pydantic configs
    • Never log secret values; rely on the masking filter
    • The masking filter uses a circuit breaker: if masking fails repeatedly, ALL messages are redacted (fail-safe)
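    • Known limitation: newline handling in secrets is lossy. The code treats every literal two-character `\n` sequence (a backslash followed by `n`) as a newline, so a secret that genuinely contains `\n` cannot be distinguished from one containing a newline character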
  • For field deprecation:
    • Use `pydantic_field_deprecated()` helper to deprecate a field
    • Use `pydantic_renamed_field()` helper to rename a field
    • Deprecated fields maintain backward compatibility for ~4-6 weeks (two server releases)
    • Validators must raise only `ValueError`, `TypeError`, or `AssertionError` (NOT `ConfigurationError`)
  • For field renames:
    • Old name continues to work with a warning
    • Providing both the old and the new name at the same time raises an error
    • Reference the PR and context for the rename in the deprecation message
  • Trade-off: More boilerplate per field change, but prevents breaking existing user recipes during upgrades.

Reasoning

Secrets: The circuit breaker in the masking filter ensures that if masking hits unexpected input (e.g., a secret containing regex metacharacters), the system fails safe by redacting everything rather than leaking secrets. In addition, `re.escape()` is applied to secrets so that special regex characters are matched literally.
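
The escaping step can be illustrated with a small sketch. `mask_secrets()` is a hypothetical helper written for this page, not DataHub's masking code; it joins the secrets into one alternation pattern. Without `re.escape()`, a secret such as `a+b` is read as a pattern and the literal secret is not masked, and a secret such as `(oops` does not compile at all.

```python
import re


def mask_secrets(message: str, secrets: list[str], escape: bool = True) -> str:
    """Replace each known secret in `message` with '********' (illustrative only)."""
    # Skip empty strings: an empty alternative would match at every position.
    candidates = [s for s in secrets if s]
    if not candidates:
        return message
    # Longest first, so a secret that contains a shorter secret is masked whole.
    candidates.sort(key=len, reverse=True)
    parts = [re.escape(s) if escape else s for s in candidates]
    return re.sub("|".join(parts), "********", message)


# Escaped: the '+' in the secret is matched literally, so the secret is masked.
print(mask_secrets("password=a+b", ["a+b"]))               # password=********
# Unescaped: "a+b" means "one or more 'a' then 'b'", so the real secret leaks.
print(mask_secrets("password=a+b", ["a+b"], escape=False))  # password=a+b
# Unescaped: an unbalanced '(' is not a valid pattern at all.
try:
    mask_secrets("token=(oops", ["(oops"], escape=False)
except re.error as exc:
    print("regex error:", exc)
```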

Deprecation: DataHub recipes are YAML configurations that users maintain outside the repository. Abruptly removing or renaming fields breaks all existing recipes. The deprecation helpers provide a warning period where both old and new names work, giving users time to update their configurations. This is especially important because the CLI releases much more frequently than the server (every few days vs. twice monthly).

Code Evidence

Secret newline handling from `secret_common.py:22-31`:

# HACK: Secret value newline character handling is lossy.
# Previously, r'\n' strings were incorrectly replaced with newlines.
# This cannot be properly distinguished anymore, so the code assumes
# all r'\n' strings are newlines.
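
A minimal sketch of the assumption described in that comment. `normalize_secret` is a hypothetical name, not a DataHub function: it converts every two-character `\n` sequence into a real newline, which helps with keys pasted as a single escaped line but makes two different inputs indistinguishable.

```python
def normalize_secret(raw: str) -> str:
    # Treat every two-character backslash-n sequence as a newline (lossy by design).
    return raw.replace("\\n", "\n")


# Intended case: a PEM-style key pasted with escaped newlines.
pem = "-----BEGIN KEY-----\\nabc\\n-----END KEY-----"
print(normalize_secret(pem).splitlines())  # ['-----BEGIN KEY-----', 'abc', '-----END KEY-----']

# Lossy case: two different inputs collapse to the same value...
print(normalize_secret("a\\nb") == normalize_secret("a\nb"))  # True
# ...so a password that really contains a backslash followed by 'n' is altered.
password = "C:\\new"  # six characters: C : \ n e w
print(len(password), len(normalize_secret(password)))  # 6 5
```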

Field deprecation helper usage pattern:

# From validate_field_deprecation.py
# pydantic_field_deprecated() emits warnings to both:
#   - Python warnings module
#   - Global warning system
# Prevents field from being used without warning
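
The behaviour described in those comments can be approximated without pydantic. The sketch below assumes the raw recipe arrives as a dict before field validation; `warn_if_deprecated`, `GLOBAL_WARNINGS`, and the field name `legacy_option` are hypothetical stand-ins, and the real `pydantic_field_deprecated()` signature may differ.

```python
import warnings
from typing import Any

# Hypothetical stand-in for DataHub's global warning system.
GLOBAL_WARNINGS: list[str] = []


def warn_if_deprecated(values: dict[str, Any], field: str, note: str = "") -> dict[str, Any]:
    """Warn when a deprecated field is set, but keep accepting it."""
    # Treat a missing key or an explicit None as "not set".
    if values.get(field) is not None:
        text = f"The config option {field!r} is deprecated and will be removed. {note}".strip()
        warnings.warn(text, stacklevel=2)  # 1) Python warnings module
        GLOBAL_WARNINGS.append(text)       # 2) global warning system (hypothetical list)
    return values  # unchanged: the deprecated field still takes effect


# Hypothetical recipe fragment; "legacy_option" is an invented field name.
recipe = {"host_port": "localhost:3306", "legacy_option": True}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warn_if_deprecated(recipe, "legacy_option", "See the linked PR for the replacement.")
print(len(caught))  # 1
print(GLOBAL_WARNINGS[-1])
```

Returning the dict unchanged is what keeps old recipes working during the deprecation window; only the warning signals that an update is needed.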

Field rename helper pattern:

# From validate_field_rename.py
# pydantic_renamed_field() handles:
#   - Transformation between old and new field names
#   - Prevents using both old and new names simultaneously
#   - Validates renamed field doesn't conflict with other config
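
A comparable dependency-free sketch of the rename logic, assuming it runs on the raw dict before field validation (as a pydantic "before" validator would). `apply_rename` and the keys `env`/`environment` are hypothetical, and this is not the signature of `pydantic_renamed_field()`. The third bullet, checking for conflicts with other config, is not modelled.

```python
import warnings
from typing import Any


def apply_rename(values: dict[str, Any], old: str, new: str) -> dict[str, Any]:
    """Move a value from a renamed key to its new key, warning the user."""
    if old not in values:
        return values  # nothing to do: the recipe already uses the new name (or neither)
    if new in values:
        # ValueError is one of the exception types pydantic turns into a ValidationError.
        raise ValueError(f"Cannot specify both {old!r} and {new!r}; use only {new!r}.")
    warnings.warn(f"{old!r} has been renamed to {new!r}; please update your recipe.", stacklevel=2)
    updated = dict(values)  # copy so the caller's dict is not mutated
    updated[new] = updated.pop(old)
    return updated


# Hypothetical keys: an old recipe still says "env" instead of "environment".
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(apply_rename({"env": "PROD"}, "env", "environment"))  # {'environment': 'PROD'}

try:
    apply_rename({"env": "PROD", "environment": "DEV"}, "env", "environment")
except ValueError as exc:
    print(exc)
```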

Masking circuit breaker from `masking_filter.py`:

# If masking fails repeatedly, circuit breaker opens:
# ALL messages redacted (safety fallback)
# Multiple failure modes tracked:
#   regex errors, replacement errors, memory errors, general errors
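
The circuit breaker can be sketched as a standard-library `logging.Filter`. Everything here is an assumption for illustration: the class name, the default threshold of three failures, the redaction text, and counting failures by exception type all stand in for whatever `masking_filter.py` actually does. Once the breaker opens, every record is replaced with a fixed redaction string.

```python
import logging
import re
from collections import Counter


class CircuitBreakerMaskingFilter(logging.Filter):
    """Mask known secrets in log records; once masking has failed too often, redact everything.

    Hypothetical sketch, not DataHub's masking_filter.py; the threshold is an assumption.
    """

    MASK = "********"
    REDACTED = "[REDACTED: secret masking unavailable]"

    def __init__(self, secrets: list[str], max_failures: int = 3) -> None:
        super().__init__()
        # Escape each secret so regex metacharacters are matched literally; longest first.
        escaped = [re.escape(s) for s in sorted(set(secrets), key=len, reverse=True) if s]
        self.pattern = re.compile("|".join(escaped)) if escaped else None
        self.max_failures = max_failures
        self.failures: Counter = Counter()  # failure counts keyed by exception type name
        self.is_open = False  # an "open" breaker means: stop masking, redact all

    def filter(self, record: logging.LogRecord) -> bool:
        if not self.is_open:
            try:
                text = record.getMessage()  # may raise, e.g. TypeError on bad %-format args
                if self.pattern is not None:
                    text = self.pattern.sub(self.MASK, text)
                record.msg, record.args = text, ()
                return True
            except Exception as exc:  # any failure, including MemoryError
                self.failures[type(exc).__name__] += 1
                if sum(self.failures.values()) >= self.max_failures:
                    self.is_open = True
        # Fail safe: a message that could not be masked is never emitted as-is.
        record.msg, record.args = self.REDACTED, ()
        return True


logger = logging.getLogger("masking-demo")
handler = logging.StreamHandler()
handler.addFilter(CircuitBreakerMaskingFilter(["hunter2"]))
logger.addHandler(handler)
logger.warning("connecting with password=%s", "hunter2")  # emits: connecting with password=********
```

The filter still returns `True` after redacting, so the record is kept and operators can see that something was logged, while the original text is never emitted. This sketch only masks the formatted message, not exception tracebacks attached to the record.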
