
Workflow:Apache Airflow Provider Distribution Development

From Leeroopedia


Knowledge Sources
Domains Software_Engineering, Plugin_Development, Open_Source
Last Updated 2026-02-08 19:00 GMT

Overview

End-to-end process for developing, testing, and releasing Apache Airflow provider distribution packages that extend Airflow with integrations for external services.

Description

This workflow covers the lifecycle of creating and maintaining Airflow provider packages. Providers are separately versioned and released distribution packages that contain operators, hooks, sensors, transfers, and other extensions for integrating Airflow with external services (cloud platforms, databases, messaging systems, etc.). The workflow spans provider structure, metadata configuration via provider.yaml, hook and operator implementation, testing conventions, and the release process. The Airflow monorepo hosts 100+ community-managed providers under the providers/ directory.

Usage

Execute this workflow when you need to create a new Airflow provider package for integrating with an external service, or when maintaining an existing community provider. This is relevant for contributors adding new cloud integrations, database connectors, or custom operators that should be distributed as independent packages with their own versioning lifecycle.

Execution Steps

Step 1: Provider Package Scaffolding

Create the provider package structure within the Airflow monorepo under the providers/ directory. Define the provider.yaml metadata file which declares the package name, description, version, dependencies, supported Airflow versions, and the operators/hooks/sensors it exposes. The provider.yaml serves as the single source of truth for provider metadata and is used by the build system and documentation generator.

Key considerations:

  • Each provider follows the naming convention apache-airflow-providers-{name}
  • Provider versions follow SemVer independently of Airflow core versions
  • The provider.yaml declares all entry points: operators, hooks, sensors, transfers, and connections
  • Providers can depend on other providers and on specific Airflow core versions
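
A minimal provider.yaml sketch for a hypothetical `acme` provider may look like the following (all names, URLs, and versions are illustrative; the exact field set is validated against the monorepo's provider schema and varies across Airflow versions):

```yaml
package-name: apache-airflow-providers-acme
name: Acme
description: |
  Integration with the hypothetical Acme service.
state: ready
versions:
  - 1.0.0
dependencies:
  - apache-airflow>=2.9.0
integrations:
  - integration-name: Acme
    external-doc-url: https://acme.example/docs
    tags: [service]
hooks:
  - integration-name: Acme
    python-modules:
      - airflow.providers.acme.hooks.acme
operators:
  - integration-name: Acme
    python-modules:
      - airflow.providers.acme.operators.acme
connection-types:
  - hook-class-name: airflow.providers.acme.hooks.acme.AcmeHook
    connection-type: acme
```

The build system reads this file to generate package metadata, and the documentation generator uses the `integrations` entries to cross-link operators, hooks, and guides.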

Step 2: Hook and Connection Implementation

Implement the hook class that provides the core API client functionality for the external service. Hooks manage authentication, connection configuration, and API interactions. Register connection types in the provider.yaml so the Airflow UI can display custom connection forms with appropriate fields for the service.

Key considerations:

  • Hooks should inherit from appropriate base classes (BaseHook, DbApiHook, etc.)
  • Connection types are discovered by Airflow's providers manager through package entry points
  • Provider connections are stored in the Airflow metadata database or retrieved from secrets backends
  • Connection testing capabilities should be implemented for UI-based validation
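
A sketch of the hook shape described above, assuming Airflow's `BaseHook` interface (the `AcmeHook` name, `acme` connection type, and client calls are illustrative; `BaseHook` is stubbed here so the sketch runs standalone):

```python
# In a real provider: from airflow.hooks.base import BaseHook
class BaseHook:  # minimal stub of the Airflow base class for this sketch
    @classmethod
    def get_connection(cls, conn_id):
        raise NotImplementedError("real Airflow resolves this from the metadata DB")


class AcmeHook(BaseHook):
    """Illustrative hook for a hypothetical Acme service."""

    conn_name_attr = "acme_conn_id"
    default_conn_name = "acme_default"
    conn_type = "acme"
    hook_name = "Acme"

    def __init__(self, acme_conn_id: str = default_conn_name) -> None:
        super().__init__()
        self.acme_conn_id = acme_conn_id
        self._client = None

    def get_conn(self):
        """Build (and cache) the API client from the stored Airflow connection."""
        if self._client is None:
            conn = self.get_connection(self.acme_conn_id)
            # e.g. self._client = AcmeClient(host=conn.host, token=conn.password)
            self._client = object()
        return self._client

    def test_connection(self) -> tuple[bool, str]:
        """Backs the UI "Test" button; returns (success, message)."""
        try:
            self.get_conn()
            return True, "Connection successfully tested"
        except Exception as exc:
            return False, str(exc)
```

Implementing `test_connection` is what enables the UI-based validation mentioned above; the class-level attributes let the providers manager map the connection type to this hook.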

Step 3: Operator and Sensor Development

Implement operators (task execution logic) and sensors (condition monitoring) that leverage the hooks for interacting with the external service. Operators execute actions (transfer data, trigger jobs, create resources), while sensors poll for conditions (file arrival, job completion, state changes). For long-running operations, implement deferrable versions that use triggers to free executor slots.

Key considerations:

  • Operators should be idempotent where possible
  • Deferrable operators yield to the triggerer and resume when async conditions are met
  • Sensors support both poke mode (periodic polling) and reschedule mode (releasing worker slots between checks)
  • Follow the naming conventions: {Service}Operator, {Service}Sensor, {Service}Hook
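
An operator/sensor pair following those conventions might be sketched as below (names and service calls are hypothetical; the Airflow base classes are stubbed so the sketch is self-contained, whereas a real provider imports `BaseOperator` and `BaseSensorOperator` from Airflow):

```python
# Real providers import:
#   from airflow.models import BaseOperator
#   from airflow.sensors.base import BaseSensorOperator
class BaseOperator:  # minimal stubs so this sketch runs standalone
    def __init__(self, task_id: str, **kwargs) -> None:
        self.task_id = task_id


class BaseSensorOperator(BaseOperator):
    pass


class AcmeRunJobOperator(BaseOperator):
    """Illustrative operator: trigger a job on a hypothetical Acme service."""

    template_fields = ("job_name",)  # fields rendered with Jinja at runtime

    def __init__(self, *, job_name: str, acme_conn_id: str = "acme_default", **kwargs):
        super().__init__(**kwargs)
        self.job_name = job_name
        self.acme_conn_id = acme_conn_id

    def execute(self, context):
        # hook = AcmeHook(acme_conn_id=self.acme_conn_id)
        # return hook.get_conn().run_job(self.job_name)  # return value goes to XCom
        return f"started:{self.job_name}"


class AcmeJobSensor(BaseSensorOperator):
    """Illustrative sensor: poke until the job reaches a terminal state."""

    def __init__(self, *, job_name: str, acme_conn_id: str = "acme_default", **kwargs):
        super().__init__(**kwargs)
        self.job_name = job_name
        self.acme_conn_id = acme_conn_id

    def poke(self, context) -> bool:
        # return AcmeHook(self.acme_conn_id).get_job_state(self.job_name) == "DONE"
        return True
```

Keeping service interaction in the hook and only orchestration logic in `execute`/`poke` is what makes the idempotency and deferrable variants easier to add later.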

Step 4: Testing and Quality Assurance

Write comprehensive tests for all provider components. Unit tests validate individual hook, operator, and sensor functionality. Integration tests verify end-to-end behavior against real or mocked services. System tests demonstrate complete DAG execution using the provider components. Ensure compliance with project structure checks and pre-commit hooks.

Key considerations:

  • Tests should follow the three-tier structure: unit, integration, system
  • Mock external service calls in unit tests for fast, reliable execution
  • System test DAGs serve as both tests and usage examples
  • Pre-commit hooks enforce code quality, import patterns, and provider isolation constraints
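
A unit-test sketch showing the mocking pattern from the first two bullets (the operator is a condensed stand-in with hypothetical names; real provider tests live alongside the provider in the monorepo and use pytest):

```python
from unittest import mock


class AcmeRunJobOperator:  # condensed stand-in for a provider operator
    def __init__(self, task_id: str, job_name: str):
        self.task_id = task_id
        self.job_name = job_name

    def _get_hook(self):
        raise RuntimeError("would hit the real service")

    def execute(self, context):
        return self._get_hook().run_job(self.job_name)


def test_execute_calls_hook():
    op = AcmeRunJobOperator(task_id="run_job", job_name="nightly")
    # Patch the hook factory so no external call is made.
    with mock.patch.object(op, "_get_hook") as get_hook:
        get_hook.return_value.run_job.return_value = "job-123"
        result = op.execute(context={})
    get_hook.return_value.run_job.assert_called_once_with("nightly")
    assert result == "job-123"
```

Patching at the hook boundary keeps unit tests fast and deterministic; integration and system tests then exercise the unmocked path.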

Step 5: Documentation and Examples

Write documentation for the provider including connection setup guides, operator/hook/sensor API reference, and example DAGs demonstrating common use cases. Documentation is built with Sphinx and published to the Airflow documentation site. Ensure all operators and hooks have complete docstrings.

Key considerations:

  • Provider documentation builds are separate from core documentation
  • Example DAGs serve as both documentation and system tests
  • Connection documentation should include screenshots of custom connection forms
  • Redirect files maintain backward compatibility when documentation URLs change
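
A sketch of a how-to page in the provider's Sphinx docs (class name and include path are illustrative; `exampleinclude` is the Airflow docs directive that embeds a snippet from a system-test DAG between START/END markers, which is how example DAGs double as documentation):

```rst
AcmeRunJobOperator
------------------

Use :class:`~airflow.providers.acme.operators.acme.AcmeRunJobOperator` to
trigger a job on the Acme service.

.. exampleinclude:: /../tests/system/acme/example_acme.py
    :language: python
    :start-after: [START howto_operator_acme_run_job]
    :end-before: [END howto_operator_acme_run_job]
```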

Step 6: Provider Release Process

Release the provider package following the ASF release process. This involves bumping the version in provider.yaml, generating changelog entries, building the distribution package, signing artifacts, uploading to SVN for Apache voting, and ultimately publishing to PyPI. Provider releases can happen independently of Airflow core releases.

Key considerations:

  • Over 100 provider packages may be released simultaneously
  • Each provider maintains its own changelog and version history
  • The release process includes cryptographic signing and community voting
  • Providers follow SemVer: major bumps for breaking changes, minor for features, patch for fixes
  • The .last_release_date.txt file tracks the most recent provider release cycle
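
The artifact-handling portion of the release can be sketched as the following command sequence (package name, file names, and paths are illustrative; in practice the release manager drives this through breeze release-management tooling and ASF release policy):

```shell
# Build the source and wheel distributions for the provider
python -m build --sdist --wheel

# Sign the artifacts with the release manager's GPG key and record checksums
gpg --armor --detach-sign dist/apache_airflow_providers_acme-1.0.0.tar.gz
sha512sum dist/apache_airflow_providers_acme-1.0.0.tar.gz \
  > dist/apache_airflow_providers_acme-1.0.0.tar.gz.sha512

# Upload the signed artifacts to the ASF dist SVN for the community vote,
# then, after a successful vote, publish to PyPI:
twine upload dist/*
```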
