Workflow: Apache Airflow Provider Distribution Development
| Knowledge Sources | |
|---|---|
| Domains | Software_Engineering, Plugin_Development, Open_Source |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
End-to-end process for developing, testing, and releasing Apache Airflow provider distribution packages that extend Airflow with integrations for external services.
Description
This workflow covers the lifecycle of creating and maintaining Airflow provider packages. Providers are separately versioned and released distribution packages that contain operators, hooks, sensors, transfers, and other extensions for integrating Airflow with external services (cloud platforms, databases, messaging systems, etc.). The workflow covers provider structure, metadata configuration via provider.yaml, hook and operator implementation, testing conventions, and the release process. The Airflow monorepo hosts 100+ community-managed providers under the providers/ directory.
Usage
Execute this workflow when you need to create a new Airflow provider package for integrating with an external service, or when maintaining an existing community provider. This is relevant for contributors adding new cloud integrations, database connectors, or custom operators that should be distributed as independent packages with their own versioning lifecycle.
Execution Steps
Step 1: Provider Package Scaffolding
Create the provider package structure within the Airflow monorepo under the providers/ directory. Define the provider.yaml metadata file which declares the package name, description, version, dependencies, supported Airflow versions, and the operators/hooks/sensors it exposes. The provider.yaml serves as the single source of truth for provider metadata and is used by the build system and documentation generator.
Key considerations:
- Each provider follows the naming convention apache-airflow-providers-{name}
- Provider versions follow SemVer independently of Airflow core versions
- The provider.yaml declares all entry points: operators, hooks, sensors, transfers, and connections
- Providers can depend on other providers and on specific Airflow core versions
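To make the metadata concrete, a minimal provider.yaml might look like the sketch below. The package name, module paths, versions, and URL are hypothetical placeholders, and the exact schema is defined and validated by the Airflow build tooling, so treat this as an illustrative outline rather than a canonical template:

```yaml
# Hypothetical provider.yaml sketch for an imaginary "example" service.
package-name: apache-airflow-providers-example
name: Example
description: |
  Integration with the (hypothetical) Example service.
versions:
  - 1.0.0
dependencies:
  - apache-airflow>=2.9.0
integrations:
  - integration-name: Example
    external-doc-url: https://example.com/docs
    tags: [service]
operators:
  - integration-name: Example
    python-modules:
      - airflow.providers.example.operators.example
hooks:
  - integration-name: Example
    python-modules:
      - airflow.providers.example.hooks.example
connection-types:
  - hook-class-name: airflow.providers.example.hooks.example.ExampleHook
    connection-type: example
```

The build system reads this file to generate the package metadata and documentation index, which is why it must stay in sync with the actual module layout.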
Step 2: Hook and Connection Implementation
Implement the hook class that provides the core API client functionality for the external service. Hooks manage authentication, connection configuration, and API interactions. Register connection types in the provider.yaml so the Airflow UI can display custom connection forms with appropriate fields for the service.
Key considerations:
- Hooks should inherit from appropriate base classes (BaseHook, DbApiHook, etc.)
- Connection types are discovered through Airflow's provider discovery mechanism (the ProvidersManager) via package entry points
- Provider connections are stored in the Airflow metadata database or retrieved from secrets backends
- Connection testing capabilities should be implemented for UI-based validation
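The hook pattern above can be sketched as follows. Airflow's real base class is airflow.hooks.base.BaseHook; a minimal stand-in is defined here only so the example runs without Airflow installed, and ExampleServiceHook, its connection fields, and the "client" it returns are all hypothetical:

```python
from typing import Any


class BaseHook:  # stand-in for airflow.hooks.base.BaseHook, so this sketch is self-contained
    def get_connection(self, conn_id: str) -> Any:
        raise NotImplementedError


class ExampleServiceHook(BaseHook):
    """Hypothetical hook for an imaginary 'Example' service."""

    # Class attributes like these are what lets the Airflow UI render a
    # custom connection form for this connection type.
    conn_name_attr = "example_conn_id"
    default_conn_name = "example_default"
    conn_type = "example"
    hook_name = "Example Service"

    def __init__(self, example_conn_id: str = "example_default") -> None:
        super().__init__()
        self.example_conn_id = example_conn_id
        self._client = None

    def get_conn(self):
        """Build (and cache) an API client from the stored Airflow connection."""
        if self._client is None:
            conn = self.get_connection(self.example_conn_id)
            # A real hook would construct the service's SDK client here from
            # conn.host, conn.login, conn.password, conn.extra, etc.
            self._client = {"host": conn.host, "login": conn.login}
        return self._client

    def test_connection(self) -> tuple[bool, str]:
        """Called by the UI's 'Test' button; returns (success, message)."""
        try:
            self.get_conn()
            return True, "Connection successfully tested"
        except Exception as exc:
            return False, str(exc)
```

Keeping all client construction inside get_conn means operators and sensors never touch credentials directly, which is what lets secrets backends swap in transparently.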
Step 3: Operator and Sensor Development
Implement operators (task execution logic) and sensors (condition monitoring) that leverage the hooks for interacting with the external service. Operators execute actions (transfer data, trigger jobs, create resources), while sensors poll for conditions (file arrival, job completion, state changes). For long-running operations, implement deferrable versions that use triggers to free executor slots.
Key considerations:
- Operators should be idempotent where possible
- Deferrable operators yield to the triggerer and resume when async conditions are met
- Sensors support both poke mode (periodic polling) and reschedule mode (releasing worker slots between checks)
- Follow the naming conventions: {Service}Hook for hooks, {Service}{Action}Operator for operators (for example, S3CreateBucketOperator), and {Service}{Condition}Sensor for sensors
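The idempotency consideration can be shown with a small sketch. The operator, job names, and client methods below are hypothetical, and a stand-in BaseOperator is defined so the example runs without Airflow; real operators usually construct their hook lazily inside execute() rather than taking one in the constructor, which is done here only for testability:

```python
class BaseOperator:  # stand-in for airflow.models.BaseOperator, so this sketch is self-contained
    def __init__(self, *, task_id: str) -> None:
        self.task_id = task_id


class ExampleTriggerJobOperator(BaseOperator):
    """Hypothetical operator that starts a job on the 'Example' service.

    Idempotency: if a job with this name already exists, reuse it instead
    of starting a duplicate on retry.
    """

    def __init__(self, *, job_name: str, hook, task_id: str) -> None:
        super().__init__(task_id=task_id)
        self.job_name = job_name
        self.hook = hook  # injected for testability; real operators build hooks lazily in execute()

    def execute(self, context: dict) -> str:
        client = self.hook.get_conn()
        existing = client.find_job(self.job_name)
        if existing is not None:
            return existing  # idempotent: a retried task does not start a second copy
        return client.start_job(self.job_name)
```

The same check-before-act shape is what lets a task instance be retried safely after a worker crash mid-execution.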
Step 4: Testing and Quality Assurance
Write comprehensive tests for all provider components. Unit tests validate individual hook, operator, and sensor functionality. Integration tests verify end-to-end behavior against real or mocked services. System tests demonstrate complete DAG execution using the provider components. Ensure compliance with project structure checks and pre-commit hooks.
Key considerations:
- Tests should follow the three-tier structure: unit, integration, system
- Mock external service calls in unit tests for fast, reliable execution
- System test DAGs serve as both tests and usage examples
- Pre-commit hooks enforce code quality, import patterns, and provider isolation constraints
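The unit-test tier's mocking convention can be sketched with the standard library's unittest.mock. The operator and client factory here are hypothetical stand-ins; in a real provider test you would patch the hook's actual import path instead:

```python
from unittest import mock


class CopyFileOperator:
    """Minimal hypothetical operator under test: copies one object via a client."""

    def __init__(self, src: str, dst: str) -> None:
        self.src = src
        self.dst = dst

    def execute(self, context: dict) -> None:
        client = make_client()  # imagine this opens a network connection
        client.copy(self.src, self.dst)


def make_client():
    # Stand-in for a real client factory; calling it for real would hit the network.
    raise RuntimeError("would hit the network in production")


def test_copy_file_operator_calls_client():
    # Patch the client factory so the unit test is fast, hermetic, and deterministic.
    with mock.patch(f"{__name__}.make_client") as factory:
        CopyFileOperator("a.txt", "b.txt").execute(context={})
        factory.return_value.copy.assert_called_once_with("a.txt", "b.txt")
```

Asserting on the mock's calls (rather than on side effects) keeps the unit test focused on the operator's contract with the service client.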
Step 5: Documentation and Examples
Write documentation for the provider including connection setup guides, operator/hook/sensor API reference, and example DAGs demonstrating common use cases. Documentation is built with Sphinx and published to the Airflow documentation site. Ensure all operators and hooks have complete docstrings.
Key considerations:
- Provider documentation builds are separate from core documentation
- Example DAGs serve as both documentation and system tests
- Connection documentation should include screenshots of custom connection forms
- Redirect files maintain backward compatibility when documentation URLs change
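Since the API reference is generated by Sphinx from docstrings, each public class should carry Sphinx-style field lists. A hedged sketch, with a hypothetical operator name and parameters:

```python
class ExampleRunQueryOperator:
    """Execute a query against the (hypothetical) Example service.

    Sphinx-style ``:param:`` field lists like these are what the generated
    provider API reference is built from, so every constructor argument
    should be documented.

    :param sql: The query to run.
    :param example_conn_id: Airflow connection to use.
        Defaults to "example_default".
    """

    def __init__(self, sql: str, example_conn_id: str = "example_default") -> None:
        self.sql = sql
        self.example_conn_id = example_conn_id
```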
Step 6: Provider Release Process
Release the provider package following the ASF release process. This involves bumping the version in provider.yaml, generating changelog entries, building the distribution package, signing artifacts, uploading to SVN for Apache voting, and ultimately publishing to PyPI. Provider releases can happen independently of Airflow core releases.
Key considerations:
- Over 100 provider packages may be released together in a single release wave
- Each provider maintains its own changelog and version history
- The release process includes cryptographic signing and community voting
- Providers follow SemVer: major bumps for breaking changes, minor for features, patch for fixes
- The .last_release_date.txt tracks the most recent provider release cycle
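ASF release artifacts ship with a detached GPG signature and a SHA-512 checksum file. The signature is produced separately with the release manager's key (via gpg); the checksum portion can be sketched in stdlib Python, with the artifact filename being hypothetical:

```python
import hashlib
from pathlib import Path


def write_sha512(artifact: Path) -> Path:
    """Write a .sha512 checksum file next to a release artifact.

    Mirrors the common ASF convention of a "<digest>  <filename>" line,
    verifiable with ``sha512sum -c``. The detached .asc signature is
    produced separately with the release manager's GPG key.
    """
    digest = hashlib.sha512(artifact.read_bytes()).hexdigest()
    out = artifact.parent / (artifact.name + ".sha512")
    out.write_text(f"{digest}  {artifact.name}\n")
    return out
```

During the vote, community members re-verify both the checksum and the signature against the artifacts staged in SVN before the release is published to PyPI.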