Principle:Apache Airflow Task Communication
| Knowledge Sources | |
|---|---|
| Domains | Workflow_Orchestration, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A mechanism for passing data between tasks in an Airflow DAG via a shared metadata store.
Description
Task communication in Airflow is achieved through XCom (short for "cross-communication"), a system for exchanging small amounts of data between tasks. XCom values are stored in the metadata database as JSON and are keyed by a combination of dag_id, task_id, run_id, map_index, and a user-defined key. The TaskFlow API automatically pushes each task's return value as an XCom (under the key return_value) and pulls it when it is passed as an argument to a downstream task, making inter-task data flow implicit and Pythonic.
Usage
Use XCom when tasks need to share small amounts of metadata, configuration, or results. For large datasets, use external storage (S3, GCS) and pass only references via XCom. XCom is essential for dynamic workflows where downstream task behavior depends on upstream results.
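The reference-passing advice above can be sketched in plain Python. This is a minimal illustration, not Airflow code: a temporary directory stands in for external storage such as S3 or GCS, and the names produce, consume, and the file rows.json are hypothetical. The point is that only the small JSON reference would travel through XCom, while the large payload stays in storage.

```python
import json
import tempfile
from pathlib import Path


def produce(storage_dir: Path) -> str:
    """Write a large payload to storage and return a small JSON reference."""
    rows = [{"id": i, "value": i * i} for i in range(10_000)]
    target = storage_dir / "rows.json"  # hypothetical object name
    target.write_text(json.dumps(rows))
    # Only this small reference would be pushed as an XCom value.
    return json.dumps({"uri": str(target), "row_count": len(rows)})


def consume(reference_json: str) -> int:
    """Resolve the reference, load the payload from storage, and aggregate it."""
    reference = json.loads(reference_json)
    rows = json.loads(Path(reference["uri"]).read_text())
    if len(rows) != reference["row_count"]:
        raise ValueError("payload does not match its reference")
    return sum(row["value"] for row in rows)


# A temporary directory plays the role of the external store (assumption).
with tempfile.TemporaryDirectory() as tmp:
    reference_json = produce(Path(tmp))
    reference_size = len(reference_json)
    total = consume(reference_json)
```

Here the reference is a short string regardless of how many rows the payload holds, which is what keeps the metadata database small.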
Theoretical Basis
Message Passing Pattern:
- Producer: A task pushes a value under a key (explicitly, or implicitly via its return value)
- Store: The metadata database persists the value as JSON
- Consumer: A downstream task pulls the value by key
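The three roles above can be modelled with a small in-memory store. This is a sketch under stated assumptions, not Airflow's implementation: InMemoryXComStore and XComKey are hypothetical names, and a dictionary stands in for the metadata database. The key fields mirror the ones listed in the Description; the values -1 for map_index (an unmapped task) and "return_value" for the key follow Airflow's defaults.

```python
import json
from typing import Any, NamedTuple


class XComKey(NamedTuple):
    """Composite key that identifies one XCom value."""
    dag_id: str
    task_id: str
    run_id: str
    map_index: int
    key: str


class InMemoryXComStore:
    """Hypothetical stand-in for the metadata database; not an Airflow class."""

    def __init__(self) -> None:
        self._rows = {}  # maps XComKey -> JSON string

    def push(self, key: XComKey, value: Any) -> None:
        # json.dumps raises TypeError if the value is not JSON-serializable.
        self._rows[key] = json.dumps(value)

    def pull(self, key: XComKey) -> Any:
        # Raises KeyError if nothing was pushed under this key.
        return json.loads(self._rows[key])


store = InMemoryXComStore()

# Producer: the upstream task pushes its result.
producer_key = XComKey("etl", "extract", "manual__2026-02-08", -1, "return_value")
store.push(producer_key, {"rows": 42, "source": "orders"})

# Consumer: the downstream task pulls by the same composite key.
pulled = store.pull(producer_key)
```

Because the value makes a round trip through JSON, the consumer receives a copy rather than the producer's original object, which matches how tasks running in separate processes see XCom data.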
Automatic XCom (TaskFlow):
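# Pseudo-code: TaskFlow automatic XCom (both functions are assumed to be @task-decorated and called inside a DAG)
result = upstream_task()      # returns a reference; the return value is pushed as an XCom when the task runs
downstream_task(data=result)  # the XCom is pulled and passed as `data` when the downstream task runs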
Key constraints:
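- With the default backend, values must be JSON-serializable
- There is no single default size limit; the maximum value size depends on the metadata database backend (PostgreSQL JSONB is the most flexible), so keep values small
- Custom XCom backends can override the storage mechanism, for example to keep values in object storage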
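One way to respect the first two constraints is to validate a value before returning it from a task. The helper below is a hedged sketch: check_xcom_value and the MAX_XCOM_BYTES budget are hypothetical and are not part of Airflow, and the 48 KiB figure is an arbitrary illustration rather than a documented limit.

```python
import json
from typing import Any

# Hypothetical budget chosen for illustration; Airflow defines no such constant.
MAX_XCOM_BYTES = 48 * 1024


def check_xcom_value(value: Any, max_bytes: int = MAX_XCOM_BYTES) -> int:
    """Return the JSON-encoded size in bytes, or raise ValueError if unsuitable."""
    try:
        encoded = json.dumps(value).encode("utf-8")
    except TypeError as exc:
        raise ValueError(f"not JSON-serializable: {exc}") from exc
    if len(encoded) > max_bytes:
        raise ValueError(
            f"{len(encoded)} bytes exceeds the {max_bytes}-byte budget; "
            "store the data externally and pass a reference instead"
        )
    return len(encoded)


# A small reference-style value passes the check ("s3://bucket/key" is a placeholder).
small_size = check_xcom_value({"path": "s3://bucket/key", "rows": 3})
```

A check like this fails fast in the producing task, rather than surfacing later as a database error or a bloated metadata table.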