Heuristic:Apache Airflow Memory Management Tips
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Operations, Debugging |
| Last Updated | 2026-02-08 20:00 GMT |
Overview
Prevent scheduler and worker memory bloat using NonCachingFileHandler for logs, GC freeze before forking, and MySQL subquery patterns to avoid sort memory overflow.
Description
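Apache Airflow components (scheduler, workers, triggerer) are long-running processes susceptible to gradual memory growth. Three specific patterns address common memory issues: (1) `NonCachingFileHandler` keeps the file cache (the operating system's page cache, not a Python-level cache) from growing steadily as the scheduler writes new log files that are never rotated; per the source comment, this cache is reclaimed when memory is needed, so the growth is harmless but misleading to anyone monitoring scheduler memory, (2) the LocalExecutor freezes the garbage collector before forking worker processes to prevent Copy-on-Write memory overhead, and (3) MySQL-specific queries use subquery patterns to avoid "Out of sort memory" errors caused by large serialized DAG data. A fourth, related note covers converting lazy XCom sequences from mapped tasks to lists.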
Usage
Apply these heuristics when monitoring shows steadily increasing memory usage on scheduler or worker processes, when OOM kills occur on worker containers, or when MySQL reports sort memory overflow during DAG retrieval queries.
The Insight (Rule of Thumb)
- Action 1 (Log Memory): Use `NonCachingFileHandler` for scheduler log handling. This keeps the operating system's file cache from growing as new log files accumulate.
- Action 2 (Fork Memory): Call `gc.freeze()` before forking in LocalExecutor. This moves all existing objects to the permanent generation, where later collections ignore them, so the garbage collector does not write to their memory pages and trigger Copy-on-Write duplication in the forked processes.
- Action 3 (MySQL Sort): For MySQL queries on large columns (e.g., serialized DAG data), use a subquery to select IDs first, then join to retrieve the data. This keeps the large column out of the sort.
- Action 4 (XCom): Converting a `LazySelectSequence` (the lazy proxy used for mapped task XCom) to a list may degrade performance. Iterate over it instead of converting it when possible.
- Trade-off: `NonCachingFileHandler` gives up file caching for the log files it writes, but the scheduler's pattern of continually writing new, unrotated log files makes that caching counterproductive. GC freeze adds a small overhead each time a worker process is forked.
Reasoning
Evidence from source code:
From `airflow-core/src/airflow/utils/log/non_caching_file_handler.py:41-49`:
# While there is nothing wrong with such cache (it will be cleaned when memory is needed), it
# causes ever-growing memory usage when scheduler is running as it keeps on writing new log
# files and the files are not rotated later on. This might lead to confusion for our users,
# who are monitoring memory usage of Scheduler - without realising that it is harmless
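The idea can be sketched with the standard library alone. This is a minimal illustration and not Airflow's implementation: the names `advise_no_cache` and `PageCacheAvoidingFileHandler` are hypothetical, and the sketch assumes the mechanism is an advisory `os.posix_fadvise` call with `POSIX_FADV_DONTNEED`, which exists only on some Unix platforms (the helper does nothing elsewhere).

```python
import logging
import os
import tempfile


def advise_no_cache(stream):
    """Ask the kernel not to keep this file's pages cached (hypothetical helper)."""
    if hasattr(os, "posix_fadvise"):
        try:
            # offset=0, length=0 means "the whole file".
            os.posix_fadvise(stream.fileno(), 0, 0, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # The call is only advice; logging must keep working without it.
    return stream


class PageCacheAvoidingFileHandler(logging.FileHandler):
    """Hypothetical stand-in: a FileHandler whose stream carries the advice."""

    def _open(self):
        return advise_no_cache(super()._open())


def demo():
    """Write one record through the handler and return the file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "scheduler.log")
        handler = PageCacheAvoidingFileHandler(path)
        logger = logging.getLogger("page_cache_demo")
        logger.setLevel(logging.INFO)
        logger.addHandler(handler)
        logger.info("hello")
        logger.removeHandler(handler)
        handler.close()
        with open(path) as f:
            return f.read()
```

Because the advice is applied when the stream is opened, the handler behaves like an ordinary `logging.FileHandler` from the caller's point of view.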
From `airflow-core/src/airflow/executors/local_executor.py:247-252`:
# This is done to prevent memory increase due to COW (Copy-on-Write) by moving all
# existing objects to the permanent generation before forking the process.
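A minimal sketch of the freeze-then-fork pattern using only the standard library follows. The function name `run_in_forked_child` is hypothetical, the sketch forks with `os.fork` directly rather than through the executor's own process machinery, and unfreezing in the parent afterwards is an assumption of this example. It requires a POSIX platform.

```python
import gc
import os


def run_in_forked_child(work):
    """Fork once with the GC frozen; return the child's exit code to the parent."""
    gc.freeze()  # Move every currently tracked object to the permanent generation.
    try:
        pid = os.fork()
        if pid == 0:
            # Child process: run the work, then exit immediately so that none
            # of the parent's remaining code runs in the child.
            exit_code = 0
            try:
                work()
            except BaseException:
                exit_code = 1
            os._exit(exit_code)
    finally:
        # Reached only in the parent (the child has already called os._exit).
        gc.unfreeze()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Calling `run_in_forked_child(lambda: None)` returns `0` in the parent, and `gc.get_freeze_count()` is back to `0` afterwards, so the parent's collector resumes normal operation.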
From `airflow-core/src/airflow/models/serialized_dag.py:579`:
# Prevent "Out of sort memory" caused by large values in cls.data column for MySQL.
# Details in https://github.com/apache/airflow/pull/55589
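The shape of such a query can be illustrated as follows. This is a sketch under stated assumptions, not the query Airflow issues: it uses an in-memory SQLite database purely so that it runs anywhere (SQLite does not reproduce MySQL's sort buffer limit), and the table layout and the function names `build_demo_db` and `newest_dag_payloads` are hypothetical. The point is that the inner query sorts only narrow columns, and the wide `data` column is fetched by ID afterwards.

```python
import sqlite3


def build_demo_db():
    """Create a tiny stand-in table with one deliberately wide column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag ("
        "id INTEGER PRIMARY KEY, dag_id TEXT, created_at INTEGER, data TEXT)"
    )
    conn.executemany(
        "INSERT INTO serialized_dag (dag_id, created_at, data) VALUES (?, ?, ?)",
        [("dag_a", 1, "x" * 1000), ("dag_b", 2, "y" * 1000), ("dag_c", 3, "z" * 1000)],
    )
    return conn


def newest_dag_payloads(conn, limit):
    """Pick IDs in a subquery (sorting narrow columns only), then join for data."""
    return conn.execute(
        """
        SELECT s.id, s.dag_id, s.data
        FROM serialized_dag AS s
        JOIN (
            SELECT id
            FROM serialized_dag
            ORDER BY created_at DESC
            LIMIT ?
        ) AS newest ON newest.id = s.id
        """,
        (limit,),
    ).fetchall()
```

The outer query has no `ORDER BY`, so callers that need a particular order should sort the (already small) result themselves rather than reintroduce a sort over the wide column.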
From `airflow-core/src/airflow/models/xcom.py:196-213`:
# Coercing mapped lazy proxy ... to list, which may degrade performance
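The difference between iterating and converting can be shown with a stand-in object. `CountingLazySequence` and `first_match` below are hypothetical and are not Airflow classes; the stand-in simply counts how many values it has produced, which plays the role of rows fetched by a real lazy proxy.

```python
class CountingLazySequence:
    """Hypothetical lazy sequence: yields values on demand and counts them."""

    def __init__(self, size):
        self._size = size
        self.fetched = 0

    def __iter__(self):
        for value in range(self._size):
            self.fetched += 1  # Stands in for the cost of fetching one row.
            yield value


def first_match(sequence, predicate):
    """Iterate lazily and stop at the first value satisfying the predicate."""
    for value in sequence:
        if predicate(value):
            return value
    return None
```

With a sequence of 1000 values, `first_match(seq, lambda v: v == 3)` produces only four values before stopping, whereas `list(seq)` produces all 1000 regardless of how many are needed.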
From `airflow-core/docs/troubleshooting.rst:39-58`:
Process termination by signal — OOM (Out of Memory) is a common cause:
- Best case: task killed with SIGKILL (exit code -9)
- Worst case: entire worker process killed by kernel OOM killer
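The exit code -9 mentioned above is how Python's `subprocess` module reports a child terminated by signal 9 (SIGKILL): the return code is the negated signal number. The sketch below shows one way to turn such a return code into a readable message. The helper names `describe_exit` and `run_self_killing_child` are hypothetical, the example is POSIX-only, and a SIGKILL is only a hint of an OOM kill, not proof of one.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Translate a subprocess return code into a short human-readable message."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        hint = " (possible OOM kill)" if -returncode == signal.SIGKILL else ""
        return f"terminated by {name}{hint}"
    return f"exited with code {returncode}"


def run_self_killing_child():
    """Start a child Python process that sends SIGKILL to itself."""
    code = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"
    return subprocess.run([sys.executable, "-c", code]).returncode
```

Passing the result of `run_self_killing_child()` to `describe_exit` gives a message naming SIGKILL, which is the situation described in the best case above.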