Principle:Mage ai Mage ai Parallel Sink Draining
| Knowledge Sources | |
|---|---|
| Domains | Data_Integration, Parallel_Processing, Performance |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A parallel batch processing mechanism that drains multiple Singer target sinks concurrently using thread-based parallelism for improved throughput.
Description
Parallel Sink Draining lets destination connectors built on the Singer SDK Target architecture process multiple stream sinks concurrently. A drain is triggered when a batch is full, at end of pipe, or when the maximum record age is exceeded. Sinks that were cleared because of a schema change are drained sequentially first. The active sinks are then drained in parallel through joblib's threading backend, using up to max_parallelism threads (default 8). On end of pipe, each group of sinks is also cleaned up after it is drained. Once draining finishes, the state message is written and the max record age timer is reset.
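The three drain triggers named above can be pictured as a single decision function. This is only a sketch: the function name, parameter names, and threshold values are hypothetical and are not taken from the Singer SDK or Mage code, and the exact comparison operators the real target uses are an assumption.

```python
def drain_reason(batch_size, max_batch_size, record_age_s, max_record_age_s,
                 is_endofpipe=False):
    """Return which trigger fired, or None if no drain is needed.

    All names and thresholds here are illustrative assumptions.
    """
    if is_endofpipe:
        # End of input: everything still buffered must be flushed.
        return "end_of_pipe"
    if batch_size >= max_batch_size:
        # The batch has reached its configured capacity.
        return "batch_full"
    if record_age_s > max_record_age_s:
        # The oldest buffered record has waited too long.
        return "max_record_age_exceeded"
    return None


# Example calls with made-up thresholds (10 records, 300 seconds).
full = drain_reason(batch_size=10, max_batch_size=10,
                    record_age_s=5.0, max_record_age_s=300.0)
stale = drain_reason(batch_size=3, max_batch_size=10,
                     record_age_s=301.0, max_record_age_s=300.0)
```

Whichever trigger fires, the drain procedure that follows is the same; only the end-of-pipe case additionally runs cleanup.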
Usage
Use this principle when building destination connectors that must handle multiple streams concurrently. It applies to the Singer SDK Target architecture, not to the Mage-native Destination architecture, which handles batching differently.
Theoretical Basis
The parallel drain algorithm runs the following steps in order:
- Deep copy current state
- Drain cleared sinks sequentially (schema changes require ordered processing)
- If end-of-pipe: clean up cleared sinks
- Drain active sinks in parallel (up to max_parallelism threads via joblib)
- If end-of-pipe: clean up active sinks
- Write state message
- Reset max record age timer
Each individual sink drain runs three calls in order: start_drain() -> process_batch(context) -> mark_drained()
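The steps above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the actual target code: DemoSink, drain_sinks, drain_all, and the emit_state callback are hypothetical names, the sketch assumes the batch context is whatever start_drain() hands back, and it uses the standard library's ThreadPoolExecutor as a stand-in for joblib's threading backend so that it runs without extra dependencies.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class DemoSink:
    """Hypothetical stand-in for a stream sink; not the real SDK class."""

    def __init__(self, name, records):
        self.name = name
        self.pending = list(records)
        self.written = []
        self.cleaned_up = False

    def start_drain(self):
        # Hand the buffered records over as the batch context.
        context, self.pending = {"records": self.pending}, []
        return context

    def process_batch(self, context):
        # A real sink would write to the destination here.
        self.written.extend(context["records"])

    def mark_drained(self):
        pass  # A real sink would reset its batch bookkeeping here.

    def clean_up(self):
        self.cleaned_up = True


def drain_one(sink):
    # Per-sink sequence: start_drain -> process_batch -> mark_drained.
    context = sink.start_drain()
    sink.process_batch(context)
    sink.mark_drained()


def drain_sinks(sinks, parallelism):
    # Thread pool used in place of joblib's threading backend.
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() waits for completion and re-raises worker exceptions.
        list(pool.map(drain_one, sinks))


def drain_all(cleared, active, latest_state, emit_state,
              is_endofpipe=False, max_parallelism=8):
    state = copy.deepcopy(latest_state)   # snapshot the current state
    drain_sinks(cleared, 1)               # cleared sinks: sequential
    if is_endofpipe:
        for sink in cleared:
            sink.clean_up()
    drain_sinks(active, max_parallelism)  # active sinks: parallel
    if is_endofpipe:
        for sink in active:
            sink.clean_up()
    emit_state(state)                     # write the state message
    # Resetting the max record age timer is left to the caller here.


cleared = [DemoSink("users_old_schema", [{"id": 1}])]
active = [DemoSink("users", [{"id": 2}, {"id": 3}]),
          DemoSink("orders", [{"id": 10}])]
emitted = []
latest_state = {"bookmarks": {"users": 3}}
drain_all(cleared, active, latest_state, emitted.append, is_endofpipe=True)
```

Because the state is deep-copied before any sink is drained, later changes to latest_state do not alter the message that is emitted at the end.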