Principle:Apache Flink Source Chain Composition
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Source_Architecture |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
A composition pattern that chains multiple data sources into a sequential pipeline, enabling seamless transitions from bounded historical data to unbounded real-time streams.
Description
Source Chain Composition addresses the common requirement of bootstrapping a streaming pipeline from historical data before switching to live data. Rather than requiring users to manually orchestrate source switching, this principle composes multiple sources into a single logical source that reads from each in sequence.
Key design decisions:
- Sequential execution: Sources are read in order, not in parallel
- Type unification: All sources must produce the same output type T
- Boundedness determination: The chain's boundedness is determined by the last source; because sources are read to completion in order, every earlier source must be bounded
- Position handoff: An optional SourceFactory allows a later source to receive context from the previous source's enumerator, enabling seamless position-based transitions
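The sequential-execution and position-handoff decisions above can be sketched in plain Java without the Flink runtime. Everything in the block is a hypothetical stand-in rather than Flink API: `SimpleSource`, `rangeSource`, and `readChain` are illustrative names, and a single `long` end position stands in for whatever context a real enumerator would expose.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongFunction;

final class SequentialChainSketch {

    /** Hypothetical stand-in for a source: its records plus the position where it stopped. */
    interface SimpleSource<T> {
        List<T> read();
        long endPosition();
    }

    /** Bounded source over offsets [from, to) of an in-memory log; offsets are list indices. */
    static SimpleSource<String> rangeSource(List<String> log, long from, long to) {
        return new SimpleSource<String>() {
            @Override
            public List<String> read() {
                return new ArrayList<>(log.subList((int) from, (int) to));
            }

            @Override
            public long endPosition() {
                return to;
            }
        };
    }

    /** Reads each source to completion, in order, handing its end position to the next factory. */
    static <T> List<T> readChain(List<LongFunction<SimpleSource<T>>> factories) {
        List<T> out = new ArrayList<>();
        long position = 0L; // assumed start position for the first source
        for (LongFunction<SimpleSource<T>> factory : factories) {
            SimpleSource<T> source = factory.apply(position); // created only at switch time
            out.addAll(source.read());
            position = source.endPosition();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("r0", "r1", "r2", "r3", "r4", "r5");
        List<LongFunction<SimpleSource<String>>> chain = new ArrayList<>();
        chain.add(start -> rangeSource(log, start, 3));          // "historical" part
        chain.add(start -> rangeSource(log, start, log.size())); // "live" part resumes at the handoff
        System.out.println(readChain(chain)); // prints [r0, r1, r2, r3, r4, r5]
    }
}
```

Because the second factory receives the position at which the first source stopped, the combined output covers the log exactly once. A source added without a factory would have to fix its start position when the chain is built instead.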
Usage
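Use this principle when a pipeline needs to read historical data (e.g., from files or a data lake) before switching to real-time data (e.g., from Kafka). The chain avoids gaps and duplicates between sources only if each source starts where the previous one ended, either through a switch position agreed on in advance or through the position handoff described above.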
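As a rough illustration of a switch position agreed on in advance, the sketch below splits reading at a fixed timestamp: the historical side keeps everything up to and including the switch timestamp, and the live side keeps everything strictly after it. The `Event` class, `readWithSwitch`, and the sample data are hypothetical; in-memory lists stand in for the file archive and the Kafka topic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SwitchTimestampSketch {

    /** Hypothetical event: a timestamp in milliseconds plus a payload. */
    static final class Event {
        final long timestamp;
        final String payload;

        Event(long timestamp, String payload) {
            this.timestamp = timestamp;
            this.payload = payload;
        }
    }

    /** Historical records up to and including switchTs, then live records strictly after it. */
    static List<String> readWithSwitch(List<Event> archive, List<Event> topic, long switchTs) {
        List<String> out = new ArrayList<>();
        for (Event e : archive) {
            if (e.timestamp <= switchTs) {
                out.add(e.payload);
            }
        }
        for (Event e : topic) {
            if (e.timestamp > switchTs) {
                out.add(e.payload);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> archive = Arrays.asList(
            new Event(1, "a1"), new Event(2, "a2"), new Event(3, "a3"));
        // The topic still retains events 2 and 3, which the archive already covers.
        List<Event> topic = Arrays.asList(
            new Event(2, "t2"), new Event(3, "t3"), new Event(4, "t4"), new Event(5, "t5"));
        System.out.println(readWithSwitch(archive, topic, 3)); // prints [a1, a2, a3, t4, t5]
    }
}
```

The overlapping events `t2` and `t3` are skipped because the live side starts strictly after the switch timestamp. Choosing a switch timestamp later than what the archive actually contains would leave a gap instead.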
Theoretical Basis
// Abstract source chain composition
function buildChain(sources):
    chain = []
    for each source in sources:
        if source has factory:
            chain.add(SourceEntry(source.factory, source.boundedness))
        else:
            chain.add(SourceEntry(wrapAsFactory(source), source.boundedness))
    return HybridSource(chain)

// Overall boundedness = chain.last().boundedness
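The pseudocode above might be rendered in Java roughly as follows. This is a minimal sketch, not the Flink implementation: `ChainBuilderSketch`, `Chain`, `addSource`, and `addFactory` are hypothetical names, the `Boundedness` enum is a local stand-in, and plain lists stand in for sources. The check that rejects a source added after an unbounded one is an added assumption that follows from sequential execution.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

final class ChainBuilderSketch {

    /** Local stand-in for a boundedness flag. */
    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** One chain entry: a factory that creates the source, plus its declared boundedness. */
    static final class SourceEntry<T> {
        final Supplier<List<T>> factory;
        final Boundedness boundedness;

        SourceEntry(Supplier<List<T>> factory, Boundedness boundedness) {
            this.factory = factory;
            this.boundedness = boundedness;
        }
    }

    static final class Chain<T> {
        private final List<SourceEntry<T>> entries = new ArrayList<>();

        /** Adds a source that is created lazily by a factory. */
        Chain<T> addFactory(Supplier<List<T>> factory, Boundedness boundedness) {
            // Added assumption: a sequential chain can only move past a source that ends.
            if (!entries.isEmpty() && lastEntry().boundedness != Boundedness.BOUNDED) {
                throw new IllegalArgumentException("all sources except the last must be bounded");
            }
            entries.add(new SourceEntry<>(factory, boundedness));
            return this;
        }

        /** Adds an already-built source by wrapping it in a trivial factory (wrapAsFactory). */
        Chain<T> addSource(List<T> source, Boundedness boundedness) {
            return addFactory(() -> source, boundedness);
        }

        /** Overall boundedness is that of the last entry. */
        Boundedness boundedness() {
            return lastEntry().boundedness;
        }

        private SourceEntry<T> lastEntry() {
            if (entries.isEmpty()) {
                throw new IllegalStateException("chain has no sources");
            }
            return entries.get(entries.size() - 1);
        }
    }

    public static void main(String[] args) {
        Chain<String> chain = new Chain<String>()
            .addSource(Arrays.asList("archived-1", "archived-2"), Boundedness.BOUNDED)
            .addFactory(() -> Arrays.asList("live-1"), Boundedness.CONTINUOUS_UNBOUNDED);
        System.out.println(chain.boundedness()); // prints CONTINUOUS_UNBOUNDED
    }
}
```

Storing every entry as a factory, even for sources that are already built, keeps the chain uniform: the reader side only ever asks an entry to create its source when the switch happens.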