Principle:Apache Flink Source Chain Composition

Knowledge Sources	Apache Flink HybridSource Apache Flink
Domains	Stream_Processing, Source_Architecture
Last Updated	2026-02-09 00:00 GMT

Overview

A composition pattern that chains multiple data sources into a sequential pipeline, enabling seamless transitions from bounded historical data to unbounded real-time streams.

Description

Source Chain Composition addresses the common requirement of bootstrapping a streaming pipeline from historical data before switching to live data. Rather than requiring users to manually orchestrate source switching, this principle composes multiple sources into a single logical source that reads from each in sequence.

Key design decisions:

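Sequential execution: Sources are read in order, not in parallel; a source starts only after the one before it has finished
Type unification: All sources must produce the same output type T
Boundedness determination: The chain's boundedness is determined by the last source; every earlier source must be bounded so that it can finish and hand over
Position handoff: An optional SourceFactory defers creation of a later source until switch time and gives it access to the previous source's enumerator, so it can start from the position where the previous source ended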
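The sequential-execution and position-handoff decisions above can be illustrated with a small, dependency-free model. This is a minimal sketch, not Flink code: ToySource, rangeSource and readChain are hypothetical names invented for the illustration, and a plain long stands in for the state that a real enumerator would expose.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Toy model of a source chain; none of these types are Flink classes. */
class SourceChainSketch {

    /** A bounded toy source: yields its records, then reports where it stopped. */
    interface ToySource<T> {
        List<T> readAll();
        long endPosition();
    }

    /** Reads positions [start, end) of a shared log; stands in for a file or topic reader. */
    static ToySource<String> rangeSource(List<String> log, long start, long end) {
        return new ToySource<String>() {
            public List<String> readAll() {
                return new ArrayList<>(log.subList((int) start, (int) end));
            }
            public long endPosition() {
                return end;
            }
        };
    }

    /**
     * Reads the chain strictly in order. Each factory receives the end position of the
     * source before it (0 for the first), which models the position handoff.
     */
    static <T> List<T> readChain(List<LongFunction<ToySource<T>>> factories) {
        List<T> out = new ArrayList<>();
        long position = 0;
        for (LongFunction<ToySource<T>> factory : factories) {
            ToySource<T> source = factory.apply(position); // created only at switch time
            out.addAll(source.readAll());                  // drained before the next one starts
            position = source.endPosition();
        }
        return out;
    }

    /** Two-stage chain over one log: a fixed "historical" range, then the remainder. */
    static List<String> demo() {
        List<String> log = List.of("a", "b", "c", "d", "e");
        List<LongFunction<ToySource<String>>> chain = new ArrayList<>();
        chain.add(ignored -> rangeSource(log, 0, 3));            // fixed range, ignores handoff
        chain.add(start -> rangeSource(log, start, log.size())); // resumes where the first ended
        return readChain(chain);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c, d, e]
    }
}
```

Because the second factory is only invoked after the first source has been drained, it can use the handed-over position instead of a value guessed when the chain was assembled. The shared element type (String here) is what the type-unification rule refers to.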

Usage

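Use this principle when a pipeline needs to read historical data (e.g., from files or a data lake) before switching to real-time data (e.g., from Kafka). The handover is free of gaps and duplicates only when each later source starts exactly where the previous one ended, so its start position must either be fixed in advance (for example, a known switch timestamp) or derived at switch time through a SourceFactory.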
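To make the alignment requirement concrete, the following toy model chains a "historical" read and a "live" read over the same list of event timestamps. It is a sketch under stated assumptions rather than Flink code: readHistory, readLive and readChained are hypothetical helpers, and events are reduced to their timestamps.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a fixed switch point between a historical and a live source. */
class SwitchPointSketch {

    /** Historical store: returns events with timestamp <= switchTs. */
    static List<Long> readHistory(List<Long> allEvents, long switchTs) {
        List<Long> out = new ArrayList<>();
        for (long ts : allEvents) {
            if (ts <= switchTs) {
                out.add(ts);
            }
        }
        return out;
    }

    /** Live stream: returns events with timestamp >= startTs. */
    static List<Long> readLive(List<Long> allEvents, long startTs) {
        List<Long> out = new ArrayList<>();
        for (long ts : allEvents) {
            if (ts >= startTs) {
                out.add(ts);
            }
        }
        return out;
    }

    /** History first, then live; liveStartTs decides whether the handover is exact. */
    static List<Long> readChained(List<Long> allEvents, long switchTs, long liveStartTs) {
        List<Long> out = readHistory(allEvents, switchTs);
        out.addAll(readLive(allEvents, liveStartTs));
        return out;
    }

    public static void main(String[] args) {
        List<Long> events = List.of(1L, 2L, 3L, 4L, 5L);
        System.out.println(readChained(events, 3, 4)); // aligned:   [1, 2, 3, 4, 5]
        System.out.println(readChained(events, 3, 3)); // too early: [1, 2, 3, 3, 4, 5]
        System.out.println(readChained(events, 3, 5)); // too late:  [1, 2, 3, 5]
    }
}
```

Only the first call, where the live read starts one step after the switch timestamp, reproduces the input exactly. Starting earlier duplicates an event and starting later drops one, which is why the start position of the later source has to be chosen deliberately.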

Theoretical Basis

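// Abstract source chain composition
function buildChain(sources):
    chain = []
    for each source in sources:
        // Every source before the last must be bounded, or the chain never advances
        if chain is not empty and chain.last().boundedness != BOUNDED:
            fail("all sources except the last must be bounded")
        if source has factory:
            // Deferred: the factory creates the source at switch time
            chain.add(SourceEntry(source.factory, source.boundedness))
        else:
            // Eager: wrap the pre-built source in a factory that returns it
            chain.add(SourceEntry(wrapAsFactory(source), source.boundedness))
    // Overall boundedness = chain.last().boundedness
    return HybridSource(chain)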
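The pseudocode can also be written as a small runnable Java model. This is a minimal sketch, not the HybridSource builder itself: Chain, SourceEntry, addSource and addFactory are hypothetical names, a List<T> stands in for a real source, and the local Boundedness enum only borrows the constant names of Flink's enum.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Toy builder mirroring the pseudocode; names are illustrative, not Flink's API. */
class ChainBuilderSketch {

    enum Boundedness { BOUNDED, CONTINUOUS_UNBOUNDED }

    /** One chain entry: how to create the source, and whether it ever finishes. */
    static final class SourceEntry<T> {
        final Supplier<List<T>> factory;
        final Boundedness boundedness;

        SourceEntry(Supplier<List<T>> factory, Boundedness boundedness) {
            this.factory = factory;
            this.boundedness = boundedness;
        }
    }

    /** The type parameter T enforces that every entry produces the same record type. */
    static final class Chain<T> {
        private final List<SourceEntry<T>> entries = new ArrayList<>();

        /** Pre-built source: wrapped in a factory that simply returns it. */
        Chain<T> addSource(List<T> source, Boundedness boundedness) {
            return addFactory(() -> source, boundedness);
        }

        /** Deferred source: the factory is stored and would be invoked at switch time. */
        Chain<T> addFactory(Supplier<List<T>> factory, Boundedness boundedness) {
            if (!entries.isEmpty()
                    && entries.get(entries.size() - 1).boundedness != Boundedness.BOUNDED) {
                throw new IllegalArgumentException("all sources except the last must be bounded");
            }
            entries.add(new SourceEntry<>(factory, boundedness));
            return this;
        }

        /** The chain is as bounded as its last entry. */
        Boundedness boundedness() {
            return entries.get(entries.size() - 1).boundedness;
        }
    }

    /** Bounded source followed by an unbounded one: the chain is unbounded. */
    static Boundedness demoBoundedness() {
        return new Chain<String>()
                .addSource(List.of("old-1", "old-2"), Boundedness.BOUNDED)
                .addFactory(() -> List.of("live-1"), Boundedness.CONTINUOUS_UNBOUNDED)
                .boundedness();
    }

    /** Appending after an unbounded entry is rejected. */
    static boolean rejectsUnboundedInMiddle() {
        try {
            new Chain<String>()
                    .addSource(List.of("live"), Boundedness.CONTINUOUS_UNBOUNDED)
                    .addSource(List.of("late"), Boundedness.BOUNDED);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(demoBoundedness());          // prints CONTINUOUS_UNBOUNDED
        System.out.println(rejectsUnboundedInMiddle()); // prints true
    }
}
```

Storing a factory for every entry, including pre-built sources, keeps the chain uniform: the code that later walks the entries does not need to know whether a source was supplied eagerly or created on demand.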

Related Pages

Implemented By

Implementation:Apache_Flink_HybridSource_Builder
