Principle:Apache Flink Source Connector Framework
| Knowledge Sources | |
|---|---|
| Domains | Source_Connector, Framework |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Description
The FLIP-27 Source Connector Framework defines the core abstractions used to build source connectors in Apache Flink. It is part of the flink-connector-base module and provides a standardized contract for reading data from external systems by decomposing the reading process into split management and split reading. The framework revolves around the SplitReader interface, which reads elements from one or more assigned splits, and the SplitsChange type hierarchy, which communicates split lifecycle events (additions and removals) to the reader.
The key abstractions in this framework are:
- SplitReader -- A
@PublicEvolvinginterface parameterized by element typeEand split typeSplitT. It exposes three core operations:fetch()to retrieve records asRecordsWithSplitIds,handleSplitsChanges()to react to split reassignment, andwakeUp()to interrupt a blocking fetch. It also provides an optionalpauseOrResumeSplits()method for watermark alignment. - SplitsChange -- An abstract base class that wraps an immutable list of splits, serving as the polymorphic root for all split change events.
- SplitsAddition -- A concrete
SplitsChangesubclass (@PublicEvolving) signaling that new splits have been assigned to the reader. - SplitsRemoval -- A concrete
SplitsChangesubclass (@Internal) signaling that splits should be removed, typically triggered when aRecordEvaluatorreports end-of-stream.
Theoretical Basis
The Source Connector Framework embodies the Split Enumerator / Split Reader pattern introduced in FLIP-27. This design separates concerns into two roles: a centralized SplitEnumerator that discovers and assigns work units (splits) and a per-subtask SplitReader that fetches records from those splits. The SplitsChange hierarchy uses the Command pattern to communicate split lifecycle events in a type-safe, extensible manner. SplitsAddition and SplitsRemoval are discrete command objects that the reader interprets, allowing new change types to be added without modifying the SplitReader interface.
The fetch() method follows a pull-based model where the reader thread calls fetch in a loop, and the method may block until data is available or wakeUp() is called. This blocking-with-interrupt pattern is analogous to the poll/wakeup pattern used in Kafka consumers. The framework also supports watermark alignment through the optional pauseOrResumeSplits() method, which enables coordinated progress tracking across multiple splits read by a single reader.
| Abstraction | Type | Visibility | Purpose |
|---|---|---|---|
SplitReader<E, SplitT> |
Interface | @PublicEvolving |
Read records from assigned splits |
SplitsChange<SplitT> |
Abstract Class | @PublicEvolving |
Base type for split lifecycle events |
SplitsAddition<SplitT> |
Class | @PublicEvolving |
Signals new splits assigned to reader |
SplitsRemoval<SplitT> |
Class | @Internal |
Signals splits removed from reader |