Principle:Apache Flink Source Connector Framework

Knowledge Sources	Apache_Flink
Domains	Source_Connector, Framework
Last Updated	2026-02-09 00:00 GMT

Overview

Description

The FLIP-27 Source Connector Framework defines the core abstractions used to build source connectors in Apache Flink. It is part of the flink-connector-base module and provides a standardized contract for reading data from external systems by decomposing the reading process into split management and split reading. The framework revolves around the SplitReader interface, which reads elements from one or more assigned splits, and the SplitsChange type hierarchy, which communicates split lifecycle events (additions and removals) to the reader.

The key abstractions in this framework are:

SplitReader -- A @PublicEvolving interface parameterized by element type E and split type SplitT. It exposes three core operations: fetch() to retrieve records as RecordsWithSplitIds, handleSplitsChanges() to react to split reassignment, and wakeUp() to interrupt a blocking fetch. It also provides an optional pauseOrResumeSplits() method for watermark alignment.
SplitsChange -- An abstract base class that wraps an immutable list of splits, serving as the polymorphic root for all split change events.
SplitsAddition -- A concrete SplitsChange subclass (@PublicEvolving) signaling that new splits have been assigned to the reader.
SplitsRemoval -- A concrete SplitsChange subclass (@Internal) signaling that splits should be removed, typically triggered when a RecordEvaluator reports end-of-stream.

Theoretical Basis

The Source Connector Framework embodies the Split Enumerator / Split Reader pattern introduced in FLIP-27. This design separates concerns into two roles: a centralized SplitEnumerator that discovers and assigns work units (splits) and a per-subtask SplitReader that fetches records from those splits. The SplitsChange hierarchy uses the Command pattern to communicate split lifecycle events in a type-safe, extensible manner. SplitsAddition and SplitsRemoval are discrete command objects that the reader interprets, allowing new change types to be added without modifying the SplitReader interface.

The fetch() method follows a pull-based model where the reader thread calls fetch in a loop, and the method may block until data is available or wakeUp() is called. This blocking-with-interrupt pattern is analogous to the poll/wakeup pattern used in Kafka consumers. The framework also supports watermark alignment through the optional pauseOrResumeSplits() method, which enables coordinated progress tracking across multiple splits read by a single reader.

Abstraction	Type	Visibility	Purpose
`SplitReader<E, SplitT>`	Interface	`@PublicEvolving`	Read records from assigned splits
`SplitsChange<SplitT>`	Abstract Class	`@PublicEvolving`	Base type for split lifecycle events
`SplitsAddition<SplitT>`	Class	`@PublicEvolving`	Signals new splits assigned to reader
`SplitsRemoval<SplitT>`	Class	`@Internal`	Signals splits removed from reader

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment