Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Apache Flink Source Connector Framework

From Leeroopedia


Knowledge Sources
Domains Source_Connector, Framework
Last Updated 2026-02-09 00:00 GMT

Overview

Description

The FLIP-27 Source Connector Framework defines the core abstractions used to build source connectors in Apache Flink. It is part of the flink-connector-base module and provides a standardized contract for reading data from external systems by decomposing the reading process into split management and split reading. The framework revolves around the SplitReader interface, which reads elements from one or more assigned splits, and the SplitsChange type hierarchy, which communicates split lifecycle events (additions and removals) to the reader.

The key abstractions in this framework are:

  • SplitReader -- A @PublicEvolving interface parameterized by element type E and split type SplitT. It exposes three core operations: fetch() to retrieve records as RecordsWithSplitIds, handleSplitsChanges() to react to split reassignment, and wakeUp() to interrupt a blocking fetch. It also provides an optional pauseOrResumeSplits() method for watermark alignment.
  • SplitsChange -- An abstract base class that wraps an immutable list of splits, serving as the polymorphic root for all split change events.
  • SplitsAddition -- A concrete SplitsChange subclass (@PublicEvolving) signaling that new splits have been assigned to the reader.
  • SplitsRemoval -- A concrete SplitsChange subclass (@Internal) signaling that splits should be removed, typically triggered when a RecordEvaluator reports end-of-stream.

Theoretical Basis

The Source Connector Framework embodies the Split Enumerator / Split Reader pattern introduced in FLIP-27. This design separates concerns into two roles: a centralized SplitEnumerator that discovers and assigns work units (splits) and a per-subtask SplitReader that fetches records from those splits. The SplitsChange hierarchy uses the Command pattern to communicate split lifecycle events in a type-safe, extensible manner. SplitsAddition and SplitsRemoval are discrete command objects that the reader interprets, allowing new change types to be added without modifying the SplitReader interface.

The fetch() method follows a pull-based model where the reader thread calls fetch in a loop, and the method may block until data is available or wakeUp() is called. This blocking-with-interrupt pattern is analogous to the poll/wakeup pattern used in Kafka consumers. The framework also supports watermark alignment through the optional pauseOrResumeSplits() method, which enables coordinated progress tracking across multiple splits read by a single reader.

Abstraction Type Visibility Purpose
SplitReader<E, SplitT> Interface @PublicEvolving Read records from assigned splits
SplitsChange<SplitT> Abstract Class @PublicEvolving Base type for split lifecycle events
SplitsAddition<SplitT> Class @PublicEvolving Signals new splits assigned to reader
SplitsRemoval<SplitT> Class @Internal Signals splits removed from reader

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment