Implementation:Apache Flink RecordsWithSplitIds
| Knowledge Sources | |
|---|---|
| Domains | Connectors, Source_Framework |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An interface defining the contract for batches of records passed from split fetcher threads to the source reader, organized by split identifiers.
Description
RecordsWithSplitIds is a key interface in Flink's source connector framework that defines how fetched records are transferred from the I/O fetcher threads to the main source reader thread. It provides an iterator-like API where the consumer first advances to the next split via nextSplit(), then iterates through records within that split via nextRecordFromSplit(). The interface also reports which splits have been fully consumed via finishedSplits().
The interface includes an optional recycle() default method that is called after all records from a batch have been emitted. This gives implementations an opportunity to recycle or reuse the object, which is an important performance optimization when record objects are large or expensive to allocate. The interface is annotated with @PublicEvolving, indicating it is part of Flink's public API but may evolve across minor versions.
Usage
Connector developers encounter this interface as the return type of SplitReader.fetch(). When implementing a SplitReader, the fetch() method must return a RecordsWithSplitIds instance containing the records read from the external system. The standard implementation RecordsBySplits is typically used via its builder. Advanced connector developers may provide custom implementations for performance optimization, for example to support object pooling or zero-copy record transfer via the recycle() method.
Code Reference
Source Location
- Repository: Apache_Flink
- File: flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/source/reader/RecordsWithSplitIds.java
- Lines: 1-61
Signature
@PublicEvolving
public interface RecordsWithSplitIds<E> {
/**
* Moves to the next split. This method is also called initially to move to the first split.
* Returns null, if no splits are left.
*/
@Nullable
String nextSplit();
/**
* Gets the next record from the current split.
* Returns null if no more records are left in this split.
*/
@Nullable
E nextRecordFromSplit();
/**
* Get the finished splits.
*
* @return the finished splits after this RecordsWithSplitIds is returned.
*/
Set<String> finishedSplits();
/**
* Called when all records from this batch have been emitted.
* Gives implementations the opportunity to recycle/reuse this object.
*/
default void recycle() {}
}
Import
import org.apache.flink.connector.base.source.reader.RecordsWithSplitIds;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | - | - | This interface is a data container. It is populated by the SplitReader and consumed by the SourceReaderBase. |
Outputs
| Name | Type | Description |
|---|---|---|
| nextSplit() | String (nullable) | Returns the ID of the next split containing records, or null when all splits have been iterated. |
| nextRecordFromSplit() | E (nullable) | Returns the next record from the current split, or null when no more records remain in the current split. |
| finishedSplits() | Set<String> | Returns the set of split IDs that have been fully consumed and should be reported as finished. |
Usage Examples
// Example: Consuming records from a RecordsWithSplitIds instance
// (This pattern is used internally by SourceReaderBase)
RecordsWithSplitIds<RawRecord> fetchResult = splitReader.fetch();
String splitId;
while ((splitId = fetchResult.nextSplit()) != null) {
RawRecord record;
while ((record = fetchResult.nextRecordFromSplit()) != null) {
// Process each record: transform and emit downstream
recordEmitter.emitRecord(record, output, splitStates.get(splitId));
}
}
// Handle finished splits
Set<String> finished = fetchResult.finishedSplits();
if (!finished.isEmpty()) {
onSplitFinished(finished);
}
// Allow the batch to be recycled
fetchResult.recycle();