Implementation:Apache Flink LocalityAwareSplitAssigner GetNext
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Performance_Optimization |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for assigning file splits to readers with preference for data-local assignments provided by the Apache Flink connector-files module.
Description
The LocalityAwareSplitAssigner implements FileSplitAssigner and maintains a pool of unassigned splits. When getNext is called with a hostname, it checks if any splits have data blocks on that host. Local splits are selected using a least-local-count strategy (preferring splits with fewer local assignments) to avoid starvation. If no local split is available, a remote split is chosen via round-robin. The assigner tracks local vs. remote assignment metrics.
Usage
This is the default split assigner for FileSource. No user configuration is needed.
Code Reference
Source Location
- Repository: Apache Flink
- File: flink-connectors/flink-connector-files/src/main/java/org/apache/flink/connector/file/src/assigners/LocalityAwareSplitAssigner.java
- Lines: L54-362
Signature
@PublicEvolving
public class LocalityAwareSplitAssigner implements FileSplitAssigner {
public LocalityAwareSplitAssigner(Collection<FileSourceSplit> splits);
@Override
public Optional<FileSourceSplit> getNext(@Nullable String host);
@Override
public void addSplits(Collection<FileSourceSplit> splits);
}
Import
import org.apache.flink.connector.file.src.assigners.LocalityAwareSplitAssigner;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| host | String (nullable) | No | Requesting reader's hostname for locality matching |
Outputs
| Name | Type | Description |
|---|---|---|
| split | Optional<FileSourceSplit> | Next split to process (local-preferred) or empty if none available |
Usage Examples
Split Assignment Flow
// The enumerator calls the assigner when a reader requests work:
// 1. Reader on host "node1" requests a split
// 2. Assigner checks for splits with blocks on "node1"
// 3. If found -> assigns local split (increments localAssignments counter)
// 4. If not found -> assigns any available split (increments remoteAssignments counter)