Principle:Apache Flink Locality Aware Split Assignment

From Leeroopedia


Knowledge Sources
Domains: Distributed_Computing, Performance_Optimization
Last Updated: 2026-02-09 00:00 GMT

Overview

A data-locality-aware scheduling strategy that preferentially assigns each file split to a reader task running on the same physical host as the split's data blocks.

Description

Locality Aware Split Assignment improves read performance by minimizing network data transfer. When a reader requests a split, the assigner first checks whether any unassigned split has data blocks hosted on the requesting reader's machine. If such a local split exists, it is assigned in preference to any other. If no local split is available, a remote split is assigned using a round-robin strategy so that work stays evenly distributed.

This principle is particularly important for HDFS-based deployments where data locality can significantly reduce network I/O and improve throughput.

Usage

Use this principle when reading from distributed filesystems (HDFS, S3 with locality hints) where minimizing network transfer matters. On local filesystems or cloud storage without locality information, no split is ever local to a reader, so every request takes the fallback path and locality-aware assignment degrades gracefully to round-robin.
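The degradation to round-robin can be illustrated with a small, self-contained Python sketch. Everything here is hypothetical: `assign_all`, the split ids, and the reader names are invented for illustration and are not Flink APIs. The sketch also assumes that readers request splits strictly in turn and that the first matching local split is chosen, which is a simplification of the selection rule described on this page.

```python
from collections import deque
from itertools import cycle


def assign_all(split_hosts, reader_hosts):
    """Assign every split and return {reader_host: [split ids]}.

    split_hosts maps a split id to the list of hosts holding its data.
    The list is empty when the storage system exposes no locality.
    Hypothetical helper for illustration only.
    """
    pending = deque(split_hosts)            # split ids in arrival order
    result = {reader: [] for reader in reader_hosts}
    readers = cycle(reader_hosts)           # assumption: readers ask in turn
    while pending:
        reader = next(readers)
        # Splits whose data lives on the requesting reader's host.
        local = [s for s in pending if reader in split_hosts[s]]
        # Prefer a local split; otherwise take the oldest pending split.
        chosen = local[0] if local else pending[0]
        pending.remove(chosen)
        result[reader].append(chosen)
    return result


# No locality information: splits are handed out in plain rotation.
no_locality = assign_all({"p0": [], "p1": [], "p2": [], "p3": []}, ["r1", "r2"])

# With locality information: each reader receives the splits it hosts.
with_locality = assign_all(
    {"p0": ["r2"], "p1": ["r1"], "p2": ["r2"], "p3": ["r1"]}, ["r1", "r2"]
)
```

In the first call every request falls through to the fallback, so the two readers simply alternate over the splits. In the second call the same loop routes each split to the reader that hosts it, without any change to the code path.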

Theoretical Basis

// Abstract algorithm
function assignSplit(requestingHost):
    if requestingHost is null:
        return getRemoteSplit()  // host unknown: round-robin fallback

    localSplits = findSplitsOnHost(requestingHost)  // unassigned splits only
    if localSplits is not empty:
        return localSplits.getMinLocalCount()  // local split with the lowest local count

    return getRemoteSplit()  // no local split: round-robin fallback
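The abstract algorithm can be turned into a runnable form along the following lines. This is a minimal Python sketch under stated assumptions, not Flink's implementation: the `Split` and `LocalityAwareAssigner` names are hypothetical, "local count" is taken to mean the number of hosts that store a split, and the round-robin fallback is modeled as handing out the oldest unassigned split.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Split:
    """Hypothetical file split: an id plus the hosts storing its blocks."""
    split_id: str
    hosts: tuple = ()


class LocalityAwareAssigner:
    """Illustrative assigner; not an actual Flink class."""

    def __init__(self, splits):
        # An insertion-ordered dict doubles as the queue for the fallback.
        self._unassigned = {s.split_id: s for s in splits}

    def _take(self, split):
        # Mark the split as assigned by removing it from the pool.
        del self._unassigned[split.split_id]
        return split

    def _remote_split(self):
        # Fallback: hand out the oldest unassigned split, ignoring hosts.
        if not self._unassigned:
            return None
        return self._take(next(iter(self._unassigned.values())))

    def assign_split(self, requesting_host: Optional[str]):
        if requesting_host is None:
            return self._remote_split()
        host = requesting_host.lower()
        local = [s for s in self._unassigned.values()
                 if host in (h.lower() for h in s.hosts)]
        if local:
            # Assumption: prefer the split stored on the fewest hosts,
            # since fewer other readers could read it locally later.
            return self._take(min(local, key=lambda s: len(s.hosts)))
        return self._remote_split()


assigner = LocalityAwareAssigner([
    Split("s1", ("host-a", "host-b")),
    Split("s2", ("host-b",)),
    Split("s3", ("host-c",)),
])
first = assigner.assign_split("host-b")   # local candidates: s1 and s2
second = assigner.assign_split("host-d")  # no local split for this host
third = assigner.assign_split(None)       # host unknown
```

Here `first` is `s2`, because it is the local candidate stored on the fewest hosts; `second` and `third` come from the fallback in queue order (`s1`, then `s3`). Once the pool is empty, `assign_split` returns `None`.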

Related Pages

Implemented By
