Implementation:Apache Flink Mapred HadoopInputSplit

Knowledge Sources	Apache_Flink
Domains	Connectors, Hadoop_Compatibility
Last Updated	2026-02-09 00:00 GMT

Overview

A wrapper class that adapts a Hadoop mapred InputSplit to Flink's LocatableInputSplit interface, enabling Flink to work with Hadoop-style input splits for data locality and parallelism.

Description

HadoopInputSplit is a class in the org.apache.flink.api.java.hadoop.mapred.wrapper package that extends Flink's LocatableInputSplit. It wraps a Hadoop org.apache.hadoop.mapred.InputSplit so that the Flink runtime can use it as a native Flink InputSplit while preserving the Hadoop split's data locality information.

The constructor accepts a split number, a Hadoop InputSplit, and an optional JobConf. If the Hadoop InputSplit implements Configurable or JobConfigurable, the JobConf is required (a NullPointerException is thrown otherwise). The getHostnames() method delegates to the Hadoop split's getLocations() method, returning an empty array on failure.

Custom serialization is implemented to handle the Hadoop InputSplit correctly:

writeObject: Serializes the parent fields via defaultWriteObject(), optionally writes the JobConf (if the split is configurable), and then writes the Hadoop InputSplit's data via its write() method.

readObject: Deserializes the parent fields, instantiates the InputSplit via WritableFactories.newInstance(), optionally reads and applies the JobConf (configuring the split if it implements Configurable or JobConfigurable), and finally reads the split's data via readFields().

This class is annotated with @Internal, indicating it is not part of the public API.

Usage

This class is used internally by Flink's Hadoop compatibility layer. When a HadoopInputFormatBase creates input splits from a Hadoop mapred InputFormat, each Hadoop InputSplit is wrapped in a HadoopInputSplit instance. Users do not typically instantiate this class directly.

Code Reference

Source Location

Repository: Apache_Flink
File: flink-connectors/flink-hadoop-compatibility/src/main/java/org/apache/flink/api/java/hadoop/mapred/wrapper/HadoopInputSplit.java
Lines: 1-141

Signature

@Internal
public class HadoopInputSplit extends LocatableInputSplit

Import

import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopInputSplit;

I/O Contract

Inputs

Name	Type	Required	Description
splitNumber	int	Yes	The number identifying this split within the set of all splits
hInputSplit	org.apache.hadoop.mapred.InputSplit	Yes	The Hadoop mapred InputSplit to wrap
jobconf	org.apache.hadoop.mapred.JobConf	Conditional	Required if the Hadoop InputSplit implements Configurable or JobConfigurable; nullable otherwise

Outputs

Name	Type	Description
hadoopInputSplit	org.apache.hadoop.mapred.InputSplit	The wrapped Hadoop InputSplit, accessible via getHadoopInputSplit()
hostnames	String[]	Data locality hostnames from the Hadoop InputSplit, accessible via getHostnames()
jobConf	org.apache.hadoop.mapred.JobConf	The associated JobConf, accessible via getJobConf()

Usage Examples

// HadoopInputSplit is typically created internally by HadoopInputFormatBase.
// Manual usage example for illustration:
import org.apache.flink.api.java.hadoop.mapred.wrapper.HadoopInputSplit;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.fs.Path;

JobConf jobConf = new JobConf();
FileSplit hadoopSplit = new FileSplit(new Path("/data/input"), 0, 1024, new String[]{"host1"});

HadoopInputSplit flinkSplit = new HadoopInputSplit(0, hadoopSplit, jobConf);
String[] hosts = flinkSplit.getHostnames(); // returns ["host1"]
org.apache.hadoop.mapred.InputSplit inner = flinkSplit.getHadoopInputSplit();

Related Pages

Principle:Apache_Flink_Hadoop_Compatibility

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment