
Implementation:Heibaiying BigData Notes CustomPartitioner GetPartition

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete tool, provided by the BigData-Notes repository, for routing intermediate key-value pairs to specific reducers based on each word's index in a predefined vocabulary list.

Description

The CustomPartitioner class extends Hadoop's Partitioner<Text, IntWritable> and implements a deterministic partitioning scheme for the word count pipeline. Instead of using the default hash-based partitioning, this custom partitioner assigns each word to a reducer based on the word's index position in the predefined WordCountDataUtils.WORD_LIST.

The getPartition() method looks up the input word in the static vocabulary list and returns its index as the partition number. This guarantees that each word is always routed to the same reducer, and when the number of reduce tasks equals the vocabulary size (6), each reducer processes exactly one word.
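The repository source is linked below; as a rough sketch of the lookup logic described above (not the verbatim class, since the Hadoop types are omitted here), the partition number is simply the word's index in the static vocabulary list. The class name `PartitionLookupSketch` and the plain-Java helper `lookupPartition` are hypothetical stand-ins; the vocabulary contents are taken from the Partition Assignment example later on this page:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionLookupSketch {
    // Vocabulary taken from the Partition Assignment example on this page
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Stand-in for CustomPartitioner.getPartition(): returns the word's
    // index in the vocabulary list as the partition number. The
    // numPartitions argument mirrors the real signature but is unused
    // when it equals the vocabulary size.
    static int lookupPartition(String word, int numPartitions) {
        return WORD_LIST.indexOf(word);
    }

    public static void main(String[] args) {
        System.out.println(lookupPartition("Hadoop", 6)); // prints 1
    }
}
```

Because the list never changes between runs, the same word always maps to the same index, which is what makes the routing deterministic.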

Usage

Use this custom partitioner when you need deterministic key-to-reducer assignment in the word count pipeline. It must be registered with the job via job.setPartitionerClass(CustomPartitioner.class) and the number of reduce tasks must be set to match the vocabulary size (6).

Code Reference

Source Location

  • Repository: BigData-Notes
  • File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java
  • Lines: L11-16

Signature

public class CustomPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions)
}

Import

import com.heibaiying.component.CustomPartitioner;

I/O Contract

Inputs

Name          | Type        | Required | Description
text          | Text        | Yes      | The intermediate key (a word) emitted by the mapper
intWritable   | IntWritable | Yes      | The intermediate value (the count) emitted by the mapper
numPartitions | int         | Yes      | The total number of reduce tasks (should equal the vocabulary size)

Outputs

Name      | Type | Description
partition | int  | The index of the word in WordCountDataUtils.WORD_LIST, determining which reducer receives this key
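One caveat, inferred from the lookup scheme rather than stated in the repository: an index lookup over a fixed list returns -1 for a word that is not in the vocabulary, and a negative partition number is outside the valid range [0, numPartitions) that Hadoop's map output collector accepts. A hypothetical defensive variant (the class name `SafePartitionSketch` and the fallback-to-partition-0 policy are this page's assumptions, not repository behavior) might clamp the result:

```java
import java.util.Arrays;
import java.util.List;

public class SafePartitionSketch {
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Hypothetical defensive variant: route out-of-vocabulary words to
    // partition 0 instead of returning -1 (an invalid partition number),
    // and take the index modulo numPartitions in case fewer reduce tasks
    // were configured than there are vocabulary words.
    static int safePartition(String word, int numPartitions) {
        int index = WORD_LIST.indexOf(word);
        return index >= 0 ? index % numPartitions : 0;
    }
}
```

With the original scheme, keeping the input restricted to the six known words (and setting exactly six reduce tasks) avoids the issue entirely.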

Usage Examples

Basic Usage

import com.heibaiying.component.CustomPartitioner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCountWithPartitioner");
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);

// Register the custom partitioner
job.setPartitionerClass(CustomPartitioner.class);

// Set the number of reduce tasks to match the vocabulary size
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());

Partition Assignment

// Given WORD_LIST = ["Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive"]
//
// getPartition("Spark", 1, 6)  -> returns 0  (reducer 0)
// getPartition("Hadoop", 1, 6) -> returns 1  (reducer 1)
// getPartition("HBase", 1, 6)  -> returns 2  (reducer 2)
// getPartition("Storm", 1, 6)  -> returns 3  (reducer 3)
// getPartition("Flink", 1, 6)  -> returns 4  (reducer 4)
// getPartition("Hive", 1, 6)   -> returns 5  (reducer 5)
//
// Output files:
//   part-r-00000 contains only "Spark" count
//   part-r-00001 contains only "Hadoop" count
//   ... and so on

Related Pages

Implements Principle

Requires Environment
