Implementation: Heibaiying BigData-Notes CustomPartitioner getPartition
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete partitioner from the BigData-Notes repository that routes intermediate key-value pairs to specific reducers based on each word's index in a predefined vocabulary list.
Description
The CustomPartitioner class extends Hadoop's Partitioner<Text, IntWritable> and implements a deterministic partitioning scheme for the word count pipeline. Instead of using the default hash-based partitioning, this custom partitioner assigns each word to a reducer based on the word's index position in the predefined WordCountDataUtils.WORD_LIST.
The getPartition() method looks up the input word in the static vocabulary list and returns its index as the partition number. This guarantees that each word is always routed to the same reducer, and when the number of reduce tasks equals the vocabulary size (6), each reducer processes exactly one word.
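The lookup scheme described above can be sketched in plain Java. This is a hypothetical standalone sketch, not the repository's actual class: it replaces Hadoop's Text/IntWritable types with String/int so it runs without Hadoop on the classpath, and hard-codes the vocabulary shown in the Partition Assignment example below.

```java
import java.util.Arrays;
import java.util.List;

// Standalone sketch of the index-based partitioning logic (hypothetical:
// the real CustomPartitioner extends Partitioner<Text, IntWritable> and
// reads the vocabulary from WordCountDataUtils.WORD_LIST).
public class PartitionSketch {

    // Vocabulary taken from the Partition Assignment example in this document.
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Mirrors getPartition(): the word's index in the vocabulary is the
    // partition number, so a given word always lands on the same reducer.
    // Note: indexOf() returns -1 for a word outside the list, which would be
    // an invalid partition; the fixed word-count vocabulary avoids this case.
    static int getPartition(String word, int count, int numPartitions) {
        return WORD_LIST.indexOf(word);
    }

    public static void main(String[] args) {
        System.out.println(getPartition("Spark", 1, WORD_LIST.size())); // 0
        System.out.println(getPartition("Hive", 1, WORD_LIST.size()));  // 5
    }
}
```

Because the mapping is a pure function of the key, the assignment is stable across job runs, unlike hash partitioning, whose layout changes whenever the number of reducers changes.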
Usage
Use this custom partitioner when you need deterministic key-to-reducer assignment in the word count pipeline. It must be registered with the job via job.setPartitionerClass(CustomPartitioner.class) and the number of reduce tasks must be set to match the vocabulary size (6).
Code Reference
Source Location
- Repository: BigData-Notes
- File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java
- Lines: L11-16
Signature
public class CustomPartitioner extends Partitioner<Text, IntWritable> {
@Override
public int getPartition(Text text, IntWritable intWritable, int numPartitions)
}
Import
import com.heibaiying.component.CustomPartitioner;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| text | Text | Yes | The intermediate key (a word) emitted by the mapper |
| intWritable | IntWritable | Yes | The intermediate value (the count) emitted by the mapper |
| numPartitions | int | Yes | The total number of reduce tasks (should equal the vocabulary size) |
Outputs
| Name | Type | Description |
|---|---|---|
| partition | int | The index of the word in WordCountDataUtils.WORD_LIST, determining which reducer receives this key |
Usage Examples
Basic Usage
import com.heibaiying.component.CustomPartitioner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCountWithPartitioner");
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// Register the custom partitioner
job.setPartitionerClass(CustomPartitioner.class);
// Set the number of reduce tasks to match the vocabulary size
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
Partition Assignment
// Given WORD_LIST = ["Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive"]
//
// getPartition("Spark", 1, 6) -> returns 0 (reducer 0)
// getPartition("Hadoop", 1, 6) -> returns 1 (reducer 1)
// getPartition("HBase", 1, 6) -> returns 2 (reducer 2)
// getPartition("Storm", 1, 6) -> returns 3 (reducer 3)
// getPartition("Flink", 1, 6) -> returns 4 (reducer 4)
// getPartition("Hive", 1, 6) -> returns 5 (reducer 5)
//
// Output files:
// part-r-00000 contains only "Spark" count
// part-r-00001 contains only "Hadoop" count
// ... and so on
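The word-to-output-file mapping listed above follows from Hadoop's part-r-%05d file-naming convention applied to the partition index. A hypothetical standalone sketch (not repository code) that reproduces the full mapping:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derives each word's output file name from its
// vocabulary index, mirroring the partition-to-file mapping above.
public class OutputFileSketch {

    // Vocabulary from the Partition Assignment example.
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Reducer i writes part-r-<i zero-padded to 5 digits>.
    static String outputFileFor(String word) {
        return String.format("part-r-%05d", WORD_LIST.indexOf(word));
    }

    public static void main(String[] args) {
        for (String word : WORD_LIST) {
            System.out.println(word + " -> " + outputFileFor(word));
        }
    }
}
```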