
Implementation:Heibaiying BigData Notes CustomPartitioner GetPartition

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

A concrete tool, provided by the BigData-Notes repository, for routing intermediate key-value pairs to specific reducers based on each word's index in a predefined vocabulary list.

Description

The CustomPartitioner class extends Hadoop's Partitioner<Text, IntWritable> and implements a deterministic partitioning scheme for the word count pipeline. Instead of using the default hash-based partitioning, this custom partitioner assigns each word to a reducer based on the word's index position in the predefined WordCountDataUtils.WORD_LIST.

The getPartition() method looks up the input word in the static vocabulary list and returns its index as the partition number. This guarantees that each word is always routed to the same reducer, and when the number of reduce tasks equals the vocabulary size (6), each reducer processes exactly one word.
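The repository source is linked below; as a rough sketch of the lookup logic described above (not the verbatim class, since the Hadoop types are omitted here), the partition number is simply the word's index in the static vocabulary list. The class name `PartitionLookupSketch` and the plain-Java helper `lookupPartition` are hypothetical stand-ins; the vocabulary contents are taken from the Partition Assignment example later on this page:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionLookupSketch {
    // Vocabulary taken from the Partition Assignment example on this page
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Stand-in for CustomPartitioner.getPartition(): returns the word's
    // index in the vocabulary list as the partition number. The
    // numPartitions argument mirrors the real signature but is unused
    // when it equals the vocabulary size.
    static int lookupPartition(String word, int numPartitions) {
        return WORD_LIST.indexOf(word);
    }

    public static void main(String[] args) {
        System.out.println(lookupPartition("Hadoop", 6)); // prints 1
    }
}
```

Because the list never changes between runs, the same word always maps to the same index, which is what makes the routing deterministic.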

Usage

Use this custom partitioner when you need deterministic key-to-reducer assignment in the word count pipeline. It must be registered with the job via job.setPartitionerClass(CustomPartitioner.class) and the number of reduce tasks must be set to match the vocabulary size (6).

Code Reference

Source Location

  • Repository: BigData-Notes
  • File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/CustomPartitioner.java
  • Lines: L11-16

Signature

public class CustomPartitioner extends Partitioner<Text, IntWritable> {

    @Override
    public int getPartition(Text text, IntWritable intWritable, int numPartitions)
}

Import

import com.heibaiying.component.CustomPartitioner;

I/O Contract

Inputs

Name          | Type        | Required | Description
text          | Text        | Yes      | The intermediate key (a word) emitted by the mapper
intWritable   | IntWritable | Yes      | The intermediate value (the count) emitted by the mapper
numPartitions | int         | Yes      | The total number of reduce tasks (should equal the vocabulary size)

Outputs

Name      | Type | Description
partition | int  | The index of the word in WordCountDataUtils.WORD_LIST, determining which reducer receives this key
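One caveat, inferred from the lookup scheme rather than stated in the repository: an index lookup over a fixed list returns -1 for a word that is not in the vocabulary, and a negative partition number is outside the valid range [0, numPartitions) that Hadoop's map output collector accepts. A hypothetical defensive variant (the class name `SafePartitionSketch` and the fallback-to-partition-0 policy are this page's assumptions, not repository behavior) might clamp the result:

```java
import java.util.Arrays;
import java.util.List;

public class SafePartitionSketch {
    static final List<String> WORD_LIST =
            Arrays.asList("Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive");

    // Hypothetical defensive variant: route out-of-vocabulary words to
    // partition 0 instead of returning -1 (an invalid partition number),
    // and take the index modulo numPartitions in case fewer reduce tasks
    // were configured than there are vocabulary words.
    static int safePartition(String word, int numPartitions) {
        int index = WORD_LIST.indexOf(word);
        return index >= 0 ? index % numPartitions : 0;
    }
}
```

With the original scheme, keeping the input restricted to the six known words (and setting exactly six reduce tasks) avoids the issue entirely.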

Usage Examples

Basic Usage

import com.heibaiying.component.CustomPartitioner;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCountWithPartitioner");
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);

// Register the custom partitioner
job.setPartitionerClass(CustomPartitioner.class);

// Set the number of reduce tasks to match the vocabulary size
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());

Partition Assignment

// Given WORD_LIST = ["Spark", "Hadoop", "HBase", "Storm", "Flink", "Hive"]
//
// getPartition("Spark", 1, 6)  -> returns 0  (reducer 0)
// getPartition("Hadoop", 1, 6) -> returns 1  (reducer 1)
// getPartition("HBase", 1, 6)  -> returns 2  (reducer 2)
// getPartition("Storm", 1, 6)  -> returns 3  (reducer 3)
// getPartition("Flink", 1, 6)  -> returns 4  (reducer 4)
// getPartition("Hive", 1, 6)   -> returns 5  (reducer 5)
//
// Output files:
//   part-r-00000 contains only "Spark" count
//   part-r-00001 contains only "Hadoop" count
//   ... and so on

Related Pages

Implements Principle

Requires Environment
