Implementation: Heibaiying BigData-Notes Job.setCombinerClass
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete configuration hook, provided by the Hadoop MapReduce framework, for registering a local Combiner on a MapReduce job so that pre-aggregation happens before the shuffle phase.
Description
The job.setCombinerClass() method registers a Combiner class with the MapReduce job. In the BigData-Notes word count example, the WordCountReducer is reused as the Combiner because the summation operation it performs is both associative and commutative. This is a common pattern in MapReduce applications where the reduce logic can be safely applied as a local optimization.
When set, Hadoop may invoke the Combiner zero or more times on each mapper's output (typically as map spill files are written and merged), before the data is shuffled to the reducers; job correctness must therefore never depend on the Combiner actually running. This entry is a thin wrapper around the Hadoop Job.setCombinerClass() API that demonstrates how to wire an existing Reducer into the Combiner role.
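The data flow can be sketched in plain Java with no Hadoop dependency (class and variable names here are illustrative, not from the repository): each mapper's raw (word, 1) pairs are collapsed locally by the same summation logic the Reducer uses, and only the collapsed pairs are shuffled.

```java
import java.util.*;

public class CombinerFlowSketch {
    // Local pre-aggregation: the same summation logic as the Reducer,
    // applied to a single mapper's output before the shuffle.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Integer> e : mapOutput) {
            out.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // One mapper's raw output: one (word, 1) pair per occurrence.
        List<Map.Entry<String, Integer>> mapOutput = new ArrayList<>();
        for (String w : "hadoop spark hadoop flink hadoop spark".split(" ")) {
            mapOutput.add(new AbstractMap.SimpleEntry<>(w, 1));
        }
        System.out.println("pairs before combine: " + mapOutput.size()); // 6
        Map<String, Integer> shuffled = combine(mapOutput);
        System.out.println("pairs after combine:  " + shuffled.size());  // 3
        System.out.println(shuffled); // {flink=1, hadoop=3, spark=2}
    }
}
```

Only the three combined pairs cross the network instead of six raw pairs; the reducers then merge the partial sums exactly as they would have merged the raw ones.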
Usage
Use this configuration when your reduce function is associative and commutative (such as summation) and you want to cut the volume of data shuffled across the network. Note also that the Combiner's input and output key/value types must both match the map output types, which is a further reason reusing the Reducer works in this example. It is a single line of configuration during job assembly.
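To see why the associative/commutative condition matters, here is a small plain-Java sketch (no Hadoop dependency; class and method names are illustrative): summation gives the same answer whether or not it is applied locally first, while a naive average does not, which is why an averaging Reducer must not be reused as a Combiner.

```java
import java.util.Arrays;
import java.util.List;

public class CombinerSafety {
    // Word-count reduce logic: sum of partial counts. Safe as a Combiner.
    static int sum(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // A naive "average" reduce: NOT safe as a Combiner, because an
    // average of partial averages is not the overall average.
    static double mean(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Two mappers emit partial value lists for the same key.
        List<Integer> mapper1 = Arrays.asList(1, 1, 1);
        List<Integer> mapper2 = Arrays.asList(1);

        // Sum: combining locally first gives the same result.
        int direct = sum(Arrays.asList(1, 1, 1, 1));                   // 4
        int combined = sum(Arrays.asList(sum(mapper1), sum(mapper2))); // 4
        System.out.println(direct == combined);                        // true

        // Mean: combining locally first changes the result.
        double directMean = mean(Arrays.asList(2.0, 4.0, 6.0, 8.0));   // 5.0
        double combinedMean = mean(Arrays.asList(
                mean(Arrays.asList(2.0, 4.0, 6.0)),                    // 4.0
                mean(Arrays.asList(8.0))));                            // 8.0
        System.out.println(directMean);   // 5.0
        System.out.println(combinedMean); // 6.0
    }
}
```

For averages the standard workaround is to have the Combiner emit (sum, count) pairs and compute the quotient only in the final Reducer.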
Code Reference
Source Location
- Repository: BigData-Notes
- File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountCombinerApp.java
- Lines: L55
Signature
// From org.apache.hadoop.mapreduce.Job:
public void setCombinerClass(Class<? extends Reducer> cls) throws IllegalStateException
// Usage in WordCountCombinerApp:
job.setCombinerClass(WordCountReducer.class);
Import
import org.apache.hadoop.mapreduce.Job;
import com.heibaiying.component.WordCountReducer;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cls | Class<? extends Reducer> | Yes | The Reducer class to be used as the Combiner (e.g., WordCountReducer.class) |
Outputs
| Name | Type | Description |
|---|---|---|
| (return) | void | Configures the job in place; no return value |
Usage Examples
Basic Usage
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;

Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCountWithCombiner");
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// Reuse the Reducer as the Combiner for local pre-aggregation
job.setCombinerClass(WordCountReducer.class);
Verifying Combiner Effect
// Without Combiner:
// Mapper emits 6000 pairs (1000 lines x 6 words)
// All 6000 pairs are shuffled to reducers
// With Combiner:
// Mapper emits 6000 pairs (1000 lines x 6 words)
// Combiner reduces to 6 pairs per mapper (one per unique word)
// Only 6 pairs per mapper are shuffled to reducers
// With a single mapper, the Combiner counters in the job output will show:
// Combine input records: 6000
// Combine output records: 6
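The counter arithmetic above can be reproduced with a quick standalone check (a sketch assuming a single mapper; the six concrete words are placeholders, not taken from the repository's test input):

```java
import java.util.*;

public class CombinerCounterCheck {
    // The Combiner's effect on this job: one output record per unique word.
    static Map<String, Integer> combine(List<String> mapOutputWords) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : mapOutputWords) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // 1000 lines, each containing the same 6 words (illustrative input).
        String[] words = {"hadoop", "spark", "flink", "hive", "hbase", "kafka"};
        List<String> mapOutput = new ArrayList<>();
        for (int i = 0; i < 1000; i++) mapOutput.addAll(Arrays.asList(words));

        Map<String, Integer> combined = combine(mapOutput);
        System.out.println("Combine input records: " + mapOutput.size()); // 6000
        System.out.println("Combine output records: " + combined.size()); // 6
        // Each surviving pair carries the local count for its word.
        System.out.println(combined.get("hadoop"));                       // 1000
    }
}
```

The shuffle volume drops by a factor of 1000 here because every duplicate key on the mapper collapses into a single pre-summed record.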