Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:Heibaiying BigData Notes MapReduce Reduce Phase

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

The Reduce phase of MapReduce aggregates all intermediate values associated with each unique key to produce the final output of the computation.

Description

After the Map phase completes, the framework performs a shuffle and sort step that groups all intermediate key-value pairs by key. The Reduce phase then invokes the user-defined reduce() function once per unique key, passing the key and an iterable of all values associated with that key.

In a word count application, the reducer receives a word as the key and an iterable of integer counts (each equal to 1) as the values. The reducer sums these counts and emits a single pair (word, totalCount), producing the final word frequency for that word.

The number of reducers is configurable and determines the degree of parallelism in the reduce phase. Each reducer handles a subset of keys as determined by the Partitioner. The output of each reducer is written to a separate output file in the designated HDFS output directory (e.g., part-r-00000, part-r-00001, etc.).

Usage

Use a custom Reducer when:

  • You need to aggregate, summarize, or combine values that share the same key.
  • The aggregation logic (sum, average, max, min, concatenation, etc.) requires iterating over all values for a key.
  • You want to produce a single consolidated result per unique key.
  • The reduce function is associative and commutative (which also allows it to be used as a Combiner).

Theoretical Basis

The Reduce function can be formally described as:

reduce: (K2, list(V2)) -> list(K3, V3)

Where:

  • K2 is the intermediate key type (e.g., Text representing a word).
  • list(V2) is the list of all intermediate values for that key (e.g., a list of IntWritable values).
  • K3 is the output key type (typically the same as K2).
  • V3 is the output value type (e.g., IntWritable representing the total count).

For word count, given the key "Hadoop" and values [1, 1, 1, 1, 1], the reduce function performs:

  1. Initialize a counter: sum = 0.
  2. Iterate over each value v in the values iterable: sum = sum + v.
  3. Emit the result: ("Hadoop", 5).

The correctness of the reduce phase depends on the shuffle and sort guarantee: all values for a given key are delivered to exactly one reducer, and they arrive grouped together. This invariant ensures that no counts are lost or double-counted.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment