Implementation:Heibaiying BigData Notes WordCountMapper Map
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Concrete tool, provided by the BigData-Notes repository, for tokenizing input text lines into (word, 1) pairs for word counting.
Description
The WordCountMapper class extends Hadoop's Mapper<LongWritable, Text, Text, IntWritable> and implements the map phase of the word count pipeline. For each input line, the map() method splits the text by tab characters and emits a (word, 1) key-value pair for every token found.
The input key is a LongWritable holding the byte offset at which the line starts in the input file (provided by the framework and typically unused by the mapper logic). The input value is a Text object containing the line content. The output key is a Text object containing an individual word, and the output value is an IntWritable with the value 1.
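The map step described above can be approximated without a Hadoop dependency. The following is a minimal sketch, assuming the mapper tokenizes with `String.split("\t")`. The class `WordCountMapSketch` and the method `mapLine` are hypothetical names used only for illustration, and plain `long`, `String`, and `Integer` stand in for `LongWritable`, `Text`, and `IntWritable`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the mapper logic; not the repository's class.
class WordCountMapSketch {

    // Mirrors map(key, value, context): the offset is accepted but unused,
    // and each tab-separated token is paired with the constant 1.
    static List<Map.Entry<String, Integer>> mapLine(long offset, String line) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String word : line.split("\t")) {
            // Assumed to correspond to context.write(new Text(word), new IntWritable(1))
            emitted.add(new SimpleEntry<>(word, 1));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Print each emitted pair for a short tab-delimited line
        for (Map.Entry<String, Integer> pair : mapLine(0L, "Spark\tHadoop\tHBase")) {
            System.out.println("(" + pair.getKey() + ", " + pair.getValue() + ")");
        }
    }
}
```

The sketch emits one pair per token and does no aggregation, so a word that appears twice on a line produces two separate (word, 1) pairs.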
Usage
Use this mapper as part of a word count MapReduce job. It is registered with the job via job.setMapperClass(WordCountMapper.class) during job assembly.
Code Reference
Source Location
- Repository: BigData-Notes
- File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/component/WordCountMapper.java
- Lines: L13-23
Signature
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
}
Import
import com.heibaiying.component.WordCountMapper;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| key | LongWritable | Yes | Byte offset of the input line (provided by the framework) |
| value | Text | Yes | A single line of text from the input file (tab-delimited words) |
| context | Context | Yes | The Mapper context used to emit output key-value pairs |
Outputs
| Name | Type | Description |
|---|---|---|
| key | Text | An individual word extracted from the input line |
| value | IntWritable | The integer constant 1, representing a single occurrence of the word |
Usage Examples
Basic Usage
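import com.heibaiying.component.WordCountMapper;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

// Register the mapper with a MapReduce job
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "WordCount");
job.setMapperClass(WordCountMapper.class);
// Declare the mapper's output types: (Text word, IntWritable count)
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);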
Internal Logic
// For an input line: "Spark\tHadoop\tHBase\tStorm\tFlink\tHive"
// The map method splits by "\t" and emits:
// ("Spark", 1)
// ("Hadoop", 1)
// ("HBase", 1)
// ("Storm", 1)
// ("Flink", 1)
// ("Hive", 1)
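Because the split uses only the tab character, input that is not cleanly tab-delimited behaves differently from the example above. The sketch below uses plain `String.split("\t")` to illustrate this, assuming that is how the mapper tokenizes. `TabSplitEdgeCases` and `tokens` are hypothetical names, not part of the repository.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for exploring tab-only tokenization.
class TabSplitEdgeCases {

    // Returns the tokens that a tab-only split would produce for one line.
    static List<String> tokens(String line) {
        return Arrays.asList(line.split("\t"));
    }

    public static void main(String[] args) {
        // Space-separated words are not separated: the whole line is one token.
        System.out.println(tokens("Spark Hadoop"));     // [Spark Hadoop]
        // Consecutive tabs yield an empty token between them.
        System.out.println(tokens("Spark\t\tHadoop"));  // [Spark, , Hadoop]
        // A trailing tab is dropped by String.split.
        System.out.println(tokens("Spark\t"));          // [Spark]
    }
}
```

Under this assumption, a mapper fed space-delimited text would count each whole line as a single "word", and lines with doubled tabs would emit an empty-string key. Input in another format would likely need a different delimiter, for example the regex `\\s+`.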