Implementation: Heibaiying BigData Notes Job Assembly and Submission
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete driver program, provided by the BigData-Notes repository, that assembles a complete MapReduce word count job and submits it to YARN for execution.
Description
The WordCountApp class is the driver program that assembles all components of the word count MapReduce pipeline into a single Job object and submits it for execution on the cluster. It demonstrates the full job configuration pattern including:
- Creating a Configuration with HDFS connection parameters.
- Instantiating a Job and setting the driver class for JAR distribution.
- Registering the WordCountMapper as the mapper and WordCountReducer as the reducer.
- Optionally setting a Combiner and CustomPartitioner.
- Configuring map and reduce output key-value types.
- Setting input and output paths via FileInputFormat and FileOutputFormat.
- Deleting any pre-existing output directory to prevent job failure.
- Submitting the job with job.waitForCompletion(true) and handling the exit code.
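The mapper and reducer that the driver registers are documented on their own pages. As a rough, self-contained plain-Java sketch of the semantics they implement together (the mapper tokenizes each line and emits `(word, 1)`, the reducer sums the ones per word), here is the pipeline's contract collapsed into a single pass. This is an illustration only, not the repository's Mapper/Reducer code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {

    // Map phase (tokenize, emit (word, 1)) and reduce phase (sum per word)
    // collapsed into one local pass for illustration.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // reducer-side summation
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hadoop spark hadoop"));
    }
}
```

The real job distributes the map phase across input splits and the sum across reducers, but the per-word result is the same.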
Usage
Use this as the main entry point to run the word count MapReduce application. It can be executed from the command line or submitted to a Hadoop cluster as a JAR.
Code Reference
Source Location
- Repository: BigData-Notes
- File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java
- Lines: L19-84
Signature
public class WordCountApp {
// Key APIs used:
Job.getInstance(Configuration conf, String jobName)
job.setJarByClass(Class<?> cls)
job.setMapperClass(Class<? extends Mapper> cls)
job.setReducerClass(Class<? extends Reducer> cls)
job.setCombinerClass(Class<? extends Reducer> cls)
job.setPartitionerClass(Class<? extends Partitioner> cls)
job.setMapOutputKeyClass(Class<?> theClass)
job.setMapOutputValueClass(Class<?> theClass)
job.setOutputKeyClass(Class<?> theClass)
job.setOutputValueClass(Class<?> theClass)
FileInputFormat.addInputPath(Job job, Path path)
FileOutputFormat.setOutputPath(Job job, Path path)
job.setNumReduceTasks(int tasks)
job.waitForCompletion(boolean verbose)
}
Import
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import com.heibaiying.component.CustomPartitioner;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| conf | Configuration | Yes | Hadoop configuration with fs.defaultFS set to the HDFS namenode URL |
| inputPath | Path | Yes | HDFS path containing the input text files for word counting |
| outputPath | Path | Yes | HDFS path where the word count results will be written (must not already exist) |
Outputs
| Name | Type | Description |
|---|---|---|
| exit code | int | 0 on success, 1 on failure (passed to System.exit()) |
| output files | HDFS files | part-r-NNNNN files in the output directory, each containing word\tcount lines |
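Each line of a part-r-NNNNN file is the key and value joined by a tab, the default separator of TextOutputFormat. A minimal sketch of parsing one such line back into a (word, count) pair; the sample line is illustrative, not taken from an actual job run:

```java
import java.util.Map;

public class OutputLineParser {

    // Splits a "word\tcount" output line into its key and numeric value.
    static Map.Entry<String, Integer> parse(String line) {
        int tab = line.indexOf('\t');
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return Map.entry(word, count);
    }

    public static void main(String[] args) {
        System.out.println(parse("hadoop\t3"));
    }
}
```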
Usage Examples
Basic Usage
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
// 1. Create configuration
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop001:8020");
// 2. Create job
Job job = Job.getInstance(conf, "WordCount");
job.setJarByClass(WordCountApp.class);
// 3. Set mapper and reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// 4. Set output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 5. Set input and output paths
FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));
// 6. Submit and wait
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
Full Assembly with Combiner and Partitioner
import java.net.URI;
import com.heibaiying.component.CustomPartitioner;
import com.heibaiying.utils.WordCountDataUtils;
// ... (configuration and job creation as above)
// Set mapper, reducer, combiner, and partitioner
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setCombinerClass(WordCountReducer.class);
job.setPartitionerClass(CustomPartitioner.class);
// Number of reducers must match vocabulary size for custom partitioner
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
// Set types and paths
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));
// Delete output directory if it already exists
FileSystem fileSystem = FileSystem.get(new URI("hdfs://hadoop001:8020"), conf, "root");
Path outputPath = new Path("/wordcount/output");
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
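The CustomPartitioner itself is documented on its own page; the contract implied here is that each word in the known vocabulary is routed to its own reducer, which is why setNumReduceTasks is given WordCountDataUtils.WORD_LIST.size(). A hedged plain-Java sketch of that routing logic follows; the word list contents and the fallback behavior are assumptions, not the repository's code:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical stand-in for WordCountDataUtils.WORD_LIST.
    static final List<String> WORD_LIST =
            Arrays.asList("hadoop", "spark", "hive", "hbase");

    // One partition per known word, mirroring
    // numReduceTasks == WORD_LIST.size() in the driver above.
    static int getPartition(String word, int numPartitions) {
        int idx = WORD_LIST.indexOf(word);
        // Unknown words fall back to a non-negative hash partition (a guess).
        return idx >= 0 ? idx : (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("spark", WORD_LIST.size()));
    }
}
```

With this mapping, every part-r-NNNNN file holds the count for exactly one word of the vocabulary.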
Related Pages
Implements Principle
Requires Environment