Implementation: Heibaiying BigData Notes Job Assembly and Submission
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Computing, Big_Data |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
A concrete driver program, provided by the BigData-Notes repository, that assembles a complete MapReduce word count job and submits it to YARN for execution.
Description
The WordCountApp class is the driver program that assembles all components of the word count MapReduce pipeline into a single Job object and submits it for execution on the cluster. It demonstrates the full job configuration pattern including:
- Creating a Configuration with HDFS connection parameters.
- Instantiating a Job and setting the driver class for JAR distribution.
- Registering the WordCountMapper as the mapper and WordCountReducer as the reducer.
- Optionally setting a Combiner and CustomPartitioner.
- Configuring map and reduce output key-value types.
- Setting input and output paths via FileInputFormat and FileOutputFormat.
- Deleting any pre-existing output directory to prevent job failure.
- Submitting the job with job.waitForCompletion(true) and handling the exit code.
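The mapper and reducer that the driver registers are documented on their own pages. As a rough, self-contained plain-Java sketch of the semantics they implement together (the mapper tokenizes each line and emits `(word, 1)`, the reducer sums the ones per word), here is the pipeline's contract collapsed into a single pass. This is an illustration only, not the repository's Mapper/Reducer code:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WordCountSketch {

    // Map phase (tokenize, emit (word, 1)) and reduce phase (sum per word)
    // collapsed into one local pass for illustration.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum); // reducer-side summation
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("hadoop spark hadoop"));
    }
}
```

The real job distributes the map phase across input splits and the sum across reducers, but the per-word result is the same.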
Usage
Use this as the main entry point to run the word count MapReduce application. It can be executed from the command line or submitted to a Hadoop cluster as a JAR.
Code Reference
Source Location
- Repository: BigData-Notes
- File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java
- Lines: L19-84
Signature
public class WordCountApp {
// Key APIs used:
Job.getInstance(Configuration conf, String jobName)
job.setJarByClass(Class<?> cls)
job.setMapperClass(Class<? extends Mapper> cls)
job.setReducerClass(Class<? extends Reducer> cls)
job.setCombinerClass(Class<? extends Reducer> cls)
job.setPartitionerClass(Class<? extends Partitioner> cls)
job.setMapOutputKeyClass(Class<?> theClass)
job.setMapOutputValueClass(Class<?> theClass)
job.setOutputKeyClass(Class<?> theClass)
job.setOutputValueClass(Class<?> theClass)
FileInputFormat.addInputPath(Job job, Path path)
FileOutputFormat.setOutputPath(Job job, Path path)
job.setNumReduceTasks(int tasks)
job.waitForCompletion(boolean verbose)
}
Import
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import com.heibaiying.component.CustomPartitioner;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| conf | Configuration | Yes | Hadoop configuration with fs.defaultFS set to the HDFS namenode URL |
| inputPath | Path | Yes | HDFS path containing the input text files for word counting |
| outputPath | Path | Yes | HDFS path where the word count results will be written (must not already exist) |
Outputs
| Name | Type | Description |
|---|---|---|
| exit code | int | 0 on success, 1 on failure (passed to System.exit()) |
| output files | HDFS files | part-r-NNNNN files in the output directory, each containing word\tcount lines |
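Each line of a part-r-NNNNN file is the key and value joined by a tab, the default separator of TextOutputFormat. A minimal sketch of parsing one such line back into a (word, count) pair; the sample line is illustrative, not taken from an actual job run:

```java
import java.util.Map;

public class OutputLineParser {

    // Splits a "word\tcount" output line into its key and numeric value.
    static Map.Entry<String, Integer> parse(String line) {
        int tab = line.indexOf('\t');
        String word = line.substring(0, tab);
        int count = Integer.parseInt(line.substring(tab + 1).trim());
        return Map.entry(word, count);
    }

    public static void main(String[] args) {
        System.out.println(parse("hadoop\t3"));
    }
}
```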
Usage Examples
Basic Usage
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
// 1. Create configuration
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop001:8020");
// 2. Create job
Job job = Job.getInstance(conf, "WordCount");
job.setJarByClass(WordCountApp.class);
// 3. Set mapper and reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
// 4. Set output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// 5. Set input and output paths
FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));
// 6. Submit and wait
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
Full Assembly with Combiner and Partitioner
import java.net.URI;
import com.heibaiying.component.CustomPartitioner;
import com.heibaiying.utils.WordCountDataUtils;
// ... (configuration and job creation as above)
// Set mapper, reducer, combiner, and partitioner
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setCombinerClass(WordCountReducer.class);
job.setPartitionerClass(CustomPartitioner.class);
// Number of reducers must match vocabulary size for custom partitioner
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());
// Set types and paths
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));
// Delete output directory if it already exists
FileSystem fileSystem = FileSystem.get(new URI("hdfs://hadoop001:8020"), conf, "root");
Path outputPath = new Path("/wordcount/output");
if (fileSystem.exists(outputPath)) {
fileSystem.delete(outputPath, true);
}
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
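The CustomPartitioner itself is documented on its own page; the contract implied here is that each word in the known vocabulary is routed to its own reducer, which is why setNumReduceTasks is given WordCountDataUtils.WORD_LIST.size(). A hedged plain-Java sketch of that routing logic follows; the word list contents and the fallback behavior are assumptions, not the repository's code:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionSketch {

    // Hypothetical stand-in for WordCountDataUtils.WORD_LIST.
    static final List<String> WORD_LIST =
            Arrays.asList("hadoop", "spark", "hive", "hbase");

    // One partition per known word, mirroring
    // numReduceTasks == WORD_LIST.size() in the driver above.
    static int getPartition(String word, int numPartitions) {
        int idx = WORD_LIST.indexOf(word);
        // Unknown words fall back to a non-negative hash partition (a guess).
        return idx >= 0 ? idx : (word.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("spark", WORD_LIST.size()));
    }
}
```

With this mapping, every part-r-NNNNN file holds the count for exactly one word of the vocabulary.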
Related Pages
Implements Principle
Requires Environment