
Implementation:Heibaiying BigData Notes Job Assembly and Submission

From Leeroopedia


Knowledge Sources
Domains Distributed_Computing, Big_Data
Last Updated 2026-02-10 10:00 GMT

Overview

The WordCountApp driver, provided by the BigData-Notes repository, is a concrete tool for assembling a complete MapReduce word count job and submitting it to YARN.

Description

The WordCountApp class is the driver program that assembles all components of the word count MapReduce pipeline into a single Job object and submits it for execution on the cluster. It demonstrates the full job configuration pattern including:

  • Creating a Configuration with HDFS connection parameters.
  • Instantiating a Job and setting the driver class for JAR distribution.
  • Registering the WordCountMapper as the mapper and WordCountReducer as the reducer.
  • Optionally setting a Combiner and CustomPartitioner.
  • Configuring map and reduce output key-value types.
  • Setting input and output paths via FileInputFormat and FileOutputFormat.
  • Deleting any pre-existing output directory to prevent job failure.
  • Submitting the job with job.waitForCompletion(true) and handling the exit code.

Usage

Use this as the main entry point to run the word count MapReduce application. It can be executed from the command line or submitted to a Hadoop cluster as a JAR.

Code Reference

Source Location

  • Repository: BigData-Notes
  • File: code/Hadoop/hadoop-word-count/src/main/java/com/heibaiying/WordCountApp.java
  • Lines: L19-84

Signature

public class WordCountApp {

    // Key APIs used:
    Job.getInstance(Configuration conf, String jobName)
    job.setJarByClass(Class<?> cls)
    job.setMapperClass(Class<? extends Mapper> cls)
    job.setReducerClass(Class<? extends Reducer> cls)
    job.setCombinerClass(Class<? extends Reducer> cls)
    job.setPartitionerClass(Class<? extends Partitioner> cls)
    job.setMapOutputKeyClass(Class<?> theClass)
    job.setMapOutputValueClass(Class<?> theClass)
    job.setOutputKeyClass(Class<?> theClass)
    job.setOutputValueClass(Class<?> theClass)
    FileInputFormat.addInputPath(Job job, Path path)
    FileOutputFormat.setOutputPath(Job job, Path path)
    job.setNumReduceTasks(int tasks)
    job.waitForCompletion(boolean verbose)
}

Import

import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
import java.net.URI;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;
import com.heibaiying.component.CustomPartitioner;

I/O Contract

Inputs

Name Type Required Description
conf Configuration Yes Hadoop configuration with fs.defaultFS set to the HDFS namenode URL
inputPath Path Yes HDFS path containing the input text files for word counting
outputPath Path Yes HDFS path where the word count results will be written (must not already exist)

Outputs

Name Type Description
exit code int 0 on success, 1 on failure (passed to System.exit())
output files HDFS files part-r-NNNNN files in the output directory, each containing word\tcount lines
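The word\tcount line format described above can be illustrated with a small stdlib-only parsing sketch. The class and method names below are illustrative and not part of the repository; the snippet only demonstrates the shape of each reducer output line.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Illustrative parser for one reducer output line of the form "word\tcount",
// as produced by the default TextOutputFormat with Text keys and IntWritable values.
public class OutputLineParser {
    static Map.Entry<String, Integer> parse(String line) {
        // TextOutputFormat separates key and value with a tab by default
        String[] parts = line.split("\t");
        return new SimpleEntry<>(parts[0], Integer.parseInt(parts[1]));
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = parse("hadoop\t3");
        System.out.println(e.getKey() + " -> " + e.getValue()); // hadoop -> 3
    }
}
```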

Usage Examples

Basic Usage

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.heibaiying.component.WordCountMapper;
import com.heibaiying.component.WordCountReducer;

// 1. Create configuration
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop001:8020");

// 2. Create job
Job job = Job.getInstance(conf, "WordCount");
job.setJarByClass(WordCountApp.class);

// 3. Set mapper and reducer
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);

// 4. Set output types
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// 5. Set input and output paths
FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));

// 6. Submit and wait
boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);

Full Assembly with Combiner and Partitioner

import com.heibaiying.component.CustomPartitioner;
import com.heibaiying.utils.WordCountDataUtils;
import java.net.URI;

// ... (configuration and job creation as above)

// Set mapper, reducer, combiner, and partitioner
job.setMapperClass(WordCountMapper.class);
job.setReducerClass(WordCountReducer.class);
job.setCombinerClass(WordCountReducer.class);
job.setPartitionerClass(CustomPartitioner.class);

// Number of reducers must match vocabulary size for custom partitioner
job.setNumReduceTasks(WordCountDataUtils.WORD_LIST.size());

// Set types and paths
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

FileInputFormat.addInputPath(job, new Path("/wordcount/input"));
FileOutputFormat.setOutputPath(job, new Path("/wordcount/output"));

// Delete output directory if it already exists
FileSystem fileSystem = FileSystem.get(new URI("hdfs://hadoop001:8020"),
    conf, "root");
Path outputPath = new Path("/wordcount/output");
if (fileSystem.exists(outputPath)) {
    fileSystem.delete(outputPath, true);
}

boolean success = job.waitForCompletion(true);
System.exit(success ? 0 : 1);
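The repository's CustomPartitioner class itself is not shown on this page. The sketch below is a plain-Java illustration of the partitioning logic it implies, assuming the partitioner routes each word to the reducer whose index matches the word's position in the word list; the class name PartitionSketch and the sample word list are stand-ins, not the actual WordCountDataUtils contents. It shows why setNumReduceTasks must equal WordCountDataUtils.WORD_LIST.size(): each list position corresponds to exactly one reducer.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical core of a partitioner like CustomPartitioner: route each word
// to the reducer whose index matches the word's position in a fixed word list.
// WORD_LIST is an illustrative stand-in for WordCountDataUtils.WORD_LIST.
public class PartitionSketch {
    static final List<String> WORD_LIST =
            Arrays.asList("hadoop", "spark", "hive", "hbase", "kafka");

    // Mirrors the shape of Partitioner#getPartition(key, value, numPartitions):
    // listed words get their own reducer; anything else falls back to partition 0.
    static int getPartition(String word, int numPartitions) {
        int idx = WORD_LIST.indexOf(word);
        return idx >= 0 ? idx % numPartitions : 0;
    }

    public static void main(String[] args) {
        int numReducers = WORD_LIST.size(); // matches job.setNumReduceTasks(...)
        System.out.println(getPartition("spark", numReducers)); // 1
        System.out.println(getPartition("kafka", numReducers)); // 4
    }
}
```

With fewer reducers than words, the modulo would fold several words into one partition; with more, some reducers would receive no input and emit empty part-r-NNNNN files.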

Related Pages

Implements Principle

Requires Environment
