
Environment:Heibaiying BigData Notes Hadoop CDH Environment

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Distributed_Storage
Last Updated: 2026-02-10 10:00 GMT

Overview

Hadoop 2.6.0-cdh5.15.2 (Cloudera Distribution) environment with HDFS, YARN, and MapReduce for distributed data processing.

Description

This environment provides the Cloudera CDH distribution of Apache Hadoop version 2.6.0-cdh5.15.2. It includes HDFS for distributed storage, YARN for resource management, and MapReduce for batch processing. The CDH distribution bundles versions of Hadoop ecosystem components that are tested for mutual compatibility. All Hadoop and HDFS Java API examples in the repository depend on `hadoop-client:2.6.0-cdh5.15.2`.

Usage

Use this environment for any Hadoop MapReduce or HDFS Java API operations. It is required by the Hadoop MapReduce Word Count workflow, the HDFS utility classes, and the Storm-HDFS integration modules.
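The core logic of the Word Count workflow is tokenize-and-sum: the Mapper splits each input line into words and emits `(word, 1)` pairs, and the Reducer sums the counts per word. A dependency-free Java sketch of that logic (class and method names here are illustrative, not taken from the repository):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Sketch of the WordCount map/reduce logic without Hadoop on the classpath:
// the real Mapper emits (word, 1) pairs and the Reducer sums them per key.
public class WordCountSketch {
    static Map<String, Integer> countWords(String line) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        // Whitespace tokenization, as in the classic WordCount Mapper
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            // merge() performs the Reducer's per-key sum locally
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hadoop yarn hadoop hdfs"));
        // prints {hadoop=2, yarn=1, hdfs=1}
    }
}
```

In the real job this logic is split across `Mapper.map()` and `Reducer.reduce()` so it can run in parallel across HDFS blocks.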

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| OS | CentOS 7.6 | Linux with SSH configured |
| Java | JDK 1.8 | Required by Hadoop 2.6 |
| Hardware | Minimum 4 GB RAM per node | 8 GB+ recommended for a cluster |
| Disk | 50 GB+ per DataNode | HDFS storage; SSD preferred |
| Network | SSH passwordless login between nodes | Required for cluster mode |

Dependencies

System Packages

  • `hadoop` = 2.6.0-cdh5.15.2
  • `java-1.8.0-openjdk-devel`
  • `ssh` (OpenSSH)

Java Packages (Maven)

  • `org.apache.hadoop:hadoop-client` = 2.6.0-cdh5.15.2
  • `org.apache.commons:commons-lang3` >= 3.8.1

Environment Variables

  • `HADOOP_HOME` = Hadoop installation directory
  • `HADOOP_CONF_DIR` = `$HADOOP_HOME/etc/hadoop`
  • `PATH` includes `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin`

Credentials

No API credentials required. Hadoop user permissions are managed via OS-level user accounts:

  • `HADOOP_USER_NAME`: Set to the Hadoop superuser (e.g., `root`) to avoid HDFS permission errors when running from IDE.
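When running from an IDE, the Hadoop client also honors `HADOOP_USER_NAME` as a JVM system property, so it can be set in code before any `FileSystem` call. A minimal sketch, assuming simple authentication (no Kerberos); the class and method names are illustrative:

```java
public class HadoopUserSetup {
    // Sets the client-side Hadoop user before any FileSystem call.
    // With simple auth, the client identifies as this user to HDFS.
    static String configureHadoopUser(String user) {
        System.setProperty("HADOOP_USER_NAME", user);
        return System.getProperty("HADOOP_USER_NAME");
    }

    public static void main(String[] args) {
        System.out.println(configureHadoopUser("root")); // prints root
    }
}
```

The equivalent outside the IDE is `export HADOOP_USER_NAME=root` in the shell, or `-DHADOOP_USER_NAME=root` on the JVM command line.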

Quick Install

# Download and extract Hadoop CDH
wget https://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.15.2.tar.gz
tar -xzf hadoop-2.6.0-cdh5.15.2.tar.gz -C /opt/

# Set environment variables
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.15.2
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Maven dependency
# Add to pom.xml:
# <dependency>
#   <groupId>org.apache.hadoop</groupId>
#   <artifactId>hadoop-client</artifactId>
#   <version>2.6.0-cdh5.15.2</version>
# </dependency>

Code Evidence

Hadoop client dependency from `hadoop-word-count/pom.xml`:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.15.2</version>
</dependency>

HDFS configuration from `HdfsUtils.java`:

private static final String HDFS_PATH = "hdfs://hadoop001:8020";
private static final String HDFS_USER = "root";
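These constants are typically used to open a `FileSystem` handle as a specific user. A minimal sketch of that pattern, assuming `hadoop-client:2.6.0-cdh5.15.2` is on the classpath and the NameNode at `hadoop001:8020` is reachable (the class name is illustrative):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsQuickCheck {
    private static final String HDFS_PATH = "hdfs://hadoop001:8020";
    private static final String HDFS_USER = "root";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connecting as HDFS_USER avoids "Permission denied" errors
        // when the job is launched from an IDE under a local OS account.
        FileSystem fs = FileSystem.get(new URI(HDFS_PATH), conf, HDFS_USER);
        System.out.println(fs.exists(new Path("/"))); // true once the NameNode is up
        fs.close();
    }
}
```

`FileSystem.get(URI, Configuration, String)` is the overload that takes the remote user name; without it, the client uses the local OS user.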

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `Permission denied: user=xxx` | HDFS user mismatch | Set the `HADOOP_USER_NAME` environment variable, or pass `-DHADOOP_USER_NAME=root` to the JVM |
| `Output directory already exists` | MapReduce refuses to overwrite an existing output path | Delete the output directory before rerunning the job |
| `Connection refused to hadoop001:8020` | NameNode not running | Start HDFS with `start-dfs.sh` |

Compatibility Notes

  • CDH 5.15.2 bundles specific tested versions of all Hadoop components. Mixing CDH and Apache releases may cause classpath conflicts.
  • Storm-HDFS integration requires matching Hadoop client version (2.6.0-cdh5.15.2) in the Storm project pom.xml.
  • Single-node testing: Set HDFS replication factor to 1 in `hdfs-site.xml` for development.
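For single-node development, the replication setting mentioned above goes in `$HADOOP_CONF_DIR/hdfs-site.xml`. A minimal fragment (only `dfs.replication` is from the notes above; any other properties in your file are unaffected):

```
<configuration>
    <!-- Single-node development: one copy of each block, since
         there is only one DataNode to replicate to. -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```

The production default is 3; leaving it at 3 on a single node causes every write to be flagged as under-replicated.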
