Environment:Heibaiying BigData Notes Hadoop CDH Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Distributed_Storage |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
Hadoop 2.6.0-cdh5.15.2 (Cloudera Distribution) environment with HDFS, YARN, and MapReduce for distributed data processing.
Description
This environment provides the Cloudera CDH distribution of Apache Hadoop version 2.6.0-cdh5.15.2. It includes HDFS for distributed storage, YARN for resource management, and MapReduce for batch processing. The CDH distribution bundles tested-compatible versions of Hadoop ecosystem components. All Hadoop and HDFS Java API examples in the repository depend on `hadoop-client:2.6.0-cdh5.15.2`.
Usage
Use this environment for any Hadoop MapReduce or HDFS Java API operations. It is required by the Hadoop MapReduce Word Count workflow and the HDFS utility classes, as well as by the Storm-HDFS integration modules.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | CentOS 7.6 | Linux with SSH configured |
| Java | JDK 1.8 | Required by Hadoop 2.6 |
| Hardware | Minimum 4GB RAM per node | 8GB+ recommended for cluster |
| Disk | 50GB+ per DataNode | HDFS storage; SSD preferred |
| Network | SSH passwordless login between nodes | Required for cluster mode |
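The passwordless-login requirement is typically met with an SSH key pair. A minimal sketch follows; the key filename is illustrative, and `hadoop001` is the hostname used elsewhere in this repository:

```shell
# Create the key pair non-interactively (filename is illustrative).
mkdir -p ~/.ssh
rm -f ~/.ssh/id_rsa_hadoop_demo ~/.ssh/id_rsa_hadoop_demo.pub
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa_hadoop_demo -q

# Distribute the public key to every node (run once per node):
# ssh-copy-id -i ~/.ssh/id_rsa_hadoop_demo.pub root@hadoop001

ls ~/.ssh/id_rsa_hadoop_demo.pub
```

Once the public key is on each node, `start-dfs.sh` and `start-yarn.sh` can reach the workers without password prompts.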
Dependencies
System Packages
- `hadoop` = 2.6.0-cdh5.15.2
- `java-1.8.0-openjdk-devel`
- `ssh` (OpenSSH)
Java Packages (Maven)
- `org.apache.hadoop:hadoop-client` = 2.6.0-cdh5.15.2
- `org.apache.commons:commons-lang3` >= 3.8.1
Environment Variables
- `HADOOP_HOME` = Hadoop installation directory
- `HADOOP_CONF_DIR` = `$HADOOP_HOME/etc/hadoop`
- `PATH` includes `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin`
Credentials
No API credentials are required. Hadoop user permissions are managed via OS-level user accounts:
- `HADOOP_USER_NAME`: Set to the Hadoop superuser (e.g., `root`) to avoid HDFS permission errors when running from an IDE.
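For example, when launching a client from the shell (the jar name below is illustrative):

```shell
# Run HDFS/MapReduce clients as the HDFS superuser to avoid permission errors.
export HADOOP_USER_NAME=root

# Equivalent JVM system property when launching from an IDE or Maven:
#   java -DHADOOP_USER_NAME=root -jar hadoop-word-count.jar ...

echo "HADOOP_USER_NAME=$HADOOP_USER_NAME"
```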
Quick Install

```shell
# Download and extract Hadoop CDH
wget https://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.15.2.tar.gz
tar -xzf hadoop-2.6.0-cdh5.15.2.tar.gz -C /opt/

# Set environment variables
export HADOOP_HOME=/opt/hadoop-2.6.0-cdh5.15.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Maven dependency, added to `pom.xml`:

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.15.2</version>
</dependency>
```
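CDH artifacts are not published to Maven Central, so the Cloudera repository must also be declared in `pom.xml` (the URL is Cloudera's published repository for CDH artifacts):

```xml
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
```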
Code Evidence
Hadoop client dependency from `hadoop-word-count/pom.xml`:
```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.15.2</version>
</dependency>
```
HDFS configuration constants from `HdfsUtils.java`:

```java
private static final String HDFS_PATH = "hdfs://hadoop001:8020";
private static final String HDFS_USER = "root";
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Permission denied: user=xxx` | HDFS user mismatch | Set the `HADOOP_USER_NAME` environment variable, or pass `-DHADOOP_USER_NAME=root` as a JVM option |
| `Output directory already exists` | MapReduce output path exists | Delete output directory before rerunning the job |
| `Connection refused to hadoop001:8020` | NameNode not running | Start HDFS with `start-dfs.sh` |
Compatibility Notes
- CDH 5.15.2 bundles specific tested versions of all Hadoop components. Mixing CDH and Apache releases may cause classpath conflicts.
- Storm-HDFS integration requires matching Hadoop client version (2.6.0-cdh5.15.2) in the Storm project pom.xml.
- Single-node testing: Set HDFS replication factor to 1 in `hdfs-site.xml` for development.
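The single-node replication setting is a standard `hdfs-site.xml` property (`dfs.replication` defaults to 3, which would leave every block under-replicated on one DataNode):

```xml
<!-- hdfs-site.xml: single-node development setting -->
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```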
Related Pages
- Implementation:Heibaiying_BigData_Notes_WordCountDataUtils_GenerateDataToHDFS
- Implementation:Heibaiying_BigData_Notes_WordCountMapper_Map
- Implementation:Heibaiying_BigData_Notes_WordCountReducer_Reduce
- Implementation:Heibaiying_BigData_Notes_Job_SetCombinerClass
- Implementation:Heibaiying_BigData_Notes_CustomPartitioner_GetPartition
- Implementation:Heibaiying_BigData_Notes_Job_Assembly_and_Submission