Principle:Heibaiying BigData Notes HDFS Java API Operations
| Knowledge Sources | |
|---|---|
| Domains | HDFS, Hadoop, Distributed_Storage |
| Last Updated | 2026-02-10 10:30 GMT |
Overview
Principle of programmatically interacting with the Hadoop Distributed File System (HDFS) through its Java FileSystem API for file and directory management operations.
Description
The HDFS Java API provides programmatic access to the Hadoop Distributed File System, enabling applications to create, read, write, copy, list, and delete files and directories on HDFS without relying on shell commands. The central abstraction is the FileSystem class, obtained via FileSystem.get() with the HDFS URI, a Configuration object, and optionally the name of the user to act as. Metadata operations (creating directories, renaming, listing, deleting) become RPC calls to the NameNode, while file contents are streamed directly to and from the DataNodes that hold the blocks, so clients benefit from HDFS replication and fault tolerance. This principle covers the client-side interaction pattern rather than the internal storage mechanisms of HDFS itself.
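As a minimal sketch of obtaining the FileSystem handle described above: the class name `HdfsConnectSketch`, the helper `connect`, the NameNode address, the user name, and the replication value are all illustrative assumptions, not values taken from the source.

```java
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectSketch {

    /** Returns a FileSystem client for the given URI, acting as the given user. */
    static FileSystem connect(String fsUri, String user)
            throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        // Illustrative client-side setting: replication requested for files this client creates.
        conf.set("dfs.replication", "1");
        return FileSystem.get(URI.create(fsUri), conf, user);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint and user; replace with the values for your cluster.
        try (FileSystem fs = connect("hdfs://namenode.example.com:8020", "hadoop")) {
            // A cheap metadata call to confirm that the client can reach the NameNode.
            System.out.println("Root exists: " + fs.exists(new Path("/")));
        }
    }
}
```

Because `connect` only depends on the URI scheme, the same helper can be pointed at `file:///` to exercise code paths against the local file system when no cluster is available.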
Usage
Apply this principle when building Java applications that need to interact with HDFS directly, such as data ingestion pipelines, ETL tools, or utility applications that manage files on the Hadoop cluster. It is the programmatic alternative to HDFS shell commands and is essential when file operations must be integrated into larger Java workflows.
Theoretical Basis
HDFS Java API operations follow the FileSystem abstraction pattern:
- Configuration: A Configuration object holds the default file system URI (fs.defaultFS) and client-side settings such as dfs.replication.
- FileSystem Initialization: FileSystem.get(URI, Configuration, user) returns a client bound to the NameNode at the given URI; the third argument is the name of the user the client acts as.
- Path Abstraction: All file and directory references use Path objects, which wrap either absolute HDFS paths or full URI strings.
- Operations: Metadata operations (mkdirs, rename, listStatus, delete) are handled by the NameNode alone, while operations that move file contents (open, create, copyFromLocalFile) also exchange data with DataNodes.
- Stream I/O: Read operations return an FSDataInputStream and write operations return an FSDataOutputStream; both wrap standard Java I/O streams.
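The Stream I/O point can be illustrated with a small write-then-read round trip. This is a sketch only: the class `HdfsStreamSketch` and its two helper methods are hypothetical names, and the code assumes the caller already holds a FileSystem instance.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsStreamSketch {

    /** Creates (or overwrites) a file and writes the text as UTF-8 bytes. */
    static void writeText(FileSystem fs, Path file, String text) throws IOException {
        try (FSDataOutputStream out = fs.create(file, true)) { // true = overwrite if present
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    /** Reads the whole file into a String; only sensible for small files. */
    static String readText(FileSystem fs, Path file) throws IOException {
        try (FSDataInputStream in = fs.open(file);
             ByteArrayOutputStream buf = new ByteArrayOutputStream()) {
            // false = do not close the streams here; try-with-resources handles that.
            IOUtils.copyBytes(in, buf, 4096, false);
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Both helpers accept the abstract FileSystem type, so they behave the same whether the instance points at an HDFS cluster or at the local file system.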
Pseudo-code Logic:
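// Abstract algorithm description; hdfsUri and user are supplied by the caller
Configuration conf = new Configuration();
conf.set("fs.defaultFS", hdfsUri);
FileSystem fs = FileSystem.get(URI.create(hdfsUri), conf, user);
// All operations use Path objects
fs.mkdirs(new Path("/dir"));
FSDataInputStream in = fs.open(new Path("/file"));
// ... read from in ...
in.close();
// Second argument: recursive; it must be true to delete a non-empty directory
fs.delete(new Path("/file"), false);
fs.close();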
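For the directory-level operations, a runnable sketch might look like the following. The class `HdfsDirectorySketch`, its methods, and the file names `reports`, `draft.txt`, and `final.txt` are hypothetical; the code assumes a FileSystem instance has already been obtained.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDirectorySketch {

    /** Returns the sorted names of the direct children of dir ("/" marks directories). */
    static List<String> listNames(FileSystem fs, Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        for (FileStatus status : fs.listStatus(dir)) {
            names.add(status.getPath().getName() + (status.isDirectory() ? "/" : ""));
        }
        Collections.sort(names);
        return names;
    }

    /** Creates a directory, adds an empty file, renames it, lists the result, cleans up. */
    static List<String> demo(FileSystem fs, Path base) throws IOException {
        Path dir = new Path(base, "reports");
        Path draft = new Path(dir, "draft.txt");
        Path finalName = new Path(dir, "final.txt");

        fs.mkdirs(dir);            // also creates any missing parent directories
        fs.create(draft).close();  // creates an empty file
        if (!fs.rename(draft, finalName)) {
            throw new IOException("rename failed: " + draft);
        }
        List<String> names = listNames(fs, dir);
        fs.delete(base, true);     // recursive delete of the whole tree under base
        return names;
    }
}
```

Checking the boolean returned by `rename` is a deliberate choice here, since a failed rename is reported through the return value rather than by an exception in many cases.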