Implementation:Datahub project Datahub HdfsPlatform

From Leeroopedia


Knowledge Sources
Domains: OpenLineage_Integration, Platform_Detection
Last Updated: 2026-02-10 00:00 GMT

Overview

Description

HdfsPlatform is a Java enum in the OpenLineage converter module that maps filesystem URI scheme prefixes to DataHub platform identifiers and provides a method to check whether a given prefix corresponds to a known filesystem platform.

The enum defines seven platform categories:

  • S3 -- Amazon S3 (s3, s3a, s3n)
  • GCS -- Google Cloud Storage (gs, gcs)
  • ABFS -- Azure Blob File System (abfs, abfss)
  • WASB -- Azure Blob Storage legacy (wasb, wasbs)
  • DBFS -- Databricks File System (dbfs)
  • FILE -- Local filesystem (file)
  • HDFS -- Hadoop Distributed File System (default, no specific prefixes)

Each enum constant holds a list of recognized URI prefixes and the corresponding DataHub platform string.

Usage

The OpenLineage dataset resolution logic uses this enum, primarily through the isFsPlatformPrefix method, to determine whether a URI prefix represents a known filesystem platform.
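As an illustration of that check, the following is a minimal sketch of how a caller might pull the scheme out of a dataset location and test it. SchemeCheck and hasKnownFsScheme are hypothetical names that do not appear in DataHub, and the lower-casing step is an assumption of this sketch. The prefix test is passed in as a predicate so that the block stays self-contained; in real code it could be HdfsPlatform::isFsPlatformPrefix.

```java
import java.net.URI;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not part of DataHub.
final class SchemeCheck {
    private SchemeCheck() {}

    // Extracts the URI scheme and asks the supplied predicate whether it is a
    // known filesystem prefix.
    static boolean hasKnownFsScheme(String location, Predicate<String> isFsPrefix) {
        String scheme = URI.create(location).getScheme();
        if (scheme == null) {
            return false; // no scheme, e.g. a bare path such as "/tmp/data"
        }
        // Lower-casing is an assumption of this sketch; URI schemes are case-insensitive.
        return isFsPrefix.test(scheme.toLowerCase(Locale.ROOT));
    }
}
```

Note that URI.create throws IllegalArgumentException for strings that are not valid URIs, so a caller handling untrusted input may want to catch that.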

Code Reference

Source Location

metadata-integration/java/openlineage-converter/src/main/java/io/datahubproject/openlineage/dataset/HdfsPlatform.java

Signature

public enum HdfsPlatform {
    S3(Arrays.asList("s3", "s3a", "s3n"), "s3"),
    GCS(Arrays.asList("gs", "gcs"), "gcs"),
    ABFS(Arrays.asList("abfs", "abfss"), "abs"),
    WASB(Arrays.asList("wasb", "wasbs"), "abs"),
    DBFS(Collections.singletonList("dbfs"), "dbfs"),
    FILE(Collections.singletonList("file"), "file"),
    HDFS(Collections.emptyList(), "hdfs");

    public final List<String> prefixes;
    public final String platform;

    HdfsPlatform(List<String> prefixes, String platform)

    public static boolean isFsPlatformPrefix(String prefix)
}
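
The signature above omits the method bodies. The following is a minimal sketch of one way such an enum could be filled in, assuming the constructor simply stores its arguments and isFsPlatformPrefix scans each constant's prefix list. HdfsPlatformSketch is a hypothetical stand-in name, and the bodies are assumptions rather than the DataHub source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for HdfsPlatform; method bodies are assumptions.
enum HdfsPlatformSketch {
    S3(Arrays.asList("s3", "s3a", "s3n"), "s3"),
    GCS(Arrays.asList("gs", "gcs"), "gcs"),
    ABFS(Arrays.asList("abfs", "abfss"), "abs"),
    WASB(Arrays.asList("wasb", "wasbs"), "abs"),
    DBFS(Collections.singletonList("dbfs"), "dbfs"),
    FILE(Collections.singletonList("file"), "file"),
    HDFS(Collections.emptyList(), "hdfs"); // default platform, no prefixes

    public final List<String> prefixes;
    public final String platform;

    HdfsPlatformSketch(List<String> prefixes, String platform) {
        this.prefixes = prefixes;
        this.platform = platform;
    }

    // Returns true when any constant lists the given prefix.
    public static boolean isFsPlatformPrefix(String prefix) {
        for (HdfsPlatformSketch candidate : values()) {
            if (candidate.prefixes.contains(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

Under this sketch, the string "hdfs" itself is not reported as a known prefix, because the HDFS constant carries an empty prefix list and acts only as the default.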

Import

import io.datahubproject.openlineage.dataset.HdfsPlatform;

I/O Contract

Inputs

Method | Parameter | Type | Description
isFsPlatformPrefix | prefix | String | A URI scheme prefix (e.g., "s3", "gs", "abfss")

Outputs

Method | Return Type | Description
isFsPlatformPrefix | boolean | true if the prefix matches any known filesystem platform, false otherwise

Prefix to Platform Mapping:

URI Prefix(es) | DataHub Platform String
s3, s3a, s3n | "s3"
gs, gcs | "gcs"
abfs, abfss | "abs"
wasb, wasbs | "abs"
dbfs | "dbfs"
file | "file"
(no prefix match) | "hdfs" (default)
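
The mapping above, including the "hdfs" fallback, can be sketched as a plain lookup. PrefixPlatformLookup and platformForPrefix are hypothetical names used only for illustration. The page documents only isFsPlatformPrefix as a public method, so this shows the mapping itself rather than an existing DataHub API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup mirroring the prefix-to-platform table; not DataHub code.
final class PrefixPlatformLookup {
    private static final Map<String, String> PLATFORM_BY_PREFIX = new HashMap<>();

    static {
        register("s3", "s3", "s3a", "s3n");
        register("gcs", "gs", "gcs");
        register("abs", "abfs", "abfss", "wasb", "wasbs");
        register("dbfs", "dbfs");
        register("file", "file");
    }

    private PrefixPlatformLookup() {}

    // Associates every listed prefix with one platform string.
    private static void register(String platform, String... prefixes) {
        for (String prefix : prefixes) {
            PLATFORM_BY_PREFIX.put(prefix, platform);
        }
    }

    // Falls back to "hdfs" when no prefix matches, as in the table's last row.
    static String platformForPrefix(String prefix) {
        return PLATFORM_BY_PREFIX.getOrDefault(prefix, "hdfs");
    }
}
```

Both Azure schemes families (abfs/abfss and wasb/wasbs) resolve to the same "abs" platform string here, matching the table.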

Usage Examples

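import io.datahubproject.openlineage.dataset.HdfsPlatform;
import java.util.List;

// Check whether a prefix is a known filesystem platform
boolean s3aKnown = HdfsPlatform.isFsPlatformPrefix("s3a"); // true
boolean ftpKnown = HdfsPlatform.isFsPlatformPrefix("ftp"); // false

// Access platform properties through the public final fields
String platform = HdfsPlatform.S3.platform;       // "s3"
List<String> prefixes = HdfsPlatform.S3.prefixes; // ["s3", "s3a", "s3n"]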
