Implementation:Apache Paimon PaimonVirtualFileSystem

From Leeroopedia


Knowledge Sources
Domains: Filesystem Abstraction, Storage Integration
Last Updated: 2026-02-08 00:00 GMT

Overview

PaimonVirtualFileSystem (PVFS) is an fsspec-compatible virtual filesystem registered under the "pvfs://" protocol. It maps Paimon catalog paths to their actual storage locations through the REST API, which gives transparent access to tables across different storage backends.

Description

PaimonVirtualFileSystem extends `fsspec.AbstractFileSystem` with the "pvfs" protocol, providing a unified view of Paimon tables through hierarchical paths (pvfs://catalog/database/table/subpath). The class parses each path into a hierarchical identifier (`PVFSCatalogIdentifier`, `PVFSDatabaseIdentifier`, or `PVFSTableIdentifier`) and dispatches filesystem operations accordingly.

For catalog- and database-level operations (ls, exists, mkdir, rm), it calls the REST API to list databases, create or drop databases, and perform similar catalog operations. For table-level operations, it resolves the table's actual storage location through the REST API call `get_table()`, determines the `StorageType` (LOCAL or OSS), creates the matching underlying filesystem (LocalFileSystem, or OSSFileSystem via ossfs), and delegates the actual I/O to it.

Caching uses TTL and LRU strategies: table metadata is held in a TTLCache (default 300s), REST API clients in an LRUCache keyed per catalog/endpoint, and filesystem instances in an LRUCache with refresh logic for temporary credentials. All caches are protected by read-write locks for thread safety. Path translation converts virtual pvfs:// paths to and from actual storage paths by tracking storage location prefixes.

The implementation supports database/table CRUD, file I/O (open, cat_file, get_file), directory operations (mkdir, makedirs, rm, rmdir), metadata queries (info, created, modified), and file operations (cp_file, mv). External paths are supported through `ExternalPathProvider` for object table scenarios.
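The caching and path translation described above can be illustrated with a minimal, standard-library-only sketch. It is not the pypaimon implementation: `TTLCache` here is a hand-written stand-in, a plain `threading.Lock` replaces the read-write lock, and `fake_get_table`, `to_actual_path`, `to_virtual_path`, and the `oss://bucket/...` location are hypothetical names and values chosen for illustration.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny TTL cache (illustrative; a plain lock stands in for a read-write lock)."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], str]) -> str:
        now = self._clock()
        with self._lock:
            hit = self._entries.get(key)
            if hit is not None and now - hit[0] < self._ttl:
                return hit[1]  # entry is still fresh
        value = loader(key)  # load outside the lock so slow lookups do not block others
        with self._lock:
            self._entries[key] = (self._clock(), value)
        return value


def to_actual_path(virtual_path: str, table_prefix: str, storage_location: str) -> str:
    """Swap the virtual table prefix for the table's storage location."""
    if virtual_path != table_prefix and not virtual_path.startswith(table_prefix + "/"):
        raise ValueError(f"{virtual_path!r} is not under {table_prefix!r}")
    return storage_location.rstrip("/") + virtual_path[len(table_prefix):]


def to_virtual_path(actual_path: str, table_prefix: str, storage_location: str) -> str:
    """Inverse of to_actual_path: swap the storage location for the virtual prefix."""
    base = storage_location.rstrip("/")
    if actual_path != base and not actual_path.startswith(base + "/"):
        raise ValueError(f"{actual_path!r} is not under {base!r}")
    return table_prefix + actual_path[len(base):]


# Hypothetical stand-in for the REST lookup of a table's storage location.
lookups: List[str] = []

def fake_get_table(table_prefix: str) -> str:
    lookups.append(table_prefix)
    return "oss://bucket/warehouse/default.db/users"  # made-up location


cache = TTLCache(ttl_seconds=300)
prefix = "pvfs://my_catalog/default/users"
location = cache.get_or_load(prefix, fake_get_table)
location = cache.get_or_load(prefix, fake_get_table)  # second call is served from the cache
actual = to_actual_path(prefix + "/snapshot/snapshot-1", prefix, location)
virtual = to_virtual_path(actual, prefix, location)
```

Keeping the slow lookup outside the lock means two threads may occasionally load the same key at once; the sketch accepts that duplicate work in exchange for never holding the lock during a remote call.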

This abstraction enables tools like pandas, PyArrow, and other fsspec-compatible libraries to access Paimon tables using virtual paths without knowledge of underlying storage systems, credentials, or catalog APIs.

Usage

PVFS is registered as an fsspec protocol and can be used with any fsspec-compatible library by providing pvfs:// URIs with appropriate options.
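How a protocol name becomes resolvable through fsspec can be sketched with a stand-in class. The example assumes fsspec is installed; `DemoVirtualFileSystem` and the protocol name "pvfs-demo" are hypothetical and used only so the snippet runs without pypaimon or a REST server. Whether pypaimon registers "pvfs" automatically or expects an explicit call is not stated here, so treat the explicit registration as an assumption.

```python
import fsspec
from fsspec import AbstractFileSystem


class DemoVirtualFileSystem(AbstractFileSystem):
    """Hypothetical stand-in for PaimonVirtualFileSystem (no REST calls)."""

    protocol = "pvfs-demo"

    def __init__(self, options=None, **kwargs):
        super().__init__(**kwargs)
        # Keep the options dict, mirroring the constructor shape in the signature above.
        self.options = dict(options or {})


# Register the class under its protocol name; clobber=True allows re-running the snippet.
fsspec.register_implementation("pvfs-demo", DemoVirtualFileSystem, clobber=True)

# After registration, fsspec resolves the protocol name to the class
# and forwards keyword arguments to its constructor.
fs = fsspec.filesystem("pvfs-demo", options={"uri": "http://localhost:8080"})
```

With the real class, the same pattern would use the "pvfs" protocol name and `PaimonVirtualFileSystem` in place of the stand-in.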

Code Reference

Source Location

Signature

class PaimonVirtualFileSystem(fsspec.AbstractFileSystem):
    protocol = "pvfs"

    def __init__(self, options: Union[Options, Dict[str, str]] = None, **kwargs): ...

    def ls(self, path, detail=True, **kwargs): ...
    def info(self, path, **kwargs): ...
    def exists(self, path, **kwargs): ...
    def open(self, path, mode="rb", **kwargs): ...
    def mkdir(self, path, create_parents=True, **kwargs): ...
    def rm(self, path, recursive=False, maxdepth=None): ...
    def mv(self, path1, path2, recursive=False, maxdepth=None, **kwargs): ...
    def cp_file(self, path1, path2, **kwargs): ...
    def cat_file(self, path, start=None, end=None, **kwargs): ...
    def created(self, path): ...
    def modified(self, path): ...

Import

from pypaimon.filesystem.pvfs import PaimonVirtualFileSystem

# Or register and use via fsspec
import fsspec
fs = fsspec.filesystem("pvfs", options={...})

I/O Contract

Inputs

Name | Type | Required | Description
options | Options/Dict | yes | Configuration including URI, warehouse, authentication
path | str | yes | Virtual path in pvfs://catalog/database/table/subpath format
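The path format in the table above maps naturally onto three identifier levels. The following is a minimal sketch of that parsing, not the pypaimon code: `CatalogId`, `DatabaseId`, `TableId`, and `parse_pvfs_path` are hypothetical stand-ins for the `PVFS*Identifier` classes and the parser, and the exact fields of the real classes are not shown in this page.

```python
from dataclasses import dataclass
from typing import Optional, Union

PROTOCOL_PREFIX = "pvfs://"


@dataclass(frozen=True)
class CatalogId:  # hypothetical stand-in for PVFSCatalogIdentifier
    catalog: str


@dataclass(frozen=True)
class DatabaseId:  # hypothetical stand-in for PVFSDatabaseIdentifier
    catalog: str
    database: str


@dataclass(frozen=True)
class TableId:  # hypothetical stand-in for PVFSTableIdentifier
    catalog: str
    database: str
    table: str
    sub_path: Optional[str] = None


def parse_pvfs_path(path: str) -> Union[CatalogId, DatabaseId, TableId]:
    """Classify a pvfs:// path by how many components it has."""
    if not path.startswith(PROTOCOL_PREFIX):
        raise ValueError(f"not a pvfs path: {path!r}")
    # Drop empty components so trailing slashes are tolerated.
    parts = [p for p in path[len(PROTOCOL_PREFIX):].split("/") if p]
    if not parts:
        raise ValueError("a catalog name is required")
    if len(parts) == 1:
        return CatalogId(parts[0])
    if len(parts) == 2:
        return DatabaseId(parts[0], parts[1])
    catalog, database, table = parts[:3]
    # Everything after the table name is a path inside the table's storage.
    sub_path = "/".join(parts[3:]) or None
    return TableId(catalog, database, table, sub_path)


table_id = parse_pvfs_path("pvfs://my_catalog/default/users/manifest/manifest-123")
```

A dispatcher can then branch on the identifier type: catalog and database identifiers go to the REST API, while table identifiers are resolved to a storage location first.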

Outputs

Name | Type | Description
File listing | List[Dict/str] | Directory contents with details or paths
File info | Dict | File metadata (name, size, type, mtime)
File content | bytes/File object | Data from open(), cat_file(), get_file()

Usage Examples

List Databases and Tables

from pypaimon.filesystem.pvfs import PaimonVirtualFileSystem

# Initialize PVFS
fs = PaimonVirtualFileSystem(options={
    "uri": "http://localhost:8080",
    "warehouse": "my_warehouse"
})

# List databases (detail=False returns plain path strings instead of info dicts)
databases = fs.ls("pvfs://my_catalog", detail=False)
# ['pvfs://my_catalog/default', 'pvfs://my_catalog/test_db']

# List tables
tables = fs.ls("pvfs://my_catalog/default", detail=False)
# ['pvfs://my_catalog/default/users', 'pvfs://my_catalog/default/orders']

# List table files
files = fs.ls("pvfs://my_catalog/default/users", detail=False)
# ['pvfs://my_catalog/default/users/manifest', 'pvfs://my_catalog/default/users/snapshot', ...]

Read/Write Files

# Read a manifest file
with fs.open("pvfs://my_catalog/default/users/manifest/manifest-123", "rb") as f:
    data = f.read()

# Write a file (for object tables)
with fs.open("pvfs://my_catalog/default/documents/doc1.txt", "wb") as f:
    f.write(b"Hello, Paimon!")

# Read file content
content = fs.cat_file("pvfs://my_catalog/default/documents/doc1.txt")

Database Operations

# Create database
fs.mkdir("pvfs://my_catalog/new_db")

# Drop database
fs.rm("pvfs://my_catalog/new_db", recursive=True)

# Check if database exists
exists = fs.exists("pvfs://my_catalog/default")  # True

Table Operations

# Create object table
fs.mkdir("pvfs://my_catalog/default/documents")

# Move/rename table
fs.mv(
    "pvfs://my_catalog/default/old_table",
    "pvfs://my_catalog/default/new_table"
)

# Drop table
fs.rm("pvfs://my_catalog/default/documents", recursive=True)

Integration with Pandas

import pandas as pd

# Read Parquet files from a table
# (the filesystem argument requires pandas 2.1 or newer)
df = pd.read_parquet(
    "pvfs://my_catalog/default/users/bucket-0/data-001.parquet",
    filesystem=fs
)
