Principle:Apache Paimon Manifest Management
| Knowledge Sources | |
|---|---|
| Domains | Metadata, Storage |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Manifest management provides atomic, versioned tracking of all data files in a table through hierarchical metadata structures that enable snapshot isolation and efficient file discovery.
Description
The manifest management principle solves the challenge of maintaining consistent metadata for tables that may contain millions of data files distributed across thousands of partitions. Rather than storing a flat list of all files, the system organizes metadata hierarchically: individual data file entries are grouped into manifest files, and collections of manifest files are referenced by manifest lists. This multi-level structure enables incremental metadata updates, where only changed portions need to be rewritten rather than the entire metadata set.
Each manifest file contains entries describing a subset of data files: their physical location, size, row count, column statistics, and partition values. Manifest files are immutable once written, so concurrent readers can access consistent snapshots without locking. When data files are added or removed, the system writes new manifest files containing only the changes, then creates a new manifest list that references both the unchanged and the newly written manifest files. This copy-on-write approach makes updates atomic: nothing a reader can see is modified in place, and the new table snapshot becomes visible only once the new manifest list has been fully written and the snapshot pointer is switched to it.
The manifest list serves as the root of the metadata hierarchy, containing references to all manifest files that comprise a complete table snapshot. It stores aggregate statistics like total file count and data size, enabling quick metadata queries without reading individual manifest files. Index manifest files extend this pattern to track auxiliary index structures, maintaining separate metadata hierarchies for different index types. This separation allows index maintenance to proceed independently of data file changes, improving concurrency and reducing metadata overhead.
Usage
Apply manifest management when implementing snapshot isolation for data lakes, tracking large numbers of data files across partitioned tables, or enabling time-travel queries. The pattern supplies the atomicity and isolation parts of ACID for tables kept on distributed storage: a commit is all-or-nothing, and readers always see a complete snapshot.
Theoretical Basis
The manifest management pattern operates through hierarchical metadata tracking:
Three-Level Hierarchy
    Snapshot
      |
      +--> Manifest List (manifest-list-{snapshot-id})
             |
             +--> Manifest File 1 (manifest-{uuid})
             |      |
             |      +--> Data File Entry 1: (path, partition, stats, added/deleted)
             |      +--> Data File Entry 2: ...
             |      +--> Data File Entry N
             |
             +--> Manifest File 2
             |      +--> Data File Entry N+1
             |      +--> ...
             |
             +--> Manifest File M
Manifest File Entry Structure
    structure ManifestEntry:
        kind: enum                  // ADDED, DELETED
        partition: PartitionSpec    // Partition values
        filePath: string            // Storage path
        fileSize: long              // Bytes
        rowCount: long              // Number of rows
        minKey: bytes               // Minimum key (for range pruning)
        maxKey: bytes               // Maximum key
        nullValueCounts: map        // Column -> null count
        minValues: map              // Column -> min value
        maxValues: map              // Column -> max value
        schemaId: int               // Schema version
        level: int                  // Compaction level
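
The key bounds stored in each entry are what allow a reader to skip files without opening them. The following is a minimal Python sketch of that range pruning, assuming keys are byte strings compared lexicographically. The trimmed-down `ManifestEntry` dataclass, the `overlaps` and `prune` helpers, and the file names are hypothetical illustrations, not Paimon APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, trimmed-down version of the entry structure above."""
    file_path: str
    min_key: bytes
    max_key: bytes
    row_count: int


def overlaps(entry: ManifestEntry, lo: bytes, hi: bytes) -> bool:
    """True if the file's [min_key, max_key] range intersects [lo, hi]."""
    return entry.min_key <= hi and entry.max_key >= lo


def prune(entries: list[ManifestEntry], lo: bytes, hi: bytes) -> list[ManifestEntry]:
    """Keep only the files that could contain a key in [lo, hi]."""
    return [e for e in entries if overlaps(e, lo, hi)]


# Illustrative entries; the paths are made up for this example.
entries = [
    ManifestEntry("data-0.parquet", b"a", b"f", 100),
    ManifestEntry("data-1.parquet", b"g", b"m", 200),
    ManifestEntry("data-2.parquet", b"n", b"z", 300),
]
candidates = prune(entries, b"e", b"h")
print([e.file_path for e in candidates])  # ['data-0.parquet', 'data-1.parquet']
```

Pruning is conservative: a file whose range overlaps the query may still contain no matching rows, but a file whose range does not overlap can never contain one.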
Atomic Snapshot Update
    function commitSnapshot(currentSnapshot, newDataFiles, deletedDataFiles):
        // Step 1: Write new manifest files for changes
        newManifests = []
        if newDataFiles.notEmpty():
            addedManifest = writeManifestFile(
                newDataFiles.map(file => ManifestEntry(ADDED, file))
            )
            newManifests.add(addedManifest)
        if deletedDataFiles.notEmpty():
            deletedManifest = writeManifestFile(
                deletedDataFiles.map(file => ManifestEntry(DELETED, file))
            )
            newManifests.add(deletedManifest)

        // Step 2: Collect unchanged manifest files from current snapshot
        unchangedManifests = currentSnapshot.manifestList.getManifestFiles()

        // Step 3: Write new manifest list
        allManifests = unchangedManifests + newManifests
        newManifestList = writeManifestList(
            snapshotId: currentSnapshot.id + 1,
            manifestFiles: allManifests,
            totalFiles: computeTotalFiles(allManifests),
            totalSize: computeTotalSize(allManifests)
        )

        // Step 4: Atomic swap - update snapshot pointer
        atomicWrite(snapshotPointerFile, newManifestList.path)
        return newSnapshot(newManifestList)
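
The four steps can be tried out on a local file system. The sketch below is only an illustration under stated assumptions: JSON files stand in for the real metadata format, the `LATEST` pointer file and the helper names `write_json` and `commit_snapshot` are hypothetical, and a single committer is assumed (no conflict detection). The pointer swap relies on `os.replace`, which Python documents as an atomic rename on POSIX systems.

```python
import json
import os
import tempfile
import uuid


def write_json(directory, name, payload):
    """Write a metadata file that is never modified afterwards; return its path."""
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f)
    return path


def commit_snapshot(table_dir, current, added, deleted):
    """Copy-on-write commit: new manifests, new manifest list, then pointer swap."""
    # Steps 1 and 2: reuse existing manifests untouched, write one per change kind.
    manifests = list(current["manifests"])
    for kind, paths in (("ADDED", added), ("DELETED", deleted)):
        if paths:
            entries = [{"kind": kind, "path": p} for p in paths]
            manifests.append(write_json(table_dir, f"manifest-{uuid.uuid4()}", entries))

    # Step 3: the new manifest list references old and new manifests alike.
    snapshot_id = current["id"] + 1
    manifest_list = write_json(table_dir, f"manifest-list-{snapshot_id}", manifests)

    # Step 4: write the pointer to a temp file, then rename it into place.
    pointer = os.path.join(table_dir, "LATEST")  # hypothetical pointer file name
    tmp = pointer + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"id": snapshot_id, "manifest_list": manifest_list}, f)
    os.replace(tmp, pointer)
    return {"id": snapshot_id, "manifests": manifests}


table_dir = tempfile.mkdtemp()
s0 = {"id": 0, "manifests": []}
s1 = commit_snapshot(table_dir, s0, added=["f1", "f2"], deleted=[])
s2 = commit_snapshot(table_dir, s1, added=["f3"], deleted=["f1"])
print(s2["id"], len(s2["manifests"]))  # 2 3
```

If the process fails before the rename, readers still see the previous pointer, and the manifest files already written are simply unreferenced.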
Reading Snapshot
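    function readSnapshot(snapshotId):
        manifestListPath = readSnapshotPointer(snapshotId)
        manifestList = readManifestList(manifestListPath)

        // Key live files by identity: a DELETED entry is a different record
        // from the ADDED entry it cancels, so the two must be matched by path
        liveFiles = {}  // (partition, filePath) -> ManifestEntry
        for each manifestFile in manifestList.files:  // in commit order
            entries = readManifestFile(manifestFile)
            // Build file set by applying additions and deletions
            for each entry in entries:
                key = (entry.partition, entry.filePath)
                if entry.kind == ADDED:
                    liveFiles[key] = entry
                else if entry.kind == DELETED:
                    liveFiles.remove(key)

        return liveFiles.values()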
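
A runnable version of this replay makes the ordering requirement concrete. This is a small sketch, assuming each manifest is an in-memory list of `(kind, path)` tuples given in commit order; `live_files` is a hypothetical helper name.

```python
def live_files(manifests):
    """Replay manifests in commit order and return the surviving file paths."""
    live = {}  # path -> True; a dict keeps insertion order
    for entries in manifests:
        for kind, path in entries:
            if kind == "ADDED":
                live[path] = True
            elif kind == "DELETED":
                # Match by path, not by entry object, so the deletion
                # cancels the earlier addition of the same file.
                live.pop(path, None)
    return list(live)


manifests = [
    [("ADDED", "f1"), ("ADDED", "f2")],
    [("ADDED", "f3")],
    [("DELETED", "f1")],
]
print(live_files(manifests))  # ['f2', 'f3']
```

The result depends on processing a file's ADDED entry before its DELETED entry, which is why the manifests are replayed in the order they were committed.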
Index Manifest Management
    structure IndexManifest:
        indexType: string       // "bloom-filter", "bitmap", "deletion-vector"
        indexFiles: list        // Index file references
        dataFileMapping: map    // Data file -> index file

    function updateIndexManifest(dataFile, indexFile, indexType):
        currentManifest = readIndexManifest(indexType)
        newManifest = currentManifest.copy()
        newManifest.addMapping(dataFile, indexFile)
        writeIndexManifest(indexType, newManifest)
Manifest Compaction
- Over time, manifest lists accumulate many manifest files
- Periodically compact by merging small manifest files
- Eliminate redundant ADDED/DELETED pairs for same files
- Reduce metadata read overhead during snapshot loading
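
The merge step in the list above can be sketched in a few lines of Python. This is an illustration rather than the actual compaction algorithm: it assumes manifests are in-memory lists of `(kind, path)` tuples and that the run being merged is contiguous and in commit order. `compact` is a hypothetical name.

```python
def compact(manifests):
    """Merge a contiguous run of manifests into a single list of entries."""
    added = {}    # files added within the run and not deleted again
    deleted = {}  # deletions whose matching ADDED lies outside the run
    for entries in manifests:
        for kind, path in entries:
            if kind == "ADDED":
                added[path] = ("ADDED", path)
            elif path in added:
                del added[path]  # the ADDED/DELETED pair cancels out
            else:
                deleted[path] = ("DELETED", path)
    # Deletions come first so that replaying the merged manifest
    # gives the same file set as replaying the original run.
    return list(deleted.values()) + list(added.values())


run = [
    [("ADDED", "f1"), ("ADDED", "f2")],
    [("DELETED", "f1"), ("ADDED", "f3")],
    [("DELETED", "f0")],  # f0 was added by an older manifest outside the run
]
print(compact(run))
# [('DELETED', 'f0'), ('ADDED', 'f2'), ('ADDED', 'f3')]
```

Five entries in three manifests shrink to three entries in one. The unmatched deletion of `f0` is kept because dropping it would bring the file back when the older manifest is replayed.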