Principle:Apache Paimon Manifest Management

Knowledge Sources	Apache_Paimon
Domains	Metadata, Storage
Last Updated	2026-02-08 00:00 GMT

Overview

Manifest management provides atomic, versioned tracking of all data files in a table through hierarchical metadata structures that enable snapshot isolation and efficient file discovery.

Description

The manifest management principle solves the challenge of maintaining consistent metadata for tables that may contain millions of data files distributed across thousands of partitions. Rather than storing a flat list of all files, the system organizes metadata hierarchically: individual data file entries are grouped into manifest files, and collections of manifest files are referenced by manifest lists. This multi-level structure enables incremental metadata updates, where only changed portions need to be rewritten rather than the entire metadata set.

Each manifest file contains entries describing a subset of data files, including their physical locations, sizes, row counts, statistics, and partition information. Manifest files are immutable once written, so concurrent readers can access consistent snapshots without locking. When data files are added or removed, the system writes new manifest files containing only the changes, then creates a new manifest list that references both the unchanged and the newly written manifest files. This copy-on-write approach makes updates atomic: writing the manifest files and the manifest list publishes nothing by itself, and the new snapshot becomes visible only when the snapshot file that references the new manifest list is atomically committed.

Below the snapshot, the manifest list is the entry point to the metadata hierarchy: it references every manifest file that makes up a complete table snapshot. For each manifest it also records summary statistics, such as the number of added and deleted files and the range of partition values covered, so readers can answer simple metadata queries and skip irrelevant manifests without opening them. Index manifest files extend this pattern to track auxiliary index structures, maintaining separate metadata hierarchies for different index types. This separation allows index maintenance to proceed independently of data file changes, improving concurrency and reducing metadata overhead.
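A minimal Python sketch of how summary statistics kept in the manifest list can answer metadata queries and skip manifests without opening them. It assumes each manifest-list entry records added/deleted file counts and a min/max partition value; the `ManifestMeta` type, its fields, and the ISO-date string partitions are hypothetical illustrations, not Paimon's actual classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ManifestMeta:
    """One manifest-list entry summarizing a manifest file (hypothetical fields)."""
    file_name: str
    num_added_files: int
    num_deleted_files: int
    min_partition: str  # smallest partition value covered by the manifest
    max_partition: str  # largest partition value covered by the manifest


def prune_manifests(manifest_list: List[ManifestMeta], partition: str) -> List[ManifestMeta]:
    """Keep only manifests whose partition range may contain `partition`."""
    return [m for m in manifest_list if m.min_partition <= partition <= m.max_partition]


def net_file_count(manifest_list: List[ManifestMeta]) -> int:
    """Live data files in the snapshot, computed from the summaries alone."""
    return sum(m.num_added_files - m.num_deleted_files for m in manifest_list)


manifest_list = [
    ManifestMeta("manifest-a", 120, 0, "2024-01-01", "2024-01-31"),
    ManifestMeta("manifest-b", 80, 10, "2024-02-01", "2024-02-29"),
    ManifestMeta("manifest-c", 5, 5, "2024-02-15", "2024-03-10"),
]

print([m.file_name for m in prune_manifests(manifest_list, "2024-02-20")])  # ['manifest-b', 'manifest-c']
print(net_file_count(manifest_list))  # 190
```

The net count is exact only when the manifest list covers the table's full history, so that every DELETED entry cancels an ADDED entry somewhere in the list. ISO-8601 date strings are used because they compare correctly as plain strings.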

Usage

Apply manifest management when implementing snapshot isolation for data lakes, tracking large numbers of data files across partitioned tables, or enabling time-travel queries. This pattern is essential for maintaining ACID properties in distributed storage systems.

Theoretical Basis

The manifest management pattern operates through hierarchical metadata tracking:

Three-Level Hierarchy

Snapshot
  |
  +--> Manifest List (manifest-list-{snapshot-id})
         |
         +--> Manifest File 1 (manifest-{uuid})
         |      |
         |      +--> Data File Entry 1: (path, partition, stats, added/deleted)
         |      +--> Data File Entry 2: ...
         |      +--> Data File Entry N
         |
         +--> Manifest File 2
         |      +--> Data File Entry N+1
         |      +--> ...
         |
         +--> Manifest File M
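The hierarchy can be modeled with immutable Python types. This sketch, with hypothetical type and file names, shows the structural sharing behind incremental updates: snapshot 2 gets a new manifest list that reuses the first manifest unchanged and adds one new manifest, while snapshot 1 stays intact.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DataFileEntry:
    path: str
    partition: str
    kind: str  # "ADDED" or "DELETED"


@dataclass(frozen=True)
class ManifestFile:
    name: str
    entries: Tuple[DataFileEntry, ...]


@dataclass(frozen=True)
class ManifestList:
    name: str
    manifests: Tuple[ManifestFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifest_list: ManifestList


m1 = ManifestFile("manifest-1", (
    DataFileEntry("data-1.parquet", "dt=2024-01-01", "ADDED"),
    DataFileEntry("data-2.parquet", "dt=2024-01-01", "ADDED"),
))
s1 = Snapshot(1, ManifestList("manifest-list-1", (m1,)))

# Snapshot 2 adds one file: one new manifest, one new manifest list, m1 reused as-is
m2 = ManifestFile("manifest-2", (DataFileEntry("data-3.parquet", "dt=2024-01-02", "ADDED"),))
s2 = Snapshot(2, ManifestList("manifest-list-2", s1.manifest_list.manifests + (m2,)))

print(s2.manifest_list.manifests[0] is m1)  # True: shared, not copied
print(len(s1.manifest_list.manifests))      # 1: snapshot 1 is unaffected
```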

Manifest File Entry Structure

structure ManifestEntry:
    kind: enum                  // ADDED, DELETED
    partition: PartitionSpec    // Partition values
    filePath: string            // Storage path
    fileSize: long              // Bytes
    rowCount: long              // Number of rows
    minKey: bytes               // Minimum key (for range pruning)
    maxKey: bytes               // Maximum key
    nullValueCounts: map        // Column -> null count
    minValues: map              // Column -> min value
    maxValues: map              // Column -> max value
    schemaId: int               // Schema version
    level: int                  // Compaction level
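The `minKey`/`maxKey` fields enable range pruning during point lookups. A minimal Python sketch, assuming keys are encoded so that lexicographic byte order matches key order, and that the entries passed in are the live set left after replaying all manifests. Field names are snake_case renderings of the structure above; the per-column statistics maps are omitted, and `candidate_files` is a hypothetical helper.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class FileKind(Enum):
    ADDED = "ADDED"
    DELETED = "DELETED"


@dataclass(frozen=True)
class ManifestEntry:
    # Per-column statistics maps from the structure above are omitted for brevity
    kind: FileKind
    partition: str
    file_path: str
    file_size: int   # bytes
    row_count: int
    min_key: bytes
    max_key: bytes
    schema_id: int = 0
    level: int = 0   # compaction level


def may_contain_key(entry: ManifestEntry, key: bytes) -> bool:
    """Range pruning: a file can only hold `key` if min_key <= key <= max_key."""
    return entry.min_key <= key <= entry.max_key


def candidate_files(live_entries: List[ManifestEntry], partition: str, key: bytes) -> List[str]:
    """Data files that must be read to look up `key` in `partition`."""
    return [e.file_path for e in live_entries
            if e.partition == partition and may_contain_key(e, key)]


entries = [
    ManifestEntry(FileKind.ADDED, "p=1", "data-0.orc", 1024, 100, b"k000", b"k099"),
    ManifestEntry(FileKind.ADDED, "p=1", "data-1.orc", 2048, 200, b"k100", b"k299", level=1),
    ManifestEntry(FileKind.ADDED, "p=2", "data-2.orc", 512, 50, b"k000", b"k049"),
]
print(candidate_files(entries, "p=1", b"k150"))  # ['data-1.orc']
```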

Atomic Snapshot Update

function commitSnapshot(currentSnapshot, newDataFiles, deletedDataFiles):
    // Step 1: Write new manifest files for changes
    newManifests = []

    if newDataFiles.notEmpty():
        addedManifest = writeManifestFile(
            newDataFiles.map(file => ManifestEntry(ADDED, file))
        )
        newManifests.add(addedManifest)

    if deletedDataFiles.notEmpty():
        deletedManifest = writeManifestFile(
            deletedDataFiles.map(file => ManifestEntry(DELETED, file))
        )
        newManifests.add(deletedManifest)

    // Step 2: Collect unchanged manifest files from current snapshot
    unchangedManifests = currentSnapshot.manifestList.getManifestFiles()

    // Step 3: Write new manifest list
    allManifests = unchangedManifests + newManifests
    newManifestList = writeManifestList(
        snapshotId: currentSnapshot.id + 1,
        manifestFiles: allManifests,
        totalFiles: computeTotalFiles(allManifests),
        totalSize: computeTotalSize(allManifests)
    )

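    // Step 4: Atomically publish the new snapshot. The create must fail if a
    // snapshot with this id already exists (another writer committed first);
    // the caller then re-reads the latest snapshot, checks for conflicts, and retries.
    newSnapshotId = currentSnapshot.id + 1
    if not atomicCreateIfAbsent(snapshotFile(newSnapshotId), newManifestList.path):
        throw CommitConflict(newSnapshotId)

    return newSnapshot(newSnapshotId, newManifestList)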
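A runnable Python sketch of a single commit attempt, assuming the storage layer offers an atomic put-if-absent for snapshot files. `InMemoryStore`, `manifests_of`, `try_commit`, and the file naming are hypothetical stand-ins; manifests and manifest lists are stored as tuples in a dictionary rather than written to real files.

```python
import uuid
from typing import Dict, Optional, Sequence, Tuple


class InMemoryStore:
    """Stand-in for table storage. create_if_absent models an atomic
    put-if-absent, the primitive a snapshot commit relies on."""

    def __init__(self) -> None:
        self.files: Dict[str, object] = {}

    def write(self, path: str, content: object) -> None:
        self.files[path] = content

    def create_if_absent(self, path: str, content: object) -> bool:
        if path in self.files:
            return False
        self.files[path] = content
        return True


def manifests_of(store: InMemoryStore, snapshot_id: int) -> Tuple[str, ...]:
    """Manifest paths referenced by a committed snapshot (0 means empty table)."""
    if snapshot_id == 0:
        return ()
    return store.files[store.files[f"snapshot-{snapshot_id}"]]


def try_commit(store: InMemoryStore, base_id: int,
               added: Sequence[str], deleted: Sequence[str]) -> Optional[int]:
    """One commit attempt on top of snapshot `base_id`; None means a conflict."""
    new_manifests = []
    if added:
        path = f"manifest-{uuid.uuid4()}"
        store.write(path, tuple(("ADDED", f) for f in added))
        new_manifests.append(path)
    if deleted:
        path = f"manifest-{uuid.uuid4()}"
        store.write(path, tuple(("DELETED", f) for f in deleted))
        new_manifests.append(path)

    manifest_list = f"manifest-list-{uuid.uuid4()}"
    store.write(manifest_list, manifests_of(store, base_id) + tuple(new_manifests))

    # The only step visible to readers: publish snapshot-{base_id + 1} atomically
    new_id = base_id + 1
    if not store.create_if_absent(f"snapshot-{new_id}", manifest_list):
        return None  # another writer committed new_id first; files written above are orphans
    return new_id


store = InMemoryStore()
print(try_commit(store, 0, ["data-1", "data-2"], []))  # 1
print(try_commit(store, 1, ["data-3"], ["data-1"]))    # 2
print(try_commit(store, 1, ["data-4"], []))            # None: stale base snapshot
```

A real writer would loop on `None`: reload the latest snapshot, verify that its changes still apply (for example, that files it deletes were not already removed by the other writer), and retry. Manifests left behind by a failed attempt need a separate cleanup pass.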

Reading Snapshot

function readSnapshot(snapshotId):
    manifestListPath = readSnapshotPointer(snapshotId)
    manifestList = readManifestList(manifestListPath)

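    liveFiles = map()   // file identity (partition, filePath) -> entry

    // Manifests must be replayed in manifest-list order
    for each manifestFile in manifestList.files:
        entries = readManifestFile(manifestFile)

        // Build the file set by applying additions and deletions.
        // Deletions match by file identity: a DELETED entry never equals the
        // ADDED entry it cancels, because their kind fields differ.
        for each entry in entries:
            key = (entry.partition, entry.filePath)
            if entry.kind == ADDED:
                liveFiles.put(key, entry)
            else if entry.kind == DELETED:
                liveFiles.remove(key)

    return liveFiles.values()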
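The replay step as a small runnable Python function, assuming entries are simplified to `(kind, partition, file_path)` tuples and that a file's identity is its partition plus path. `replay` is a hypothetical helper; real entries carry the full metadata shown in the entry structure.

```python
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[str, str, str]  # (kind, partition, file_path)


def replay(manifests: Iterable[List[Entry]]) -> Dict[Tuple[str, str], Entry]:
    """Fold manifest entries, in manifest-list order, into the live file set."""
    live: Dict[Tuple[str, str], Entry] = {}
    for manifest in manifests:
        for kind, partition, path in manifest:
            identity = (partition, path)
            if kind == "ADDED":
                live[identity] = (kind, partition, path)
            elif kind == "DELETED":
                # Match on identity: the DELETED entry never equals the ADDED one
                live.pop(identity, None)
    return live


manifests = [
    [("ADDED", "p=1", "data-1"), ("ADDED", "p=1", "data-2")],
    [("ADDED", "p=2", "data-3")],
    [("DELETED", "p=1", "data-1")],
]
print(sorted(replay(manifests)))  # [('p=1', 'data-2'), ('p=2', 'data-3')]
```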

Index Manifest Management

structure IndexManifest:
    indexType: string           // "bloom-filter", "bitmap", "deletion-vector"
    indexFiles: list            // Index file references
    dataFileMapping: map        // Data file -> index file

function updateIndexManifest(dataFile, indexFile, indexType):
    currentManifest = readIndexManifest(indexType)

    newManifest = currentManifest.copy()
    newManifest.addMapping(dataFile, indexFile)

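    // Write a new immutable file rather than overwriting the old one; the next
    // snapshot references it, so readers of older snapshots are unaffected
    writeIndexManifest(indexType, newManifest)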
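The same copy-on-write rule applied to index metadata, as an in-memory Python sketch. `update_index_manifest` here is a hypothetical pure function that returns a new read-only version instead of writing a file; the per-index-type layout follows the `IndexManifest` structure above, and the file names are illustrative.

```python
from types import MappingProxyType
from typing import Mapping

# index type -> (data file -> index file); every version is read-only
IndexManifests = Mapping[str, Mapping[str, str]]


def update_index_manifest(current: IndexManifests, index_type: str,
                          data_file: str, index_file: str) -> IndexManifests:
    """Return a new version; `current` is left untouched for older snapshots."""
    per_type = dict(current.get(index_type, {}))
    per_type[data_file] = index_file
    updated = dict(current)  # other index types are shared, not copied
    updated[index_type] = MappingProxyType(per_type)
    return MappingProxyType(updated)


v0: IndexManifests = MappingProxyType({})
v1 = update_index_manifest(v0, "deletion-vector", "data-0.orc", "dv-0")
v2 = update_index_manifest(v1, "bloom-filter", "data-0.orc", "bf-0")

print(dict(v1["deletion-vector"]))  # {'data-0.orc': 'dv-0'}
print("bloom-filter" in v1)         # False: v1 is unaffected by the later update
```

Because unchanged index types are shared between versions, an update to one index type does not rewrite the metadata of the others.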

Manifest Compaction

Over time, a manifest list accumulates many small manifest files. Manifest compaction periodically merges small manifest files into larger ones and eliminates redundant ADDED/DELETED pairs that refer to the same data file, reducing the metadata a reader must process when loading a snapshot.
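A minimal Python sketch of full manifest compaction, assuming the whole manifest list is compacted at once (so every DELETED entry has its matching ADDED entry earlier in the list) and that a file's identity is just its path. The `target_entries` parameter is a hypothetical stand-in for a target manifest file size.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (kind, file_path)


def compact_manifests(manifests: List[List[Entry]], target_entries: int) -> List[List[Entry]]:
    """Fully compact a manifest list: cancel ADDED/DELETED pairs, then
    regroup the surviving entries into manifests of at most target_entries."""
    live: Dict[str, Entry] = {}
    for manifest in manifests:  # manifest-list order matters
        for kind, path in manifest:
            if kind == "ADDED":
                live[path] = (kind, path)
            elif kind == "DELETED":
                # A full manifest list always holds the matching ADDED, so the pair cancels
                live.pop(path, None)
    survivors = list(live.values())
    return [survivors[i:i + target_entries]
            for i in range(0, len(survivors), target_entries)]


before = [
    [("ADDED", "data-1")],
    [("ADDED", "data-2")],
    [("DELETED", "data-1"), ("ADDED", "data-3")],
    [("ADDED", "data-4")],
]
after = compact_manifests(before, target_entries=2)
print(after)  # [[('ADDED', 'data-2'), ('ADDED', 'data-3')], [('ADDED', 'data-4')]]
```

The compacted manifests would be written as new files and referenced by a new snapshot; older snapshots keep pointing at the original manifests until they expire.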
