Principle:Lance format Lance Compaction Commit

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Storage_Optimization
Last Updated 2026-02-08 19:00 GMT

Overview

Compaction commit is the final phase of the compaction workflow: the results of fragment rewriting are atomically applied to the dataset, replacing old fragments with new ones and, where needed, remapping indices.

Description

After compaction tasks have been executed and each has produced a RewriteResult, the results must be committed to make the changes visible. The commit phase aggregates all rewrite results, constructs the appropriate index remapping, and issues a single Operation::Rewrite transaction that atomically swaps old fragments for new ones in the dataset manifest.

The commit process involves several steps:

  1. Aggregation: Iterate through all completed RewriteResult objects, accumulating CompactionMetrics and building RewriteGroup structures that pair old fragments with their replacements.
  2. Index remapping: If the dataset does not use stable row IDs and index remap is not deferred, the accumulated row ID map (old to new) is passed to an IndexRemapper which updates each affected index. The remapper produces RewrittenIndex entries that are included in the transaction.
  3. Deferred remap support: If defer_index_remap is enabled, instead of remapping immediately, the commit builds a fragment reuse index that records the relationship between old and new fragments. This allows index remapping to happen later, reducing the critical path of compaction.
  4. Transaction commit: A Transaction with Operation::Rewrite is created containing the rewrite groups, rewritten indices, and optional fragment reuse index. This transaction is applied to the dataset, creating a new version.
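
The aggregation and remap-or-defer decision described in the steps above can be sketched in a few lines of Python. This is a toy model, not the Lance implementation: the dataclasses are simplified stand-ins for RewriteResult, RewriteGroup, and CompactionMetrics, and the function name aggregate and its parameters are assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Simplified, hypothetical stand-ins for the structures named above.
# The real Lance types carry more fields than these.
@dataclass
class CompactionMetrics:
    fragments_removed: int = 0
    fragments_added: int = 0

@dataclass
class RewriteResult:
    old_fragments: list
    new_fragments: list
    row_id_map: dict = field(default_factory=dict)  # old row ID -> new row ID

@dataclass
class RewriteGroup:
    old_fragments: list
    new_fragments: list

def aggregate(results, stable_row_ids=False, defer_index_remap=False):
    """Fold completed rewrite results into the pieces of a single commit."""
    metrics = CompactionMetrics()
    groups = []
    row_id_map = {}    # filled only when indices are remapped immediately
    reuse_groups = []  # filled only when remapping is deferred
    needs_remapping = not stable_row_ids and not defer_index_remap
    for r in results:
        metrics.fragments_removed += len(r.old_fragments)
        metrics.fragments_added += len(r.new_fragments)
        groups.append(RewriteGroup(r.old_fragments, r.new_fragments))
        if needs_remapping:
            row_id_map.update(r.row_id_map)
        elif defer_index_remap:
            reuse_groups.append((r.old_fragments, r.new_fragments))
    return metrics, groups, row_id_map, reuse_groups

results = [
    RewriteResult([0, 1], [10], {0: 100, 1: 101}),
    RewriteResult([2, 3, 4], [11], {2: 102}),
]
metrics, groups, row_id_map, reuse_groups = aggregate(results)
```

With the default flags the row ID map is populated and no reuse groups are collected; passing defer_index_remap=True flips that, which mirrors the choice between steps 2 and 3.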

The commit is designed to be partial-safe: it is not required that all planned tasks complete successfully. Successfully completed tasks can be committed while failed tasks are simply omitted. However, once some tasks have been committed, the remaining tasks from the same plan become invalid and should be discarded.
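
A minimal sketch of the partial-commit rule, assuming a hypothetical ToyPlan tracker that is not a Lance type. It commits whichever tasks finished and treats everything left over as discarded; enforcing the "plan becomes invalid" rule with a flag is an assumption made here for illustration.

```python
class ToyPlan:
    """Hypothetical plan tracker used only to illustrate partial commits."""

    def __init__(self, task_ids):
        self.pending = set(task_ids)
        self.committed = False

    def commit(self, finished):
        """Commit the finished subset; every other task in the plan is discarded."""
        if self.committed:
            # Once part of a plan is committed, its leftover tasks are stale.
            raise RuntimeError("plan already committed; re-plan against the new version")
        if not finished:
            return [], []  # nothing to commit, the plan stays usable
        unknown = set(finished) - self.pending
        if unknown:
            raise ValueError(f"tasks not in this plan: {sorted(unknown)}")
        discarded = sorted(self.pending - set(finished))
        self.pending.clear()
        self.committed = True
        return sorted(finished), discarded

plan = ToyPlan([0, 1, 2, 3])
committed, discarded = plan.commit([2, 0])  # tasks 1 and 3 failed or timed out
```

After this call, attempting to commit task 1 or 3 from the same plan raises, which models the requirement to plan again from the new dataset version.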

Usage

Use commit_compaction:

  • As the final step after collecting all RewriteResult objects from distributed workers.
  • When you want to commit a partial set of results (e.g., only tasks that completed before a deadline).
  • Through the convenience function compact_files() which handles plan, execute, and commit in one call.

Theoretical Basis

Compaction commit implements an optimistic concurrency control pattern:

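rewrite_groups = []
row_id_map = {}
frag_reuse_groups = []

for each completed_task:
    aggregate metrics
    append RewriteGroup(old_fragments, new_fragments) to rewrite_groups
    if needs_remapping:
        collect row_id_map entries
    else if defer_remap:
        collect frag_reuse_groups

# Default to no rewritten indices so the transaction is well-defined
# when remapping is skipped or deferred.
rewritten_indices = []
if needs_remapping:
    rewritten_indices = index_remapper.remap(row_id_map, affected_fragment_ids)

optional_frag_reuse_index = none
if defer_remap:
    optional_frag_reuse_index = build fragment reuse index from frag_reuse_groups

transaction = Operation::Rewrite {
    groups: rewrite_groups,
    rewritten_indices: rewritten_indices,
    frag_reuse_index: optional_frag_reuse_index,
}
dataset.apply_commit(transaction)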
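
To make the index_remapper.remap step more concrete, here is a toy Python sketch of applying an old-to-new row ID map to the row IDs stored in an index. It is an illustration under stated assumptions, not the IndexRemapper implementation: the helper names row_addr and remap_index are hypothetical, the packing of a fragment ID into the upper 32 bits of a row address is assumed, and mapping a deleted row to None is an assumed convention.

```python
def row_addr(fragment_id, offset):
    """Pack a fragment ID and a row offset into one integer (assumed layout)."""
    return (fragment_id << 32) | offset

def remap_index(index_rows, row_id_map):
    """Rewrite the row IDs held by an index using an old -> new map.

    IDs absent from the map belong to untouched fragments and are kept.
    IDs mapped to None refer to rows dropped during compaction and are removed.
    """
    remapped = []
    for addr in index_rows:
        if addr not in row_id_map:
            remapped.append(addr)
            continue
        new_addr = row_id_map[addr]
        if new_addr is not None:
            remapped.append(new_addr)
    return remapped

# Fragments 0 and 1 were compacted into fragment 5; one row had been deleted.
row_id_map = {
    row_addr(0, 0): row_addr(5, 0),
    row_addr(0, 1): row_addr(5, 1),
    row_addr(1, 0): None,
    row_addr(1, 1): row_addr(5, 2),
}
index_rows = [row_addr(0, 1), row_addr(1, 0), row_addr(1, 1), row_addr(2, 7)]
new_rows = remap_index(index_rows, row_id_map)
```

The entry pointing into fragment 2 passes through unchanged because that fragment was not part of the rewrite.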

Key properties:

  • Atomicity: The entire set of fragment replacements is committed in a single transaction. Either all replacements take effect or none do.
  • Conflict detection: If another writer has modified any of the same fragments between the read version and the commit, the transaction will fail with a conflict error, preventing data corruption.
  • Idempotent metrics: The returned CompactionMetrics is the sum of the metrics of the tasks actually included in the commit; tasks that failed or were omitted contribute nothing.
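
The atomicity and conflict-detection properties can be illustrated with a small in-memory model. ToyManifest and CommitConflict are hypothetical names, and the check (reject a rewrite if any of its old fragments was touched by a commit newer than the read version) is a simplified stand-in for the real conflict resolution, shown only to make the optimistic pattern tangible.

```python
class CommitConflict(Exception):
    """Raised when a rewrite overlaps with a newer commit."""

class ToyManifest:
    """Hypothetical in-memory manifest; not the Lance manifest format."""

    def __init__(self, fragments):
        self.version = 1
        self.fragments = set(fragments)
        self.log = []  # (version, fragment IDs touched by that commit)

    def commit_rewrite(self, read_version, old, new):
        # Collect every fragment touched since the writer read the dataset.
        touched = set()
        for version, frags in self.log:
            if version > read_version:
                touched |= frags
        overlap = touched & set(old)
        if overlap:
            # Fail before mutating anything, so the swap is all-or-nothing.
            raise CommitConflict(f"fragments changed since read: {sorted(overlap)}")
        self.fragments = (self.fragments - set(old)) | set(new)
        self.version += 1
        self.log.append((self.version, set(old) | set(new)))
        return self.version

manifest = ToyManifest([0, 1, 2, 3])
v_a = manifest.commit_rewrite(read_version=1, old=[0, 1], new=[10])
v_b = manifest.commit_rewrite(read_version=1, old=[2, 3], new=[11])  # disjoint, succeeds
try:
    manifest.commit_rewrite(read_version=1, old=[1, 2], new=[12])
    conflicted = False
except CommitConflict:
    conflicted = True
```

The second writer succeeds despite its stale read version because it touches different fragments; the third is rejected and leaves the manifest unchanged.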
