Principle:Lance format Lance Compaction Commit
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Storage_Optimization |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Compaction commit is the final phase of the compaction workflow, in which the results of fragment rewriting are atomically applied to the dataset: old fragments are replaced with new ones and, where needed, indices are remapped.
Description
After compaction tasks have been executed and each has produced a RewriteResult, the results must be committed to make the changes visible. The commit phase aggregates all rewrite results, constructs the appropriate index remapping, and issues a single Operation::Rewrite transaction that atomically swaps old fragments for new ones in the dataset manifest.
The commit process involves several steps:
- Aggregation: Iterate through all completed RewriteResult objects, accumulating CompactionMetrics and building RewriteGroup structures that pair old fragments with their replacements.
- Index remapping: If the dataset does not use stable row IDs and index remap is not deferred, the accumulated row ID map (old to new) is passed to an IndexRemapper, which updates each affected index. The remapper produces RewrittenIndex entries that are included in the transaction.
- Deferred remap support: If defer_index_remap is enabled, instead of remapping immediately, the commit builds a fragment reuse index that records the relationship between old and new fragments. This allows index remapping to happen later, reducing the critical path of compaction.
- Transaction commit: A Transaction with Operation::Rewrite is created containing the rewrite groups, rewritten indices, and optional fragment reuse index. This transaction is applied to the dataset, creating a new version.
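The aggregation step can be sketched in a few lines of Rust. This is a minimal illustration, not Lance's implementation: the structs below are simplified, hypothetical stand-ins that only borrow the names RewriteResult, RewriteGroup, and CompactionMetrics, and the field names and the `aggregate` function are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the metrics type named above.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct CompactionMetrics {
    pub fragments_removed: usize,
    pub fragments_added: usize,
}

/// Hypothetical stand-in for the result of one compaction task.
#[derive(Debug, Clone)]
pub struct RewriteResult {
    pub metrics: CompactionMetrics,
    pub old_fragments: Vec<u64>,
    pub new_fragments: Vec<u64>,
    /// Old row ID -> new row ID; `None` when the task collected no remap data.
    pub row_id_map: Option<HashMap<u64, u64>>,
}

/// Pairs the fragments a task consumed with the fragments it produced.
#[derive(Debug, PartialEq)]
pub struct RewriteGroup {
    pub old_fragments: Vec<u64>,
    pub new_fragments: Vec<u64>,
}

/// Fold every completed result into summed metrics, one group per task,
/// and a single merged row ID map.
pub fn aggregate(
    results: Vec<RewriteResult>,
) -> (CompactionMetrics, Vec<RewriteGroup>, HashMap<u64, u64>) {
    let mut metrics = CompactionMetrics::default();
    let mut groups = Vec::with_capacity(results.len());
    let mut row_id_map = HashMap::new();

    for result in results {
        metrics.fragments_removed += result.metrics.fragments_removed;
        metrics.fragments_added += result.metrics.fragments_added;
        if let Some(map) = result.row_id_map {
            row_id_map.extend(map);
        }
        groups.push(RewriteGroup {
            old_fragments: result.old_fragments,
            new_fragments: result.new_fragments,
        });
    }
    (metrics, groups, row_id_map)
}
```

Calling `aggregate` with two results yields two groups and metrics equal to the field-wise sum of the inputs. The merged map stays empty when no task supplied remap data.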
The commit is designed to be partial-safe: it is not required that all planned tasks complete successfully. Successfully completed tasks can be committed while failed tasks are simply omitted. However, once some tasks have been committed, the remaining tasks from the same plan become invalid and should be discarded.
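A caller-side view of partial commits might look like the following sketch. Everything here is hypothetical: `split_outcomes`, `Plan`, and `plan_is_current` are illustrative names, and modelling plan validity as a comparison between the plan's read version and the dataset's current version is an assumption, not a description of Lance's internals.

```rust
/// Separate the task results that can be committed from the errors of tasks
/// that failed. `T` stands in for whatever a worker returns for one task.
pub fn split_outcomes<T>(outcomes: Vec<Result<T, String>>) -> (Vec<T>, Vec<String>) {
    let mut completed = Vec::new();
    let mut failures = Vec::new();
    for outcome in outcomes {
        match outcome {
            Ok(result) => completed.push(result),
            Err(reason) => failures.push(reason),
        }
    }
    (completed, failures)
}

/// Hypothetical plan handle that remembers the dataset version it was built from.
pub struct Plan {
    pub read_version: u64,
}

/// Assumption for this sketch: a plan is usable only while the dataset is
/// still at the version the plan was computed against. After a commit creates
/// a new version, leftover tasks from the old plan are discarded.
pub fn plan_is_current(plan: &Plan, dataset_version: u64) -> bool {
    plan.read_version == dataset_version
}
```

Only the `completed` half is handed to the commit step. The failed tasks' fragments stay untouched and can be picked up by a fresh plan.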
Usage
Use commit_compaction:
- As the final step after collecting all RewriteResult objects from distributed workers.
- When you want to commit a partial set of results (e.g., only tasks that completed before a deadline).
- Indirectly, through the convenience function compact_files(), which handles plan, execute, and commit in one call.
Theoretical Basis
Compaction commit implements an optimistic concurrency control pattern:
    for each completed_task:
        aggregate metrics
        build RewriteGroup(old_fragments, new_fragments)
        if needs_remapping:
            collect row_id_map entries
        else if defer_remap:
            collect frag_reuse_groups
    if needs_remapping:
        rewritten_indices = index_remapper.remap(row_id_map, affected_fragment_ids)
    transaction = Operation::Rewrite {
        groups: rewrite_groups,
        rewritten_indices: rewritten_indices,
        frag_reuse_index: optional_frag_reuse_index,
    }
    dataset.apply_commit(transaction)
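The branching in the pseudocode can be made concrete with a small Rust sketch. It reads `needs_remapping` as "no stable row IDs and remap not deferred", following the description above. `RemapStrategy`, `Group`, `RewriteOp`, and both functions are hypothetical names introduced for illustration, and index entries are represented as plain strings rather than real RewrittenIndex values.

```rust
/// How index remapping is handled for one commit (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RemapStrategy {
    /// Remap affected indices now and ship them in the transaction.
    Immediate,
    /// Record old -> new fragment groups and remap later.
    Deferred,
    /// Nothing to do at commit time.
    NotNeeded,
}

/// Mirrors the `if needs_remapping / else if defer_remap` branches.
pub fn choose_remap_strategy(uses_stable_row_ids: bool, defer_index_remap: bool) -> RemapStrategy {
    let needs_remapping = !uses_stable_row_ids && !defer_index_remap;
    if needs_remapping {
        RemapStrategy::Immediate
    } else if defer_index_remap {
        RemapStrategy::Deferred
    } else {
        RemapStrategy::NotNeeded
    }
}

/// (old fragment IDs, new fragment IDs) for one rewritten group.
pub type Group = (Vec<u64>, Vec<u64>);

/// Simplified stand-in for the payload of a rewrite operation.
#[derive(Debug, PartialEq)]
pub struct RewriteOp {
    pub groups: Vec<Group>,
    pub rewritten_indices: Vec<String>,
    pub frag_reuse_index: Option<Vec<Group>>,
}

/// Assemble the payload: remapped indices are attached only for `Immediate`,
/// the fragment reuse index only for `Deferred`.
pub fn build_rewrite(groups: Vec<Group>, strategy: RemapStrategy, remapped: Vec<String>) -> RewriteOp {
    let (rewritten_indices, frag_reuse_index) = match strategy {
        RemapStrategy::Immediate => (remapped, None),
        RemapStrategy::Deferred => (Vec::new(), Some(groups.clone())),
        RemapStrategy::NotNeeded => (Vec::new(), None),
    };
    RewriteOp { groups, rewritten_indices, frag_reuse_index }
}
```

Keeping the strategy as an explicit value makes it easy to see that the two optional parts of the payload are mutually exclusive in this reading of the pseudocode.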
Key properties:
- Atomicity: The entire set of fragment replacements is committed in a single transaction. Either all replacements take effect or none do.
- Conflict detection: If another writer has modified any of the same fragments between the read version and the commit, the transaction will fail with a conflict error, preventing data corruption.
- Idempotent metrics: The returned CompactionMetrics accurately reflects the sum of all committed task metrics, regardless of partial completion.
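The conflict-detection property can be illustrated with a small check over fragment IDs. This is a sketch of the general optimistic-concurrency idea only: `check_conflicts` and `ConflictError` are hypothetical names, and the assumption that a conflict is decided purely by overlapping fragment IDs is a simplification for the example.

```rust
use std::collections::HashSet;

/// Hypothetical error listing the fragments touched by both writers.
#[derive(Debug, PartialEq)]
pub struct ConflictError {
    pub fragment_ids: Vec<u64>,
}

/// Reject the commit if any fragment this compaction rewrote was also
/// modified by another writer after the compaction's read version.
pub fn check_conflicts(
    rewritten: &[u64],
    modified_since_read: &HashSet<u64>,
) -> Result<(), ConflictError> {
    let mut overlapping: Vec<u64> = rewritten
        .iter()
        .copied()
        .filter(|id| modified_since_read.contains(id))
        .collect();

    if overlapping.is_empty() {
        Ok(())
    } else {
        // Sort so the error is deterministic regardless of input order.
        overlapping.sort_unstable();
        Err(ConflictError { fragment_ids: overlapping })
    }
}
```

Because the check is all-or-nothing, a single overlapping fragment rejects the whole transaction, which matches the atomicity property: no subset of the replacements is applied.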