Implementation:Lance format Lance Plan Compaction
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Storage_Optimization |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for analyzing a Lance dataset and producing a compaction plan, provided by the Lance library.
Description
The plan_compaction function creates a DefaultCompactionPlanner from the supplied options and invokes its plan method. The planner iterates every fragment in the dataset manifest, collects metrics (physical row count and deletion count) for each fragment, classifies fragments as CompactItself (high deletion ratio) or CompactWithNeighbors (fewer rows than target), and groups adjacent candidates into bins. Bins are further split at the target_rows_per_fragment boundary and emitted as TaskData entries in the resulting CompactionPlan.
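The classification and binning steps above can be sketched as a self-contained Rust model. Everything in it is hypothetical: FragMetrics, Candidacy, classify and plan_bins are illustrative names, not Lance's types. It also assumes materialize_deletions is enabled, that a bin is cut once adding the next fragment would exceed the target, and that a lone CompactWithNeighbors fragment is dropped because it has nothing to merge with. It shows the idea only and is not the actual DefaultCompactionPlanner code.

```rust
// Hypothetical model of the planning pass; these are NOT Lance's own types.

/// The two metrics the planner collects for each fragment.
#[derive(Debug, Clone, Copy)]
pub struct FragMetrics {
    pub id: u64,
    pub physical_rows: usize,
    pub num_deletions: usize,
}

impl FragMetrics {
    pub fn live_rows(&self) -> usize {
        self.physical_rows.saturating_sub(self.num_deletions)
    }

    pub fn deletion_ratio(&self) -> f32 {
        if self.physical_rows == 0 {
            0.0
        } else {
            self.num_deletions as f32 / self.physical_rows as f32
        }
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Candidacy {
    CompactItself,        // deletion ratio above the threshold
    CompactWithNeighbors, // fewer live rows than the target
}

pub fn classify(m: &FragMetrics, target_rows: usize, threshold: f32) -> Option<Candidacy> {
    if m.deletion_ratio() > threshold {
        Some(Candidacy::CompactItself)
    } else if m.live_rows() < target_rows {
        Some(Candidacy::CompactWithNeighbors)
    } else {
        None // large and mostly live: left alone, and it breaks adjacency
    }
}

/// Returns bins as lists of fragment ids.
pub fn plan_bins(frags: &[FragMetrics], target_rows: usize, threshold: f32) -> Vec<Vec<u64>> {
    // 1. Collect runs of adjacent candidates (a non-candidate ends the run).
    let mut runs: Vec<Vec<(FragMetrics, Candidacy)>> = Vec::new();
    let mut run = Vec::new();
    for f in frags {
        match classify(f, target_rows, threshold) {
            Some(c) => run.push((*f, c)),
            None if !run.is_empty() => runs.push(std::mem::take(&mut run)),
            None => {}
        }
    }
    if !run.is_empty() {
        runs.push(run);
    }

    // 2. Split each run so a bin stops growing at the target-size boundary.
    let mut bins = Vec::new();
    for run in runs {
        let mut bin: Vec<(FragMetrics, Candidacy)> = Vec::new();
        let mut rows = 0;
        for (f, c) in run {
            if !bin.is_empty() && rows + f.live_rows() > target_rows {
                bins.push(std::mem::take(&mut bin));
                rows = 0;
            }
            rows += f.live_rows();
            bin.push((f, c));
        }
        if !bin.is_empty() {
            bins.push(bin);
        }
    }

    // 3. Assumption: a single CompactWithNeighbors fragment is not worth rewriting.
    bins.into_iter()
        .filter(|b| b.len() > 1 || b[0].1 == Candidacy::CompactItself)
        .map(|b| b.iter().map(|(f, _)| f.id).collect())
        .collect()
}
```

With a target of 100 rows and a threshold of 0.1, fragments of 40, 50 and 30 live rows form one run that is cut into [40, 50] and [30]. The trailing singleton is then dropped, while a 200-row fragment with 50 deletions (25%) becomes its own CompactItself bin.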
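The planner also loads each index's fragment bitmap so that indexed and unindexed fragments are never merged into the same bin. A rewritten fragment is therefore either entirely covered by an index or not covered at all, which keeps index coverage consistent after compaction.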
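One straightforward way to enforce the indexed/unindexed rule is to cut a run of adjacent candidates wherever the set of indexes covering a fragment changes. The sketch below assumes that rule. The index list is modeled as hypothetical (name, fragment-id set) pairs, and covering_indexes and split_by_index_coverage are illustrative names, not Lance functions. Whether Lance compares exact index sets or only indexed-vs-unindexed is not stated above, so treat the comparison as an assumption.

```rust
use std::collections::BTreeSet;

/// Hypothetical: each entry is (index name, fragment ids in its fragment bitmap).
/// Returns the names of the indexes that cover `frag_id`.
pub fn covering_indexes(frag_id: u64, indexes: &[(String, BTreeSet<u64>)]) -> BTreeSet<String> {
    indexes
        .iter()
        .filter(|(_, bitmap)| bitmap.contains(&frag_id))
        .map(|(name, _)| name.clone())
        .collect()
}

/// Splits a run of adjacent candidate fragments wherever the set of covering
/// indexes changes, so no resulting bin mixes indexed and unindexed fragments.
pub fn split_by_index_coverage(run: &[u64], indexes: &[(String, BTreeSet<u64>)]) -> Vec<Vec<u64>> {
    let mut out: Vec<Vec<u64>> = Vec::new();
    let mut prev: Option<BTreeSet<String>> = None;
    for &id in run {
        let cov = covering_indexes(id, indexes);
        // Open a new bin on the first fragment or whenever coverage differs.
        if prev.as_ref() != Some(&cov) {
            out.push(Vec::new());
        }
        out.last_mut().expect("a bin was opened above").push(id);
        prev = Some(cov);
    }
    out
}
```

For example, if a vector index covers fragments 0 to 2 and a scalar index covers only 0 and 1, the run [0, 1, 2, 3, 4] is split into [0, 1], [2] and [3, 4].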
Usage
Use plan_compaction as the first step of a distributed compaction workflow. The returned CompactionPlan can be split into individual CompactionTask objects via compaction_tasks(), serialized, and dispatched to worker nodes for parallel execution.
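The fan-out step can be illustrated with stand-in types that mirror the documented CompactionPlan shape (tasks, read_version, options). Plan, Task, TaskData, Options, tasks_iter and assign_round_robin are all hypothetical. The sketch assumes that each task carries the plan's read version and options so that a worker needs nothing else from the planner, and it uses round-robin assignment only as one simple dispatch policy.

```rust
/// Hypothetical stand-ins; the real Lance types carry more fields.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskData {
    pub fragment_ids: Vec<u64>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Options {
    pub target_rows_per_fragment: usize,
}

#[derive(Debug, Clone)]
pub struct Plan {
    pub tasks: Vec<TaskData>,
    pub read_version: u64,
    pub options: Options,
}

/// One self-contained unit of work for a worker.
#[derive(Debug, Clone, PartialEq)]
pub struct Task {
    pub task: TaskData,
    pub read_version: u64,
    pub options: Options,
}

impl Plan {
    /// Pairs every TaskData with the plan-wide read version and options.
    pub fn tasks_iter(&self) -> impl Iterator<Item = Task> + '_ {
        self.tasks.iter().map(move |t| Task {
            task: t.clone(),
            read_version: self.read_version,
            options: self.options.clone(),
        })
    }
}

/// Round-robin assignment of tasks to `num_workers` workers.
pub fn assign_round_robin(plan: &Plan, num_workers: usize) -> Vec<Vec<Task>> {
    assert!(num_workers > 0, "need at least one worker");
    let mut workers: Vec<Vec<Task>> = vec![Vec::new(); num_workers];
    for (i, task) in plan.tasks_iter().enumerate() {
        workers[i % num_workers].push(task);
    }
    workers
}
```

Because every task embeds the same read_version, all workers operate against the dataset version that was analyzed during planning, regardless of which worker receives which task.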
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/dataset/optimize.rs
- Lines: L851-L857 (function), L123-L179 (CompactionOptions), L412-L524 (DefaultCompactionPlanner::plan)
Signature
```rust
pub async fn plan_compaction(
    dataset: &Dataset,
    options: &CompactionOptions,
) -> Result<CompactionPlan>
```
Import
```rust
use lance::dataset::optimize::{plan_compaction, CompactionOptions, CompactionPlan};
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &Dataset | Yes | Reference to the Lance dataset whose fragments will be analyzed. |
| options | &CompactionOptions | Yes | Configuration controlling compaction behavior (see fields below). |
CompactionOptions fields:
| Field | Type | Default | Description |
|---|---|---|---|
| target_rows_per_fragment | usize | 1,048,576 | Target number of rows per fragment. Fragments below this threshold are candidates for merging. |
| max_rows_per_group | usize | 1,024 | Maximum rows per row group when rewriting. |
| max_bytes_per_file | Option<usize> | None | Maximum bytes per output file. Uses WriteParams default if None. |
| materialize_deletions | bool | true | Whether to compact fragments with high deletion ratios. |
| materialize_deletions_threshold | f32 | 0.1 | Fraction of deleted rows above which a fragment is compacted. Setting to 0.0 materializes all deletions; above 1.0 disables. |
| num_threads | Option<usize> | None | Number of parallel compaction threads. Defaults to compute-intensive CPU count. Not used when tasks are dispatched manually. |
| batch_size | Option<usize> | None | Batch size for scanning input fragments. Uses Scanner default if None. |
| defer_index_remap | bool | false | If true, defer index remapping to a later step instead of performing it during compaction. |
| enable_binary_copy | bool | false | Enable binary copy optimization for faster compaction when eligible. |
| enable_binary_copy_force | bool | false | Fail compaction if binary copy is not supported. |
| binary_copy_read_batch_bytes | Option<usize> | 16 MB | Read batch size in bytes for binary copy operations. |
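The interaction between materialize_deletions and materialize_deletions_threshold can be expressed as a small predicate. The function materializes below is hypothetical. It assumes that "above which" in the table means a strictly-greater comparison of deleted rows to physical rows, which is consistent with both documented edge cases.

```rust
/// Hypothetical helper mirroring the documented semantics of
/// `materialize_deletions` and `materialize_deletions_threshold`.
pub fn materializes(
    physical_rows: usize,
    num_deletions: usize,
    materialize_deletions: bool,
    threshold: f32,
) -> bool {
    if !materialize_deletions || physical_rows == 0 {
        return false;
    }
    let ratio = num_deletions as f32 / physical_rows as f32;
    // Strictly greater: with 0.0, any fragment with at least one deletion
    // qualifies; with anything above 1.0, no ratio can exceed the threshold.
    ratio > threshold
}
```

With the default threshold of 0.1, a 1,000-row fragment with 150 deleted rows (15%) is rewritten, while one with 50 deleted rows (5%) is not.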
Outputs
| Name | Type | Description |
|---|---|---|
| CompactionPlan | struct | Contains tasks: Vec<TaskData> (groups of fragments to compact), read_version: u64 (dataset version used for planning), and options: CompactionOptions. |
Usage Examples
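```rust
// Requires the lance and serde_json crates, plus an async runtime (e.g. tokio) to drive the future.
use lance::Dataset;
use lance::dataset::optimize::{plan_compaction, CompactionOptions};

async fn example(dataset: &Dataset) -> lance::Result<()> {
    // Target ~512K-row fragments and rewrite any fragment with more than 5% deleted rows.
    let options = CompactionOptions {
        target_rows_per_fragment: 512 * 1024,
        materialize_deletions: true,
        materialize_deletions_threshold: 0.05,
        ..Default::default()
    };

    // Planning analyzes fragments only; it does not rewrite any data.
    let plan = plan_compaction(dataset, &options).await?;
    println!("Number of compaction tasks: {}", plan.tasks.len());
    println!("Read version: {}", plan.read_version);

    // Split the plan into self-contained tasks for distribution to workers.
    for task in plan.compaction_tasks() {
        // Serialize for transport; a real deployment would send this to a worker node.
        let serialized =
            serde_json::to_string(&task).expect("CompactionTask should serialize to JSON");
        println!("Task: {}", serialized);
    }
    Ok(())
}
```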