
Implementation:Lance format Lance Plan Compaction

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Storage_Optimization
Last Updated 2026-02-08 19:00 GMT

Overview

A concrete tool, provided by the Lance library, for analyzing a Lance dataset and producing a compaction plan.

Description

The plan_compaction function creates a DefaultCompactionPlanner from the supplied options and invokes its plan method. The planner iterates over every fragment in the dataset manifest and collects per-fragment metrics (physical row count and deletion count). It classifies each fragment as CompactItself (high deletion ratio) or CompactWithNeighbors (fewer rows than the target), then groups adjacent candidates into bins. Bins are further split at the target_rows_per_fragment boundary and emitted as TaskData entries in the resulting CompactionPlan.

The plan also loads index fragment bitmaps to ensure that indexed and unindexed fragments are never merged together, preserving index integrity.
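The classification step described above can be sketched with plain data. This is a simplified, hypothetical model of the decision logic, not the Lance source: the type names (FragmentMetrics, Candidate) and the exact comparison rules are illustrative.

```rust
// Hypothetical sketch of the planner's per-fragment classification.
// Candidate and FragmentMetrics are illustrative names, not the Lance API.

#[derive(Debug, PartialEq)]
enum Candidate {
    CompactItself,        // high deletion ratio: rewrite in place
    CompactWithNeighbors, // too few live rows: merge with adjacent fragments
    Keep,                 // already well-sized, no action needed
}

struct FragmentMetrics {
    physical_rows: u64,
    num_deletions: u64,
}

fn classify(m: &FragmentMetrics, target_rows: u64, deletion_threshold: f32) -> Candidate {
    let live_rows = m.physical_rows - m.num_deletions;
    let deletion_ratio = m.num_deletions as f32 / m.physical_rows as f32;
    if deletion_ratio > deletion_threshold {
        Candidate::CompactItself
    } else if live_rows < target_rows {
        Candidate::CompactWithNeighbors
    } else {
        Candidate::Keep
    }
}

fn main() {
    // 1,000,000 physical rows with 150,000 deleted: ratio 0.15 > 0.1 threshold
    let hot = FragmentMetrics { physical_rows: 1_000_000, num_deletions: 150_000 };
    // 40,000 live rows, far below the default 1,048,576-row target
    let small = FragmentMetrics { physical_rows: 40_000, num_deletions: 0 };
    println!("{:?}", classify(&hot, 1_048_576, 0.1));
    println!("{:?}", classify(&small, 1_048_576, 0.1));
}
```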

Usage

Use plan_compaction as the first step of a distributed compaction workflow. The returned CompactionPlan can be split into individual CompactionTask objects via compaction_tasks(), serialized, and dispatched to worker nodes for parallel execution.
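The dispatch step can be sketched independently of Lance: once the plan's tasks are enumerated, one simple policy is to assign them round-robin across worker nodes. The function below is an illustrative stand-in (task indices in place of serialized CompactionTask payloads), not part of the Lance API.

```rust
// Illustrative sketch: distribute N planned compaction tasks across W workers
// round-robin. In a real workflow each entry would be a serialized CompactionTask.

fn assign_round_robin(num_tasks: usize, num_workers: usize) -> Vec<Vec<usize>> {
    let mut assignments = vec![Vec::new(); num_workers];
    for task_id in 0..num_tasks {
        assignments[task_id % num_workers].push(task_id);
    }
    assignments
}

fn main() {
    // 7 compaction tasks spread over 3 workers
    let assignments = assign_round_robin(7, 3);
    for (worker, tasks) in assignments.iter().enumerate() {
        println!("worker {worker}: {tasks:?}");
    }
}
```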

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/dataset/optimize.rs
  • Lines: L851-L857 (function), L123-L179 (CompactionOptions), L412-L524 (DefaultCompactionPlanner::plan)

Signature

pub async fn plan_compaction(
    dataset: &Dataset,
    options: &CompactionOptions,
) -> Result<CompactionPlan>

Import

use lance::dataset::optimize::{plan_compaction, CompactionOptions, CompactionPlan};

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &Dataset | Yes | Reference to the Lance dataset whose fragments will be analyzed. |
| options | &CompactionOptions | Yes | Configuration controlling compaction behavior (see fields below). |

CompactionOptions fields:

| Field | Type | Default | Description |
|---|---|---|---|
| target_rows_per_fragment | usize | 1,048,576 | Target number of rows per fragment; fragments below this threshold are candidates for merging. |
| max_rows_per_group | usize | 1,024 | Maximum rows per row group when rewriting. |
| max_bytes_per_file | Option<usize> | None | Maximum bytes per output file; uses the WriteParams default if None. |
| materialize_deletions | bool | true | Whether to compact fragments with high deletion ratios. |
| materialize_deletions_threshold | f32 | 0.1 | Fraction of deleted rows above which a fragment is compacted. 0.0 materializes all deletions; a value above 1.0 effectively disables materialization. |
| num_threads | Option<usize> | None | Number of parallel compaction threads; defaults to the number of CPUs available for compute-intensive work. Not used when tasks are dispatched manually. |
| batch_size | Option<usize> | None | Batch size for scanning input fragments; uses the Scanner default if None. |
| defer_index_remap | bool | false | If true, defer index remapping to a later step instead of performing it during compaction. |
| enable_binary_copy | bool | false | Enable the binary copy optimization for faster compaction when eligible. |
| enable_binary_copy_force | bool | false | Fail compaction if binary copy is not supported. |
| binary_copy_read_batch_bytes | Option<usize> | 16 MB | Read batch size in bytes for binary copy operations. |
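To make the role of target_rows_per_fragment concrete, here is a simplified sketch of how a bin of small adjacent fragments might be cut into tasks of roughly target size. The real logic lives in DefaultCompactionPlanner::plan and also accounts for max_bytes_per_file and index boundaries; split_bin is a hypothetical illustration.

```rust
// Hypothetical sketch: cut a bin of adjacent small fragments into tasks,
// starting a new task whenever the accumulated row count reaches the target.
// Returns groups of fragment indices; each group would become one TaskData.

fn split_bin(fragment_rows: &[u64], target: u64) -> Vec<Vec<usize>> {
    let mut tasks = Vec::new();
    let mut current = Vec::new();
    let mut acc: u64 = 0;
    for (i, &rows) in fragment_rows.iter().enumerate() {
        current.push(i);
        acc += rows;
        if acc >= target {
            tasks.push(std::mem::take(&mut current));
            acc = 0;
        }
    }
    if !current.is_empty() {
        tasks.push(current);
    }
    tasks
}

fn main() {
    // Four ~400k-row fragments against the default 1,048,576-row target:
    // the first three fill one task, the fourth becomes a trailing task.
    let tasks = split_bin(&[400_000, 400_000, 400_000, 400_000], 1_048_576);
    println!("{tasks:?}");
}
```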

Outputs

| Name | Type | Description |
|---|---|---|
| CompactionPlan | struct | Contains tasks: Vec<TaskData> (groups of fragments to compact), read_version: u64 (dataset version used for planning), and options: CompactionOptions. |

Usage Examples

use lance::Dataset;
use lance::dataset::optimize::{plan_compaction, CompactionOptions};

async fn example(dataset: &Dataset) -> lance::Result<()> {
    let options = CompactionOptions {
        target_rows_per_fragment: 512 * 1024,
        materialize_deletions: true,
        materialize_deletions_threshold: 0.05,
        ..Default::default()
    };

    let plan = plan_compaction(dataset, &options).await?;

    println!("Number of compaction tasks: {}", plan.tasks.len());
    println!("Read version: {}", plan.read_version);

    // Distribute tasks to workers
    for task in plan.compaction_tasks() {
        // Serialize and send task to a worker node
        let serialized = serde_json::to_string(&task).unwrap();
        println!("Task: {}", serialized);
    }

    Ok(())
}

