Implementation:Lance format Lance InsertBuilder

Knowledge Sources
Domains Data_Engineering, Columnar_Storage
Last Updated 2026-02-08 19:00 GMT

Overview

Concrete tool for ingesting data into an existing or new Lance dataset using a builder pattern, provided by the Lance library.

Description

InsertBuilder is the core write orchestrator for Lance datasets. It accepts a WriteDestination (either a URI for new datasets or an Arc<Dataset> for existing ones) and optional WriteParams to control physical layout and write mode. It supports three execution paths:

  • execute(data): Accepts a Vec<RecordBatch>, writes fragment files, and commits the transaction in one step.
  • execute_stream(source): Accepts any StreamingWriteSource (including Box<dyn RecordBatchReader>), writes fragments from the stream, and commits.
  • execute_uncommitted(data): Writes fragment files but returns a Transaction without committing, enabling distributed or deferred commits via CommitBuilder.

The builder follows a two-phase write-then-commit design: fragment files are written first, and a separate commit step records them as a new dataset version. The WriteMode in WriteParams determines whether the operation creates, appends, or overwrites.
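
The split between writing and committing can be pictured with a toy in-memory model. This is only a sketch under stated assumptions: Fragment, ToyTransaction, ToyDataset, and write_uncommitted are hypothetical names invented for illustration, not Lance types, and the real library writes files and manifests rather than pushing to a vector.

```rust
// Toy, in-memory model of write-then-commit. None of these types are Lance APIs.

/// Stand-in for a data fragment that has been written but not yet committed.
#[derive(Debug, Clone, PartialEq)]
pub struct Fragment {
    pub id: u64,
    pub rows: usize,
}

/// Stand-in for an uncommitted transaction: the fragments one writer produced.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyTransaction {
    pub fragments: Vec<Fragment>,
}

/// Stand-in for a dataset: a version counter plus the committed fragments.
#[derive(Debug, Default)]
pub struct ToyDataset {
    pub version: u64,
    pub fragments: Vec<Fragment>,
}

/// Phase 1: "write" one fragment per batch. The dataset is not touched,
/// so nothing written here is visible through it yet.
pub fn write_uncommitted(first_id: u64, batch_rows: &[usize]) -> ToyTransaction {
    let fragments = batch_rows
        .iter()
        .enumerate()
        .map(|(i, &rows)| Fragment { id: first_id + i as u64, rows })
        .collect();
    ToyTransaction { fragments }
}

impl ToyDataset {
    /// Phase 2: attach the fragments and advance the version exactly once.
    pub fn commit(&mut self, txn: ToyTransaction) -> u64 {
        self.fragments.extend(txn.fragments);
        self.version += 1;
        self.version
    }

    /// Total rows across committed fragments only.
    pub fn row_count(&self) -> usize {
        self.fragments.iter().map(|f| f.rows).sum()
    }
}
```

In this model, calling write_uncommitted leaves the dataset's version and row count unchanged until commit is called, which mirrors the role the text gives to execute_uncommitted followed by CommitBuilder.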

Usage

Use InsertBuilder when:

  • Appending new data to an existing Lance dataset.
  • Implementing distributed write pipelines where uncommitted fragments are combined.
  • Controlling the write mode (Create, Append, Overwrite) explicitly.

Code Reference

Source Location

  • Repository: Lance
  • File: rust/lance/src/dataset/write/insert.rs
  • Lines: L46-L116

Signature

pub struct InsertBuilder<'a> {
    dest: WriteDestination<'a>,
    params: Option<&'a WriteParams>,
}

impl<'a> InsertBuilder<'a> {
    pub fn new(dest: impl Into<WriteDestination<'a>>) -> Self;
    pub fn with_params(mut self, params: &'a WriteParams) -> Self;
    pub async fn execute(&self, data: Vec<RecordBatch>) -> Result<Dataset>;
    pub async fn execute_stream(&self, source: impl StreamingWriteSource) -> Result<Dataset>;
    pub async fn execute_uncommitted(&self, data: Vec<RecordBatch>) -> Result<Transaction>;
}
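
Two details of this signature are easy to miss: params is held by reference, so the caller's WriteParams must outlive the builder, and the execute methods take &self, so a configured builder can be reused. The miniature below illustrates both with stand-in types. ToyParams, ToyInsertBuilder, plan_files, and the default value are hypothetical and exist only for this sketch; they are not part of Lance.

```rust
// Hypothetical miniature of the builder shape shown in the signature.
// `ToyParams` and `ToyInsertBuilder` are illustrative stand-ins, not Lance types.

#[derive(Debug, Clone, PartialEq)]
pub struct ToyParams {
    pub max_rows_per_file: usize,
}

impl Default for ToyParams {
    fn default() -> Self {
        // Arbitrary value chosen for the sketch, not a Lance default.
        ToyParams { max_rows_per_file: 1_000 }
    }
}

pub struct ToyInsertBuilder<'a> {
    dest: String,
    // Borrowed, so the caller's params must outlive the builder.
    params: Option<&'a ToyParams>,
}

impl<'a> ToyInsertBuilder<'a> {
    pub fn new(dest: impl Into<String>) -> Self {
        ToyInsertBuilder { dest: dest.into(), params: None }
    }

    /// Consumes and returns the builder so calls can be chained.
    pub fn with_params(mut self, params: &'a ToyParams) -> Self {
        self.params = Some(params);
        self
    }

    /// Takes `&self`, so one configured builder can be used repeatedly.
    /// Returns how many files a write of `rows` rows would be split into.
    pub fn plan_files(&self, rows: usize) -> usize {
        let defaults = ToyParams::default();
        // Fall back to defaults when no params were supplied.
        let params = self.params.unwrap_or(&defaults);
        // Ceiling division: a partially filled file still counts as a file.
        (rows + params.max_rows_per_file - 1) / params.max_rows_per_file
    }

    pub fn dest(&self) -> &str {
        &self.dest
    }
}
```

Borrowing the parameters avoids cloning them for every write, at the cost of tying the builder's lifetime to the params value, which is why the usage examples on this page bind params to a local variable before calling with_params.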

Supporting Types

// rust/lance/src/dataset/write.rs:L108-L116
pub enum WriteMode {
    /// Create a new dataset. Expect the dataset does not exist.
    Create,
    /// Append to an existing dataset.
    Append,
    /// Overwrite a dataset as a new version, or create new dataset if not exist.
    Overwrite,
}
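
Read literally, the doc comments above imply a small decision table keyed on the mode and on whether the dataset already exists. The sketch below encodes that reading. The enum is redeclared so the block compiles on its own, and Outcome and resolve are hypothetical helpers, not Lance functions. The comments do not say what Append does when no dataset exists, so the sketch marks that case as Unspecified instead of guessing, and it treats Create on an existing dataset as a rejection, which is an assumption drawn from the word "Expect".

```rust
// Toy model of the doc comments on `WriteMode`. The enum is redeclared here so
// the sketch is self-contained; `Outcome` and `resolve` are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WriteMode {
    Create,
    Append,
    Overwrite,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// A brand-new dataset is created.
    CreateNew,
    /// Rows are added to the existing dataset.
    AppendToExisting,
    /// The existing dataset gets a new version holding only the new data.
    NewVersionReplacingData,
    /// Assumption: the write is refused, with a reason.
    Rejected(&'static str),
    /// The doc comments do not cover this combination.
    Unspecified,
}

pub fn resolve(mode: WriteMode, dataset_exists: bool) -> Outcome {
    match (mode, dataset_exists) {
        (WriteMode::Create, false) => Outcome::CreateNew,
        // "Expect the dataset does not exist", read here as a rejection.
        (WriteMode::Create, true) => Outcome::Rejected("dataset already exists"),
        (WriteMode::Append, true) => Outcome::AppendToExisting,
        // Not stated in the doc comments, so left open in this sketch.
        (WriteMode::Append, false) => Outcome::Unspecified,
        (WriteMode::Overwrite, true) => Outcome::NewVersionReplacingData,
        // "or create new dataset if not exist"
        (WriteMode::Overwrite, false) => Outcome::CreateNew,
    }
}
```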

Import

use lance::dataset::{InsertBuilder, WriteParams, WriteMode, CommitBuilder};

I/O Contract

Inputs

Name | Type | Required | Description
dest | impl Into<WriteDestination<'a>> | Yes | The destination: a URI string for new datasets or Arc<Dataset> for existing datasets.
params | &'a WriteParams | No | Optional write parameters. Controls mode (Create/Append/Overwrite), file layout, storage, and commit behavior.
data (execute, execute_uncommitted) | Vec<RecordBatch> | Yes (for execute and execute_uncommitted) | A vector of Arrow RecordBatches to write.
source (execute_stream) | impl StreamingWriteSource | Yes (for execute_stream) | A streaming data source such as Box<dyn RecordBatchReader + Send>.

Outputs

Name | Type | Description
Result (execute/execute_stream) | Result<Dataset> | The updated Dataset with the new version committed.
Result (execute_uncommitted) | Result<Transaction> | An uncommitted Transaction that can be passed to CommitBuilder.

Usage Examples

Append Data

use std::sync::Arc;
use arrow_array::RecordBatch;
use lance::dataset::{Dataset, InsertBuilder, WriteParams, WriteMode};

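// Appends `batches` to an existing dataset and returns the dataset at its new version.
async fn append_data(dataset: Arc<Dataset>, batches: Vec<RecordBatch>) -> lance::Result<Dataset> {
    // Override only the mode; every other setting keeps its default.
    let params = WriteParams {
        mode: WriteMode::Append,
        ..Default::default()
    };
    // Passing an Arc<Dataset> targets the existing dataset; execute writes and commits in one call.
    InsertBuilder::new(dataset)
        .with_params(&params)
        .execute(batches)
        .await
}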

Two-Phase Distributed Write

use std::sync::Arc;
use arrow_array::RecordBatch;
use lance::dataset::{Dataset, InsertBuilder, CommitBuilder, WriteParams, WriteMode};

async fn distributed_write(
    dataset: Arc<Dataset>,
    data: Vec<RecordBatch>,
) -> lance::Result<()> {
    let params = WriteParams {
        mode: WriteMode::Append,
        ..Default::default()
    };

    // Phase 1: Write fragments without committing
    let transaction = InsertBuilder::new(dataset.clone())
        .with_params(&params)
        .execute_uncommitted(data)
        .await?;

    // Phase 2: Commit the transaction
    CommitBuilder::new(dataset)
        .execute(transaction)
        .await?;

    Ok(())
}
