Implementation:Lance format Lance InsertBuilder
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Columnar_Storage |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for ingesting data into an existing or new Lance dataset using a builder pattern, provided by the Lance library.
Description
InsertBuilder is the core write orchestrator for Lance datasets. It accepts a WriteDestination (either a URI for new datasets or an Arc<Dataset> for existing ones) and optional WriteParams to control physical layout and write mode. It supports three execution paths:
- execute(data): Accepts a Vec<RecordBatch>, writes fragment files, and commits the transaction in one step.
- execute_stream(source): Accepts any StreamingWriteSource (including Box<dyn RecordBatchReader>), writes fragments from the stream, and commits.
- execute_uncommitted(data): Writes fragment files but returns a Transaction without committing, enabling distributed or deferred commits via CommitBuilder.
The builder delegates to the two-phase write-then-commit architecture. The WriteMode in WriteParams determines whether the operation creates, appends, or overwrites.
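The mode semantics and the write-then-commit split can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions, not Lance's implementation: `ToyStore`, `ToyTransaction`, `ToyWriteMode`, `write_uncommitted`, and `commit` are hypothetical names, fragments are plain strings, and rejecting an `Append` to a missing dataset is an assumption of the model rather than documented Lance behavior.

```rust
use std::collections::HashMap;

/// Mirrors the three modes documented for Lance's `WriteMode` (toy version).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyWriteMode {
    Create,
    Append,
    Overwrite,
}

/// Hypothetical stand-in for an uncommitted transaction: the fragments have
/// been "written", but no dataset version refers to them yet.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyTransaction {
    pub mode: ToyWriteMode,
    pub fragments: Vec<String>,
}

/// Hypothetical in-memory store: dataset name -> version history, where each
/// version is the full list of fragments visible at that version.
#[derive(Debug, Default)]
pub struct ToyStore {
    versions: HashMap<String, Vec<Vec<String>>>,
}

/// Phase 1: produce fragments without touching any dataset version.
pub fn write_uncommitted(mode: ToyWriteMode, rows: &[&str]) -> ToyTransaction {
    let fragments = rows.iter().map(|r| format!("frag:{r}")).collect();
    ToyTransaction { mode, fragments }
}

impl ToyStore {
    /// Phase 2: make the fragments visible by pushing a new version.
    /// Returns the new version number (1-based).
    pub fn commit(&mut self, name: &str, tx: ToyTransaction) -> Result<usize, String> {
        let exists = self.versions.contains_key(name);
        let next = match (tx.mode, exists) {
            (ToyWriteMode::Create, true) => return Err(format!("{name} already exists")),
            // Assumption of this toy model only.
            (ToyWriteMode::Append, false) => return Err(format!("{name} does not exist")),
            (ToyWriteMode::Append, true) => {
                // Start from the latest version and add the new fragments.
                let mut all = self.versions[name].last().cloned().unwrap_or_default();
                all.extend(tx.fragments);
                all
            }
            // Create on a missing dataset, or Overwrite in either case:
            // the new version contains only the new fragments.
            _ => tx.fragments,
        };
        let history = self.versions.entry(name.to_string()).or_default();
        history.push(next);
        Ok(history.len())
    }

    /// Fragments visible in the latest version, if the dataset exists.
    pub fn latest(&self, name: &str) -> Option<&Vec<String>> {
        self.versions.get(name).and_then(|h| h.last())
    }
}
```

In this model, `write_uncommitted` followed by `commit` corresponds to what `execute` does in a single call, while calling `write_uncommitted` alone corresponds to `execute_uncommitted`.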
Usage
Use InsertBuilder when:
- Appending new data to an existing Lance dataset.
- Implementing distributed write pipelines where uncommitted fragments are combined.
- Controlling the write mode (Create, Append, Overwrite) explicitly.
Code Reference
Source Location
- Repository: Lance
- File: rust/lance/src/dataset/write/insert.rs
- Lines: L46-L116
Signature
pub struct InsertBuilder<'a> {
    dest: WriteDestination<'a>,
    params: Option<&'a WriteParams>,
}

impl<'a> InsertBuilder<'a> {
    pub fn new(dest: impl Into<WriteDestination<'a>>) -> Self;
    pub fn with_params(mut self, params: &'a WriteParams) -> Self;
    pub async fn execute(&self, data: Vec<RecordBatch>) -> Result<Dataset>;
    pub async fn execute_stream(&self, source: impl StreamingWriteSource) -> Result<Dataset>;
    pub async fn execute_uncommitted(&self, data: Vec<RecordBatch>) -> Result<Transaction>;
}
Supporting Types
// rust/lance/src/dataset/write.rs:L108-L116
pub enum WriteMode {
    /// Create a new dataset. Expect the dataset does not exist.
    Create,
    /// Append to an existing dataset.
    Append,
    /// Overwrite a dataset as a new version, or create new dataset if not exist.
    Overwrite,
}
Import
use lance::dataset::{InsertBuilder, WriteParams, WriteMode, CommitBuilder};
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| dest | impl Into<WriteDestination<'a>> | Yes | The destination: a URI string for new datasets or Arc<Dataset> for existing datasets. |
| params | &'a WriteParams | No | Optional write parameters. Controls mode (Create/Append/Overwrite), file layout, storage, and commit behavior. |
| data (execute) | Vec<RecordBatch> | Yes (for execute) | A vector of Arrow RecordBatches to write. |
| source (execute_stream) | impl StreamingWriteSource | Yes (for execute_stream) | A streaming data source such as Box<dyn RecordBatchReader + Send>. |
Outputs
| Name | Type | Description |
|---|---|---|
| Result (execute/execute_stream) | Result<Dataset> | The updated Dataset with the new version committed. |
| Result (execute_uncommitted) | Result<Transaction> | An uncommitted Transaction that can be passed to CommitBuilder. |
Usage Examples
Append Data
use std::sync::Arc;

use arrow_array::RecordBatch;
use lance::dataset::{Dataset, InsertBuilder, WriteParams, WriteMode};

async fn append_data(dataset: Arc<Dataset>, batches: Vec<RecordBatch>) -> lance::Result<Dataset> {
    // Append to the existing dataset; all other parameters keep their defaults.
    let params = WriteParams {
        mode: WriteMode::Append,
        ..Default::default()
    };
    // Write the fragments and commit the new version in one step.
    InsertBuilder::new(dataset)
        .with_params(&params)
        .execute(batches)
        .await
}
Two-Phase Distributed Write
use std::sync::Arc;

use arrow_array::RecordBatch;
use lance::dataset::{Dataset, InsertBuilder, CommitBuilder, WriteParams, WriteMode};

async fn distributed_write(
    dataset: Arc<Dataset>,
    data: Vec<RecordBatch>,
) -> lance::Result<()> {
    let params = WriteParams {
        mode: WriteMode::Append,
        ..Default::default()
    };
    // Phase 1: Write fragments without committing
    let transaction = InsertBuilder::new(dataset.clone())
        .with_params(&params)
        .execute_uncommitted(data)
        .await?;
    // Phase 2: Commit the transaction
    CommitBuilder::new(dataset)
        .execute(transaction)
        .await?;
    Ok(())
}
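The same two-phase idea extends to several writers. The following standard-library-only sketch models a coordinator that collects the results of independent uncommitted writes and publishes them together. It is a toy illustration, not Lance code: `PendingWrite`, `worker_write`, and `combine_and_commit` are hypothetical names, and in a real pipeline each worker would call `execute_uncommitted` and the coordinator would hand the resulting transactions to `CommitBuilder`.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the result of an uncommitted write on one worker.
#[derive(Debug, Clone, PartialEq)]
pub struct PendingWrite {
    pub worker_id: usize,
    pub fragments: Vec<String>,
}

/// Each worker "writes" its partition and reports the fragments it produced.
/// Nothing is visible to readers at this point.
pub fn worker_write(worker_id: usize, partition: Vec<u32>) -> PendingWrite {
    let fragments = partition
        .iter()
        .map(|row| format!("worker{worker_id}-row{row}"))
        .collect();
    PendingWrite { worker_id, fragments }
}

/// The coordinator gathers every pending write and publishes them in a single
/// step, returning the combined fragment list in worker order.
pub fn combine_and_commit(partitions: Vec<Vec<u32>>) -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let handles: Vec<_> = partitions
        .into_iter()
        .enumerate()
        .map(|(id, part)| {
            let tx = tx.clone();
            thread::spawn(move || {
                tx.send(worker_write(id, part)).expect("coordinator is alive");
            })
        })
        .collect();
    // Drop the coordinator's sender so the channel closes once workers finish.
    drop(tx);

    for handle in handles {
        handle.join().expect("worker panicked");
    }

    // Sort by worker id so the committed order is deterministic.
    let mut pending: Vec<PendingWrite> = rx.iter().collect();
    pending.sort_by_key(|p| p.worker_id);
    pending.into_iter().flat_map(|p| p.fragments).collect()
}
```

Keeping the commit in one place means a failed worker leaves no partially visible data: the coordinator can simply decline to publish.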
Related Pages
Implements Principle
Requires Environment
Uses Heuristic