Implementation:Lance format Lance CommitHandler


Knowledge Sources
Domains Data_Engineering, Version_Control
Last Updated 2026-02-08 19:00 GMT

Overview

Concrete tool for coordinating concurrent writes and ensuring exactly-once version assignment, provided by the Lance library.

Description

The CommitHandler trait is the pluggable abstraction that governs how Lance datasets resolve manifest locations and commit new versions. It defines five methods: resolve_latest_location, resolve_version_location, list_manifest_locations, commit, and delete. Only commit is required; the others carry default implementations. Implementations range from a simple file-system rename handler (for local or cloud object stores) to a DynamoDB-backed handler for strongly consistent external locking.
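The "one required method, defaults for the rest" shape described above can be sketched with a reduced, synchronous toy trait. This is not the Lance API: `ToyCommitHandler`, `Location`, `InMemoryHandler`, the `mem://dataset` base, and the path convention in the default method are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical, reduced stand-in for Lance's `ManifestLocation`.
#[derive(Debug, Clone, PartialEq)]
pub struct Location {
    pub version: u64,
    pub path: String,
}

/// Hypothetical stand-in for Lance's `CommitError`.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    CommitConflict,
    OtherError(String),
}

/// Toy model of the trait shape: `commit` is required,
/// the other methods carry default bodies that may be overridden.
pub trait ToyCommitHandler {
    /// Required: claim `version`, or report a conflict.
    fn commit(&self, version: u64) -> Result<Location, CommitError>;

    /// Default: derive the manifest path by an assumed convention.
    fn resolve_version_location(&self, base: &str, version: u64) -> Location {
        Location {
            version,
            path: format!("{}/_versions/{}.manifest", base, version),
        }
    }

    /// Default: nothing to clean up.
    fn delete(&self, _base: &str) -> Result<(), String> {
        Ok(())
    }
}

/// In-memory handler that implements only the required method.
#[derive(Default)]
pub struct InMemoryHandler {
    committed: Mutex<BTreeMap<u64, Location>>,
}

impl ToyCommitHandler for InMemoryHandler {
    fn commit(&self, version: u64) -> Result<Location, CommitError> {
        let mut committed = self
            .committed
            .lock()
            .map_err(|e| CommitError::OtherError(e.to_string()))?;
        // A version slot can be claimed only once.
        if committed.contains_key(&version) {
            return Err(CommitError::CommitConflict);
        }
        let location = self.resolve_version_location("mem://dataset", version);
        committed.insert(version, location.clone());
        Ok(location)
    }
}
```

Because the mutex serializes the check and the insert, two callers racing for the same version cannot both succeed. A real backend would need an equivalent atomic primitive from its storage system.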

The top-level commit_transaction function orchestrates the full commit lifecycle: it loads concurrent transactions, rebases the current transaction if needed, builds a manifest, and delegates the atomic write to CommitHandler::commit(). On conflict it retries with exponential SlotBackoff.
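The retry-on-conflict loop can be illustrated with a small, self-contained model. Everything here is an assumption made for the sketch: `VersionStore` and `commit_with_retries` are hypothetical names, the "rebase" step is reduced to re-reading the latest version, and the delay is a plain doubling backoff rather than Lance's `SlotBackoff`.

```rust
use std::collections::BTreeSet;
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical stand-in for Lance's `CommitError`.
#[derive(Debug, PartialEq)]
pub enum CommitError {
    CommitConflict,
    OtherError(String),
}

/// Toy version store: each version number can be claimed at most once.
#[derive(Default)]
pub struct VersionStore {
    claimed: BTreeSet<u64>,
}

impl VersionStore {
    /// Highest claimed version, or 0 if nothing has been committed.
    pub fn latest(&self) -> u64 {
        self.claimed.iter().next_back().copied().unwrap_or(0)
    }

    /// Stand-in for the atomic commit: fails if the slot is already taken.
    pub fn try_claim(&mut self, version: u64) -> Result<u64, CommitError> {
        if self.claimed.insert(version) {
            Ok(version)
        } else {
            Err(CommitError::CommitConflict)
        }
    }
}

/// Try to commit on top of `read_version`. On conflict, wait, re-read the
/// latest version (the simplified "rebase"), and target the next slot.
pub fn commit_with_retries(
    store: &mut VersionStore,
    read_version: u64,
    max_retries: u32,
    base_delay: Duration,
) -> Result<u64, CommitError> {
    let mut target = read_version + 1;
    for attempt in 0..=max_retries {
        match store.try_claim(target) {
            Ok(version) => return Ok(version),
            Err(CommitError::CommitConflict) if attempt < max_retries => {
                // Doubling delay, capped so the multiplier cannot overflow.
                sleep(base_delay * 2u32.pow(attempt.min(10)));
                target = store.latest() + 1;
            }
            // Last attempt conflicted, or a non-conflict error occurred.
            Err(err) => return Err(err),
        }
    }
    // Not reached in practice: the loop always runs at least once.
    Err(CommitError::CommitConflict)
}
```

With versions 1 to 3 already claimed, a writer that read version 1 first targets version 2, conflicts, and then lands on version 4. With `max_retries` set to 0, the same writer gives up with `CommitConflict`.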

Usage

Use CommitHandler when:

  • Implementing a custom commit backend (for example, backed by a database for external coordination).
  • Integrating Lance into a system that requires a specific concurrency-control mechanism.
  • Extending Lance's storage layer to a new object store that needs special manifest-handling logic.

Use commit_transaction (indirectly, through Dataset mutation methods) whenever writing data to a Lance dataset.

Code Reference

Source Location

  • Repository: Lance
  • File (trait): rust/lance-table/src/io/commit.rs
  • Lines (trait): L496-L594
  • File (function): rust/lance/src/io/commit.rs
  • Lines (function): L779-L987

Signature

CommitHandler trait:

#[async_trait]
pub trait CommitHandler: Debug + Send + Sync {
    async fn resolve_latest_location(
        &self,
        base_path: &Path,
        object_store: &ObjectStore,
    ) -> Result<ManifestLocation>;

    async fn resolve_version_location(
        &self,
        base_path: &Path,
        version: u64,
        object_store: &dyn OSObjectStore,
    ) -> Result<ManifestLocation>;

    fn list_manifest_locations<'a>(
        &self,
        base_path: &Path,
        object_store: &'a ObjectStore,
        sorted_descending: bool,
    ) -> BoxStream<'a, Result<ManifestLocation>>;

    async fn commit(
        &self,
        manifest: &mut Manifest,
        indices: Option<Vec<IndexMetadata>>,
        base_path: &Path,
        object_store: &ObjectStore,
        manifest_writer: ManifestWriter,
        naming_scheme: ManifestNamingScheme,
        transaction: Option<Transaction>,
    ) -> std::result::Result<ManifestLocation, CommitError>;

    async fn delete(&self, _base_path: &Path) -> Result<()>;
}

commit_transaction function:

pub(crate) async fn commit_transaction(
    dataset: &Dataset,
    object_store: &ObjectStore,
    commit_handler: &dyn CommitHandler,
    transaction: &Transaction,
    write_config: &ManifestWriteConfig,
    commit_config: &CommitConfig,
    manifest_naming_scheme: ManifestNamingScheme,
    affected_rows: Option<&RowAddrTreeMap>,
) -> Result<(Manifest, ManifestLocation)>

Import

use lance_table::io::commit::CommitHandler;
// commit_transaction is pub(crate) and used internally by Dataset

I/O Contract

Inputs (CommitHandler::commit)

| Name | Type | Required | Description |
|---|---|---|---|
| manifest | &mut Manifest | Yes | The manifest to commit; its version field is set to the target version before calling. |
| indices | Option<Vec<IndexMetadata>> | No | Optional index metadata to persist alongside the manifest. |
| base_path | &Path | Yes | Root path of the dataset in the object store. |
| object_store | &ObjectStore | Yes | The object store used for reading and writing. |
| manifest_writer | ManifestWriter | Yes | Callback function that serializes the manifest to the object store. |
| naming_scheme | ManifestNamingScheme | Yes | V1 or V2 naming scheme for manifest file paths. |
| transaction | Option<Transaction> | No | The transaction associated with this commit, used for conflict detection. |
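The page does not spell out how the two naming schemes differ. The sketch below rests on an assumption about the layout: V1 names a manifest by its plain version number, while V2 names it by `u64::MAX - version`, zero-padded to 20 digits, so that a lexicographic listing returns the newest version first. `NamingScheme`, `manifest_file_name`, and `parse_version` are hypothetical helpers, not Lance functions.

```rust
/// Hypothetical mirror of the two naming schemes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum NamingScheme {
    V1,
    V2,
}

/// Build the manifest file name for `version` (assumed layout, see lead-in).
pub fn manifest_file_name(scheme: NamingScheme, version: u64) -> String {
    match scheme {
        // Plain version number: "10.manifest" sorts before "9.manifest".
        NamingScheme::V1 => format!("{}.manifest", version),
        // Inverted and zero-padded: newer versions sort first.
        NamingScheme::V2 => format!("{:020}.manifest", u64::MAX - version),
    }
}

/// Recover the version from a file name produced by `manifest_file_name`.
pub fn parse_version(scheme: NamingScheme, file_name: &str) -> Option<u64> {
    let stem = file_name.strip_suffix(".manifest")?;
    let number: u64 = stem.parse().ok()?;
    match scheme {
        NamingScheme::V1 => Some(number),
        NamingScheme::V2 => Some(u64::MAX - number),
    }
}
```

Under this assumption, finding the latest V2 manifest needs only the first entry of a sorted listing, whereas V1 names must be parsed and compared numerically.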

Outputs (CommitHandler::commit)

| Name | Type | Description |
|---|---|---|
| Ok | ManifestLocation | Location metadata (version, path, size, e_tag) of the successfully committed manifest. |
| Err | CommitError | Either CommitConflict (version slot taken) or OtherError (I/O failure). |
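The split between the two error variants can be shown with a local-filesystem sketch using only the standard library. It is not the Lance rename handler: it swaps rename for create-if-absent (`OpenOptions::create_new`), and `commit_manifest` and the `_versions/<version>.manifest` layout are assumptions made for illustration.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, ErrorKind, Write};
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for Lance's `CommitError`.
#[derive(Debug)]
pub enum CommitError {
    /// The version slot is already taken by another writer.
    CommitConflict,
    /// Any other failure, here an I/O error.
    OtherError(io::Error),
}

/// Write `bytes` as the manifest for `version` under `base`,
/// failing with `CommitConflict` if that version already exists.
pub fn commit_manifest(base: &Path, version: u64, bytes: &[u8]) -> Result<PathBuf, CommitError> {
    let dir = base.join("_versions");
    fs::create_dir_all(&dir).map_err(CommitError::OtherError)?;
    let path = dir.join(format!("{}.manifest", version));

    // `create_new(true)` fails with AlreadyExists when the file is present,
    // so at most one writer can claim a given version.
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .open(&path)
        .map_err(|e| match e.kind() {
            ErrorKind::AlreadyExists => CommitError::CommitConflict,
            _ => CommitError::OtherError(e),
        })?;

    file.write_all(bytes).map_err(CommitError::OtherError)?;
    Ok(path)
}
```

A caller would typically retry on `CommitConflict` and surface `OtherError` unchanged. Note that this sketch makes the file visible before its contents are fully written, which a production handler would have to avoid.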

Inputs (commit_transaction)

| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &Dataset | Yes | The current dataset handle, providing the base manifest and object store. |
| object_store | &ObjectStore | Yes | Object store configured with write parameters. |
| commit_handler | &dyn CommitHandler | Yes | The commit handler to use for the atomic commit. |
| transaction | &Transaction | Yes | The transaction describing the mutation. |
| write_config | &ManifestWriteConfig | Yes | Configuration for manifest writing (timestamp, storage format). |
| commit_config | &CommitConfig | Yes | Configuration for commit behavior (number of retries, auto-cleanup). |
| manifest_naming_scheme | ManifestNamingScheme | Yes | V1 or V2 naming scheme. |
| affected_rows | Option<&RowAddrTreeMap> | No | Rows affected by this transaction, used for conflict detection during rebase. |

Outputs (commit_transaction)

| Name | Type | Description |
|---|---|---|
| Ok | (Manifest, ManifestLocation) | The committed manifest and its storage location. |
| Err | Error | A CommitConflict error if all retries are exhausted, or another error such as an I/O failure. |

Usage Examples

use arrow_array::{RecordBatch, RecordBatchIterator};
use lance::dataset::{WriteMode, WriteParams};
use lance::Dataset;

// Every write to a Dataset automatically goes through the commit protocol.
// Appending data creates a new version:
let dataset = Dataset::open("s3://bucket/my_dataset").await?;
let batch: RecordBatch = /* ... */;
// Take the schema from the batch before the batch is moved into the reader.
let schema = batch.schema();
Dataset::write(
    RecordBatchIterator::new(vec![Ok(batch)], schema),
    "s3://bucket/my_dataset",
    Some(WriteParams {
        mode: WriteMode::Append,
        ..Default::default()
    }),
).await?;
// The stored dataset is now at version N+1; the `dataset` handle opened
// above still points at version N.

// The commit handler resolves the latest version:
let latest = dataset.latest_version_id().await?;
