Implementation:Lance format Lance CommitHandler
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Version_Control |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for coordinating concurrent writes and ensuring exactly-once version assignment, provided by the Lance library.
Description
The CommitHandler trait is the pluggable abstraction that governs how Lance datasets resolve manifest locations and commit new versions. It defines five methods: resolve_latest_location, resolve_version_location, list_manifest_locations, commit, and delete. Only commit is required; the others carry default implementations. Implementations range from a simple file-system rename handler (for local or cloud object stores) to a DynamoDB-backed handler for strongly consistent external locking.
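To make the trait's contract concrete, here is a heavily simplified, synchronous sketch. It is not Lance's API. MiniCommitHandler, InMemoryCommitHandler, and MiniCommitError are hypothetical names, and a string stands in for the manifest. As in the trait described above, commit must be implemented and delete ships with a default body. Unlike the real trait, this sketch also makes the latest-version lookup a required method. The property being illustrated is exactly-once version assignment: a version slot can be claimed by only one writer.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical, simplified stand-in for CommitError.
#[derive(Debug, PartialEq)]
pub enum MiniCommitError {
    /// Another writer already committed this version.
    CommitConflict,
}

/// Hypothetical, synchronous analogue of the CommitHandler trait.
pub trait MiniCommitHandler {
    /// Required: atomically claim `version`, or fail if it is already taken.
    fn commit(&self, version: u64, manifest: &str) -> Result<(), MiniCommitError>;

    /// Latest committed version, if any (required in this sketch only).
    fn latest_version(&self) -> Option<u64>;

    /// Provided: a default no-op body, like a trait method with a default.
    fn delete(&self) {}
}

/// In-memory handler: the Mutex makes "check slot, then insert" atomic.
#[derive(Default)]
pub struct InMemoryCommitHandler {
    manifests: Mutex<BTreeMap<u64, String>>,
}

impl MiniCommitHandler for InMemoryCommitHandler {
    fn commit(&self, version: u64, manifest: &str) -> Result<(), MiniCommitError> {
        let mut manifests = self.manifests.lock().unwrap();
        if manifests.contains_key(&version) {
            return Err(MiniCommitError::CommitConflict);
        }
        manifests.insert(version, manifest.to_string());
        Ok(())
    }

    fn latest_version(&self) -> Option<u64> {
        // BTreeMap keys are ordered, so the last key is the newest version.
        self.manifests.lock().unwrap().keys().next_back().copied()
    }
}
```

If several threads race to commit the same version against one InMemoryCommitHandler, exactly one call returns Ok and the rest return CommitConflict; a real backend has to provide the same guarantee with a storage-level primitive instead of an in-process Mutex.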
The top-level commit_transaction function orchestrates the full commit lifecycle. It loads the transactions that other writers committed concurrently, rebases the current transaction onto them if needed, builds a manifest, and delegates the atomic write to CommitHandler::commit(). If the target version has already been taken when the write is attempted, it retries, spacing attempts with an exponential SlotBackoff.
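The retry behaviour described above can be sketched as a loop: read the latest version, try to claim the next slot, and on a conflict wait and try again. This is a simplified illustration, not Lance's commit_transaction. The names commit_with_retries, slot_backoff_ms, AttemptError, and CommitOutcome are hypothetical. Rebasing is reduced to re-reading the latest version. The backoff (a pseudo-random number of fixed-size slots drawn from a window that doubles each attempt) is an assumption about how a slot-based exponential backoff behaves, not SlotBackoff's exact formula.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error for one commit attempt.
#[derive(Debug, PartialEq)]
pub enum AttemptError {
    /// Another writer already took the target version.
    Conflict,
}

/// Hypothetical outcome of the whole retry loop.
#[derive(Debug, PartialEq)]
pub enum CommitOutcome {
    Committed(u64),
    RetriesExhausted,
}

/// Assumed slot-based exponential backoff: wait a pseudo-random number of
/// `slot_ms` slots drawn from a window of 2^attempt slots (capped at 2^10).
/// A tiny xorshift generator keeps the sketch dependency-free.
pub fn slot_backoff_ms(attempt: u32, slot_ms: u64, seed: &mut u64) -> u64 {
    *seed ^= *seed << 13;
    *seed ^= *seed >> 7;
    *seed ^= *seed << 17;
    let slots = 1u64 << attempt.min(10);
    slot_ms * (*seed % slots)
}

/// Try to commit at `latest + 1`, retrying after conflicts.
pub fn commit_with_retries(
    max_retries: u32,
    slot_ms: u64,
    mut latest_version: impl FnMut() -> u64,
    mut try_commit: impl FnMut(u64) -> Result<(), AttemptError>,
) -> CommitOutcome {
    let mut seed: u64 = 0x9E37_79B9_7F4A_7C15;
    for attempt in 0..=max_retries {
        // Stand-in for "rebase": re-read the latest version, target the next slot.
        let target = latest_version() + 1;
        match try_commit(target) {
            Ok(()) => return CommitOutcome::Committed(target),
            // Back off only if another attempt is still allowed.
            Err(AttemptError::Conflict) if attempt < max_retries => {
                sleep(Duration::from_millis(slot_backoff_ms(attempt, slot_ms, &mut seed)));
            }
            Err(AttemptError::Conflict) => {}
        }
    }
    CommitOutcome::RetriesExhausted
}
```

Re-reading the latest version on every round is what turns a lost race into a fresh attempt at the next slot rather than an immediate failure; the randomized wait makes it less likely that the same writers collide again on that slot.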
Usage
Use CommitHandler when:
- Implementing a custom commit backend (for example, backed by a database for external coordination).
- Integrating Lance into a system that requires a specific concurrency-control mechanism.
- Extending Lance's storage layer to a new object store that needs special manifest-handling logic.
commit_transaction is pub(crate), so callers use it only indirectly: Dataset mutation methods invoke it whenever data is written to a Lance dataset.
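For a custom backend on a local file system, the idea behind a simple rename-style handler can be approximated with the standard library alone: write the manifest to a temporary file, then publish it under its final name only if that name does not exist yet. This is an illustrative sketch, not Lance's rename handler. fs_commit, FsCommitError, the temporary-file naming, and the `<version>.manifest` layout are hypothetical. It uses std::fs::hard_link, which fails with ErrorKind::AlreadyExists when the destination exists, because std::fs::rename may silently replace an existing file (it does on Unix).

```rust
use std::fs;
use std::io::{self, ErrorKind, Write};
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical error type mirroring the conflict / other split.
#[derive(Debug)]
pub enum FsCommitError {
    /// The target version file already exists: another writer won.
    CommitConflict,
    /// Any other I/O failure.
    Other(io::Error),
}

// Makes temp-file names unique across threads in this process.
static TMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Hypothetical helper: publish `manifest` as `<dir>/<version>.manifest`
/// at most once, using a temp file plus an atomic create-if-absent link.
pub fn fs_commit(dir: &Path, version: u64, manifest: &[u8]) -> Result<(), FsCommitError> {
    fs::create_dir_all(dir).map_err(FsCommitError::Other)?;
    let final_path = dir.join(format!("{version}.manifest"));
    let n = TMP_COUNTER.fetch_add(1, Ordering::Relaxed);
    let tmp_path = dir.join(format!(".tmp-{}-{n}", std::process::id()));

    // Fully write and flush the manifest before it becomes visible.
    let mut tmp = fs::File::create(&tmp_path).map_err(FsCommitError::Other)?;
    tmp.write_all(manifest).map_err(FsCommitError::Other)?;
    tmp.sync_all().map_err(FsCommitError::Other)?;
    drop(tmp);

    // hard_link refuses to overwrite, so only one writer can publish a version.
    let result = match fs::hard_link(&tmp_path, &final_path) {
        Ok(()) => Ok(()),
        Err(e) if e.kind() == ErrorKind::AlreadyExists => Err(FsCommitError::CommitConflict),
        Err(e) => Err(FsCommitError::Other(e)),
    };
    // Remove the temp name whether or not the link succeeded.
    let _ = fs::remove_file(&tmp_path);
    result
}
```

The design choice is to make "publish" a single atomic, non-overwriting operation; object stores that lack such a primitive need a different mechanism, which is why an externally coordinated handler (such as a DynamoDB-backed one) exists.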
Code Reference
Source Location
- Repository: Lance
- File (trait): rust/lance-table/src/io/commit.rs
- Lines (trait): L496-L594
- File (function): rust/lance/src/io/commit.rs
- Lines (function): L779-L987
Signature
CommitHandler trait:
```rust
#[async_trait]
pub trait CommitHandler: Debug + Send + Sync {
    // Provided: has a default implementation (body omitted here).
    async fn resolve_latest_location(
        &self,
        base_path: &Path,
        object_store: &ObjectStore,
    ) -> Result<ManifestLocation>;

    // Provided: has a default implementation (body omitted here).
    async fn resolve_version_location(
        &self,
        base_path: &Path,
        version: u64,
        object_store: &dyn OSObjectStore,
    ) -> Result<ManifestLocation>;

    // Provided: has a default implementation (body omitted here).
    fn list_manifest_locations<'a>(
        &self,
        base_path: &Path,
        object_store: &'a ObjectStore,
        sorted_descending: bool,
    ) -> BoxStream<'a, Result<ManifestLocation>>;

    // Required: atomically write the manifest for manifest.version, returning
    // CommitError::CommitConflict if that version slot is already taken.
    async fn commit(
        &self,
        manifest: &mut Manifest,
        indices: Option<Vec<IndexMetadata>>,
        base_path: &Path,
        object_store: &ObjectStore,
        manifest_writer: ManifestWriter,
        naming_scheme: ManifestNamingScheme,
        transaction: Option<Transaction>,
    ) -> std::result::Result<ManifestLocation, CommitError>;

    // Provided: has a default implementation (body omitted here).
    async fn delete(&self, _base_path: &Path) -> Result<()>;
}
```
commit_transaction function:
```rust
pub(crate) async fn commit_transaction(
    dataset: &Dataset,
    object_store: &ObjectStore,
    commit_handler: &dyn CommitHandler,
    transaction: &Transaction,
    write_config: &ManifestWriteConfig,
    commit_config: &CommitConfig,
    manifest_naming_scheme: ManifestNamingScheme,
    affected_rows: Option<&RowAddrTreeMap>,
) -> Result<(Manifest, ManifestLocation)>
```
Import
```rust
use lance_table::io::commit::CommitHandler;
// commit_transaction is pub(crate) and used internally by Dataset
```
I/O Contract
Inputs (CommitHandler::commit)
| Name | Type | Required | Description |
|---|---|---|---|
| manifest | &mut Manifest | Yes | The manifest to commit; its version field is set to the target version before the call. |
| indices | Option<Vec<IndexMetadata>> | No | Optional index metadata to persist alongside the manifest. |
| base_path | &Path | Yes | Root path of the dataset in the object store. |
| object_store | &ObjectStore | Yes | The object store used for reading and writing. |
| manifest_writer | ManifestWriter | Yes | Callback function that serializes the manifest to the object store. |
| naming_scheme | ManifestNamingScheme | Yes | V1 or V2 naming scheme for manifest file paths. |
| transaction | Option<Transaction> | No | The transaction associated with this commit, used for conflict detection. |
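The naming_scheme input decides the manifest's file name. As an assumption to verify against ManifestNamingScheme, V1 names a manifest after its version ("{version}.manifest"), while V2 uses u64::MAX - version, zero-padded to 20 digits, so a lexicographic listing returns the newest manifest first. The NamingScheme enum and manifest_file_name helper below are hypothetical mirrors of that behaviour.

```rust
/// Hypothetical mirror of the two manifest naming schemes.
#[derive(Clone, Copy, Debug)]
pub enum NamingScheme {
    V1,
    V2,
}

/// File name (without directory) for a given version under each scheme.
pub fn manifest_file_name(scheme: NamingScheme, version: u64) -> String {
    match scheme {
        // V1: the plain version number.
        NamingScheme::V1 => format!("{version}.manifest"),
        // V2 (assumed): inverted version, zero-padded to a fixed width of 20.
        NamingScheme::V2 => format!("{:020}.manifest", u64::MAX - version),
    }
}
```

The zero padding keeps every V2 name the same width, so string order matches numeric order (newest first); V1 names lack this property, since "10.manifest" sorts before "9.manifest".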
Outputs (CommitHandler::commit)
| Name | Type | Description |
|---|---|---|
| Ok | ManifestLocation | Location metadata (version, path, size, e_tag) of the successfully committed manifest. |
| Err | CommitError | Either CommitConflict (version slot taken) or OtherError (I/O failure). |
Inputs (commit_transaction)
| Name | Type | Required | Description |
|---|---|---|---|
| dataset | &Dataset | Yes | The current dataset handle, providing the base manifest and object store. |
| object_store | &ObjectStore | Yes | Object store configured with write parameters. |
| commit_handler | &dyn CommitHandler | Yes | The commit handler to use for the atomic commit. |
| transaction | &Transaction | Yes | The transaction describing the mutation. |
| write_config | &ManifestWriteConfig | Yes | Configuration for manifest writing (timestamp, storage format). |
| commit_config | &CommitConfig | Yes | Configuration for commit behavior (number of retries, auto-cleanup). |
| manifest_naming_scheme | ManifestNamingScheme | Yes | V1 or V2 naming scheme. |
| affected_rows | Option<&RowAddrTreeMap> | No | Rows affected by this transaction, used for conflict detection during rebase. |
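affected_rows lets the rebase step check whether two concurrent transactions actually touched the same rows. A minimal sketch, assuming Lance's row-address encoding (fragment id in the upper 32 bits, row offset within the fragment in the lower 32 bits) and replacing RowAddrTreeMap with a plain BTreeSet for illustration. row_addr and rows_conflict are hypothetical helpers, and the real conflict resolver weighs more than row overlap.

```rust
use std::collections::BTreeSet;

/// Row address: fragment id in the high 32 bits, row offset in the low 32 bits.
pub fn row_addr(fragment_id: u32, offset: u32) -> u64 {
    ((fragment_id as u64) << 32) | offset as u64
}

/// Row-level check: two transactions conflict if they touched any common row.
pub fn rows_conflict(ours: &BTreeSet<u64>, theirs: &BTreeSet<u64>) -> bool {
    !ours.is_disjoint(theirs)
}
```

A BTreeSet is the simplest stand-in; a per-fragment bitmap, as the RowAddrTreeMap name suggests, is far more compact when transactions touch many rows.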
Outputs (commit_transaction)
| Name | Type | Description |
|---|---|---|
| Ok | (Manifest, ManifestLocation) | The committed manifest and its storage location. |
| Err | Error | A CommitConflict error if all retries are exhausted, or another I/O error. |
Usage Examples
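```rust
use std::sync::Arc;

use arrow_array::{Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::dataset::{WriteMode, WriteParams};
use lance::Dataset;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let uri = "s3://bucket/my_dataset";
    // This handle stays pinned to the version that is current now (call it N).
    let dataset = Dataset::open(uri).await?;
    // Illustrative batch: its schema must match the existing dataset's schema.
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(schema.clone(), vec![Arc::new(Int32Array::from(vec![1, 2, 3]))])?;
    // Every write to a Dataset goes through the commit protocol;
    // appending data commits a new version, N+1.
    Dataset::write(
        RecordBatchIterator::new(vec![Ok(batch)], schema.clone()),
        uri,
        Some(WriteParams {
            mode: WriteMode::Append,
            ..Default::default()
        }),
    )
    .await?;
    // The commit handler resolves the latest version (N+1), even though
    // `dataset` itself still reads version N.
    let latest = dataset.latest_version_id().await?;
    println!("latest version: {latest}");
    Ok(())
}
```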