Implementation:Lance format Lance Dataset Delete
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Columnar_Storage |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tool for deleting rows from a Lance dataset by SQL predicate, provided by the Lance library.
Description
Dataset::delete removes rows matching a SQL predicate string from the dataset. It operates by writing deletion vectors (bitmaps of deleted row offsets) to the affected fragments and committing a new dataset version. The underlying data is not physically removed until compaction and cleanup are performed.
DeleteBuilder provides a more configurable interface with explicit control over conflict retry count and timeout. Both APIs produce the same result: a new dataset version with the matching rows marked as deleted.
The convenience method Dataset::truncate_table deletes all rows by calling delete("true").
Usage
Use Dataset::delete for simple predicate-based deletion. Use DeleteBuilder when you need to customize retry behavior for high-contention workloads.
Code Reference
Source Location
- Repository: Lance
- Files:
rust/lance/src/dataset.rs(Dataset::delete L1520-L1523)rust/lance/src/dataset/write/delete.rs(DeleteBuilder L96-L140)
Signature
// Dataset convenience method
impl Dataset {
pub async fn delete(&mut self, predicate: &str) -> Result<()>;
pub async fn truncate_table(&mut self) -> Result<()>;
}
// DeleteBuilder for advanced configuration
pub struct DeleteBuilder {
dataset: Arc<Dataset>,
predicate: String,
conflict_retries: u32, // default: 10
retry_timeout: Duration, // default: 30s
}
impl DeleteBuilder {
pub fn new(dataset: Arc<Dataset>, predicate: impl Into<String>) -> Self;
pub fn conflict_retries(mut self, retries: u32) -> Self;
pub fn retry_timeout(mut self, timeout: Duration) -> Self;
pub async fn execute(self) -> Result<Arc<Dataset>>;
}
Import
use lance::dataset::Dataset;
use lance::dataset::write::delete::DeleteBuilder;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| &mut self (delete) | &mut Dataset |
Yes | The dataset to delete from. Mutably borrowed to update internal state after commit. |
| predicate | &str |
Yes | SQL filter string identifying rows to delete (e.g., "id > 1000", "status = 'expired'"). Use "true" to delete all rows.
|
| dataset (DeleteBuilder) | Arc<Dataset> |
Yes | The dataset to delete from (used with DeleteBuilder). |
| conflict_retries | u32 |
No | Number of commit retries on conflict (default 10). |
| retry_timeout | Duration |
No | Total timeout for all retries (default 30s). |
Outputs
| Name | Type | Description |
|---|---|---|
| Result (delete) | Result<()> |
Success or error. The dataset is mutated in place to reflect the new version. |
| Result (DeleteBuilder) | Result<Arc<Dataset>> |
The updated dataset after deletion. |
Usage Examples
Basic Delete
use lance::dataset::Dataset;
async fn delete_expired(uri: &str) -> lance::Result<()> {
let mut dataset = Dataset::open(uri).await?;
dataset.delete("expiry_date < '2025-01-01'").await?;
println!("Rows after delete: {}", dataset.count_rows(None).await?);
Ok(())
}
DeleteBuilder with Custom Retry
use std::sync::Arc;
use std::time::Duration;
use lance::dataset::Dataset;
use lance::dataset::write::delete::DeleteBuilder;
async fn delete_with_retry(dataset: Arc<Dataset>) -> lance::Result<()> {
let updated = DeleteBuilder::new(dataset, "category = 'obsolete'")
.conflict_retries(5)
.retry_timeout(Duration::from_secs(60))
.execute()
.await?;
println!("Delete committed as v{}", updated.version().version);
Ok(())
}
Truncate All Rows
use lance::dataset::Dataset;
async fn truncate(uri: &str) -> lance::Result<()> {
let mut dataset = Dataset::open(uri).await?;
dataset.truncate_table().await?;
assert_eq!(dataset.count_rows(None).await?, 0);
Ok(())
}