

Implementation:Lance format Lance Dataset Delete

From Leeroopedia
Revision as of 15:26, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Lance_format_Lance_Dataset_Delete.md)


Knowledge Sources
Domains Data_Engineering, Columnar_Storage
Last Updated 2026-02-08 19:00 GMT

Overview

Concrete tool for deleting rows from a Lance dataset by SQL predicate, provided by the Lance library.

Description

Dataset::delete removes rows matching a SQL predicate string from the dataset. It operates by writing deletion vectors (bitmaps of deleted row offsets) to the affected fragments and committing a new dataset version. The underlying data is not physically removed until compaction and cleanup are performed.
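The deletion-vector idea described above can be illustrated with a toy model. This is a minimal sketch using only the standard library; ToyFragment and its methods are hypothetical names invented for illustration and are not part of the Lance API, and the real on-disk bitmap format and versioned commit step are not modeled here.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for one fragment: stored rows plus a deletion vector.
/// (Hypothetical type for illustration; not a Lance type.)
struct ToyFragment {
    rows: Vec<i64>,           // stored values; delete never rewrites these
    deleted: BTreeSet<usize>, // offsets of rows marked as deleted
}

impl ToyFragment {
    fn new(rows: Vec<i64>) -> Self {
        Self { rows, deleted: BTreeSet::new() }
    }

    /// Mark every row matching `predicate` as deleted.
    /// Returns how many rows were newly marked by this call.
    fn delete_where(&mut self, predicate: impl Fn(i64) -> bool) -> usize {
        let mut newly_deleted = 0;
        for (offset, value) in self.rows.iter().enumerate() {
            // `insert` returns false if the offset was already marked.
            if predicate(*value) && self.deleted.insert(offset) {
                newly_deleted += 1;
            }
        }
        newly_deleted
    }

    /// Rows visible to readers: stored rows minus the deletion vector.
    fn live_rows(&self) -> Vec<i64> {
        self.rows
            .iter()
            .enumerate()
            .filter(|(offset, _)| !self.deleted.contains(offset))
            .map(|(_, value)| *value)
            .collect()
    }

    /// Number of rows still physically stored, deleted or not.
    fn physical_len(&self) -> usize {
        self.rows.len()
    }
}
```

In this model, deleting with a predicate that is always true hides every row while physical_len stays unchanged, which mirrors why storage is only reclaimed after a later compaction and cleanup step.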

DeleteBuilder provides a more configurable interface with explicit control over conflict retry count and timeout. Both APIs produce the same result: a new dataset version with the matching rows marked as deleted.

The convenience method Dataset::truncate_table deletes all rows by calling delete("true").

Usage

Use Dataset::delete for simple predicate-based deletion. Use DeleteBuilder when you need to customize retry behavior for high-contention workloads.
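To make the two retry knobs concrete, the following is a rough sketch of a commit loop bounded by both a retry count and a total timeout. It is not Lance's implementation: commit_with_retries, CommitAttempt, and RetryError are hypothetical names, and the reading of conflict_retries as "one initial attempt plus up to N retries" is an assumption made for this example.

```rust
use std::time::{Duration, Instant};

/// Outcome of one simulated commit attempt (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum CommitAttempt {
    Committed(u64), // version number produced by the commit
    Conflict,       // another writer committed first
}

/// Why the loop gave up (hypothetical, for illustration).
#[derive(Debug, PartialEq)]
enum RetryError {
    TooManyConflicts { attempts: u32 },
    TimedOut { attempts: u32 },
}

/// Try to commit, retrying on conflict until either the retry budget
/// or the total time budget is exhausted.
fn commit_with_retries(
    conflict_retries: u32,
    retry_timeout: Duration,
    mut try_commit: impl FnMut(u32) -> CommitAttempt,
) -> Result<u64, RetryError> {
    let started = Instant::now();
    // Assumption: one initial attempt plus up to `conflict_retries` retries.
    for attempt in 0..=conflict_retries {
        // The timeout is only checked before retries, never before the first attempt.
        if attempt > 0 && started.elapsed() >= retry_timeout {
            return Err(RetryError::TimedOut { attempts: attempt });
        }
        if let CommitAttempt::Committed(version) = try_commit(attempt) {
            return Ok(version);
        }
    }
    Err(RetryError::TooManyConflicts {
        attempts: conflict_retries.saturating_add(1),
    })
}
```

Under this model, a high-contention workload would raise conflict_retries so that repeated conflicts do not fail the delete early, while retry_timeout caps how long the caller can be blocked overall.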

Code Reference

Source Location

  • Repository: Lance
  • Files:
    • rust/lance/src/dataset.rs (Dataset::delete L1520-L1523)
    • rust/lance/src/dataset/write/delete.rs (DeleteBuilder L96-L140)

Signature

// Dataset convenience method
impl Dataset {
    pub async fn delete(&mut self, predicate: &str) -> Result<()>;
    pub async fn truncate_table(&mut self) -> Result<()>;
}

// DeleteBuilder for advanced configuration
pub struct DeleteBuilder {
    dataset: Arc<Dataset>,
    predicate: String,
    conflict_retries: u32,   // default: 10
    retry_timeout: Duration, // default: 30s
}

impl DeleteBuilder {
    pub fn new(dataset: Arc<Dataset>, predicate: impl Into<String>) -> Self;
    pub fn conflict_retries(mut self, retries: u32) -> Self;
    pub fn retry_timeout(mut self, timeout: Duration) -> Self;
    pub async fn execute(self) -> Result<Arc<Dataset>>;
}

Import

use lance::dataset::Dataset;
use lance::dataset::write::delete::DeleteBuilder;

I/O Contract

Inputs

Name | Type | Required | Description
&mut self (delete) | &mut Dataset | Yes | The dataset to delete from. Mutably borrowed to update internal state after commit.
predicate | &str | Yes | SQL filter string identifying rows to delete (e.g., "id > 1000", "status = 'expired'"). Use "true" to delete all rows.
dataset (DeleteBuilder) | Arc<Dataset> | Yes | The dataset to delete from (used with DeleteBuilder).
conflict_retries | u32 | No | Number of commit retries on conflict (default 10).
retry_timeout | Duration | No | Total timeout for all retries (default 30s).
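Because the predicate is passed as a plain SQL string, values that come from variables need to be rendered as SQL literals before they are spliced in. The sketch below shows one way to do that with the standard SQL convention of doubling embedded single quotes. Both helper functions are hypothetical and not part of Lance, and the column name is assumed to be a trusted identifier that needs no quoting.

```rust
/// Render a Rust string as a SQL string literal, doubling embedded
/// single quotes (standard SQL escaping). Hypothetical helper.
fn sql_string_literal(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Build an equality predicate such as `status = 'expired'`.
/// Assumes `column` is a trusted, plain column name. Hypothetical helper.
fn eq_predicate(column: &str, value: &str) -> String {
    format!("{} = {}", column, sql_string_literal(value))
}
```

The resulting string has the same shape as the example predicates in the table above and could be passed wherever a predicate is expected.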

Outputs

Name | Type | Description
Result (delete) | Result<()> | Success or error. The dataset is mutated in place to reflect the new version.
Result (DeleteBuilder) | Result<Arc<Dataset>> | The updated dataset after deletion.

Usage Examples

Basic Delete

use lance::dataset::Dataset;

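async fn delete_expired(uri: &str) -> lance::Result<()> {
    // Open mutably: delete() updates this handle in place to the new version.
    let mut dataset = Dataset::open(uri).await?;
    // Mark every row with expiry_date before 2025-01-01 as deleted.
    dataset.delete("expiry_date < '2025-01-01'").await?;
    // count_rows(None) applies no filter, so it reports the rows that remain.
    println!("Rows after delete: {}", dataset.count_rows(None).await?);
    Ok(())
}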

DeleteBuilder with Custom Retry

use std::sync::Arc;
use std::time::Duration;
use lance::dataset::Dataset;
use lance::dataset::write::delete::DeleteBuilder;

async fn delete_with_retry(dataset: Arc<Dataset>) -> lance::Result<()> {
    // Override the defaults (10 retries, 30s) for this delete.
    let updated = DeleteBuilder::new(dataset, "category = 'obsolete'")
        .conflict_retries(5)
        .retry_timeout(Duration::from_secs(60))
        .execute()
        .await?;

    // execute() returns the updated dataset; report its version number.
    println!("Delete committed as v{}", updated.version().version);
    Ok(())
}

Truncate All Rows

use lance::dataset::Dataset;

async fn truncate(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.truncate_table().await?;
    assert_eq!(dataset.count_rows(None).await?, 0);
    Ok(())
}

Related Pages

Implements Principle
