Implementation:Lance format Lance Schema Evolution Ops

Knowledge Sources
Domains: Data_Engineering, Columnar_Storage
Last Updated: 2026-02-08 19:00 GMT

Overview

Concrete tools for evolving the schema of a Lance dataset by adding, altering, or dropping columns, provided by the Lance library.

Description

The Dataset struct provides three schema evolution methods:

  • add_columns: Appends one or more new columns using a NewColumnTransform that specifies how values are computed. Supports UDFs, SQL expressions, pre-computed streams, readers, and all-null initialization.
  • alter_columns: Modifies existing columns by renaming, changing nullability, or casting data types. Accepts an array of ColumnAlteration descriptors.
  • drop_columns: Removes columns from the schema (metadata-only; data files are unchanged until compaction).

Each method takes &mut self: it commits a new dataset version and then updates the in-memory Dataset in place so that it reflects that version.
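The difference between a metadata-only drop and the later physical rewrite can be modeled in a few lines of dependency-free Rust. The sketch below is a toy model, not Lance code: ToyDataset, its fields, the error on unknown column names, and the compact method are all assumptions made for illustration.

```rust
/// Hypothetical stand-in for a versioned dataset; not a Lance type.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyDataset {
    /// Version number, bumped by every committed schema change.
    pub version: u64,
    /// Column names visible through the current schema.
    pub schema: Vec<String>,
    /// Column names still physically present in the data files.
    pub stored_columns: Vec<String>,
}

impl ToyDataset {
    pub fn new(columns: &[&str]) -> Self {
        let names: Vec<String> = columns.iter().map(|c| c.to_string()).collect();
        Self { version: 1, schema: names.clone(), stored_columns: names }
    }

    /// Metadata-only drop: the schema shrinks and the version advances,
    /// but the stored columns are left untouched.
    pub fn drop_columns(&mut self, columns: &[&str]) -> Result<(), String> {
        // Validate first so a failed call leaves the dataset unchanged
        // (rejecting unknown names is a choice of this sketch).
        for name in columns {
            if !self.schema.iter().any(|c| c.as_str() == *name) {
                return Err(format!("column not found: {name}"));
            }
        }
        self.schema.retain(|c| !columns.contains(&c.as_str()));
        self.version += 1;
        Ok(())
    }

    /// Stand-in for compaction: rewrite storage so it holds only the
    /// columns that are still in the schema. Versioning of compaction
    /// is out of scope for this sketch.
    pub fn compact(&mut self) {
        self.stored_columns = self.schema.clone();
    }
}
```

In this model, dropping temp_column from a dataset created with id and temp_column moves version from 1 to 2 and shortens schema, while stored_columns keeps both names until compact runs.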

Usage

Use these methods when modifying the schema of an existing dataset. Note that schema evolution operations may conflict with concurrent writes and should be scheduled during low-activity periods.

Code Reference

Source Location

  • Repository: Lance
  • Files:
    • rust/lance/src/dataset.rs (L2578-L2608)
    • rust/lance/src/dataset/schema_evolution.rs (L102-L153)

Signature

impl Dataset {
    /// Append new columns to the dataset.
    pub async fn add_columns(
        &mut self,
        transforms: NewColumnTransform,
        read_columns: Option<Vec<String>>,
        batch_size: Option<u32>,
    ) -> Result<()>;

    /// Modify columns: rename, change nullability, or cast data type.
    pub async fn alter_columns(
        &mut self,
        alterations: &[ColumnAlteration],
    ) -> Result<()>;

    /// Remove columns from the dataset (metadata-only).
    pub async fn drop_columns(
        &mut self,
        columns: &[&str],
    ) -> Result<()>;
}

Supporting Types

// rust/lance/src/dataset/schema_evolution.rs:L102-L115
pub enum NewColumnTransform {
    /// UDF that transforms input batches into new columns.
    BatchUDF(BatchUDF),
    /// SQL expressions defining new columns as (name, expression) pairs.
    SqlExpressions(Vec<(String, String)>),
    /// Pre-computed stream of RecordBatches with new columns.
    Stream(SendableRecordBatchStream),
    /// RecordBatchReader providing new column data.
    Reader(Box<dyn RecordBatchReader + Send>),
    /// Add columns initialized to null values.
    AllNulls(Arc<ArrowSchema>),
}
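As a rough mental model of how the variants differ, the following dependency-free sketch produces values for one new column over a single existing i64 column. ToyTransform and new_column_values are hypothetical names, the three variants only loosely correspond to the UDF, stream/reader, and all-nulls cases above, and the row-count check is an assumption of the sketch rather than documented Lance behavior.

```rust
/// Hypothetical, simplified counterpart of a new-column transform.
pub enum ToyTransform {
    /// Compute the new values from the existing values (UDF-like).
    Udf(Box<dyn Fn(&[i64]) -> Vec<i64>>),
    /// Values were computed elsewhere and are supplied as-is
    /// (stream-like or reader-like).
    Precomputed(Vec<i64>),
    /// Every row gets a null.
    AllNulls,
}

/// Produce the values of the new column for the given existing rows.
pub fn new_column_values(
    transform: ToyTransform,
    existing: &[i64],
) -> Result<Vec<Option<i64>>, String> {
    let values: Vec<Option<i64>> = match transform {
        ToyTransform::Udf(f) => f(existing).into_iter().map(Some).collect(),
        ToyTransform::Precomputed(v) => v.into_iter().map(Some).collect(),
        ToyTransform::AllNulls => vec![None; existing.len()],
    };
    // The sketch insists on exactly one new value per existing row.
    if values.len() != existing.len() {
        return Err(format!(
            "expected {} values, got {}",
            existing.len(),
            values.len()
        ));
    }
    Ok(values)
}
```

The point of the enum shape is that the caller picks where the values come from, while the code that attaches the column stays the same for every variant.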

// rust/lance/src/dataset/schema_evolution.rs:L118-L127
pub struct ColumnAlteration {
    /// Path to the existing column to alter.
    pub path: String,
    /// New name for the column (None to keep current name).
    pub rename: Option<String>,
    /// New nullability (None to keep current setting).
    pub nullable: Option<bool>,
    /// New data type (None to keep current type; requires data rewrite).
    pub data_type: Option<DataType>,
}

impl ColumnAlteration {
    pub fn new(path: String) -> Self;
    pub fn rename(mut self, name: String) -> Self;
    pub fn set_nullable(mut self, nullable: bool) -> Self;
    pub fn cast_to(mut self, data_type: DataType) -> Self;
}
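The builder methods take self by value and return it, which is what allows alterations to be chained. The sketch below mirrors that shape with a hypothetical ToyAlteration type whose data type is a plain String, so it compiles without Arrow; the requires_rewrite helper is also an assumption, derived only from the doc comment on data_type above.

```rust
/// Hypothetical mirror of the alteration descriptor; not the Lance type.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct ToyAlteration {
    pub path: String,
    pub rename: Option<String>,
    pub nullable: Option<bool>,
    /// Stand-in for an Arrow data type, kept as a string in this sketch.
    pub data_type: Option<String>,
}

impl ToyAlteration {
    /// Start with no changes requested for the column at `path`.
    pub fn new(path: String) -> Self {
        Self { path, ..Default::default() }
    }

    /// Request a new name; every other setting is left as it was.
    pub fn rename(mut self, name: String) -> Self {
        self.rename = Some(name);
        self
    }

    /// Request a nullability change.
    pub fn set_nullable(mut self, nullable: bool) -> Self {
        self.nullable = Some(nullable);
        self
    }

    /// Request a cast to a new data type.
    pub fn cast_to(mut self, data_type: String) -> Self {
        self.data_type = Some(data_type);
        self
    }

    /// True when a type change was requested, which per the field's
    /// doc comment implies a data rewrite.
    pub fn requires_rewrite(&self) -> bool {
        self.data_type.is_some()
    }
}
```

Fields left as None mean "keep the current setting", so a descriptor built with only rename or set_nullable describes a change that touches no column data in this model.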

Import

use lance::dataset::Dataset;
use lance::dataset::schema_evolution::{NewColumnTransform, ColumnAlteration, BatchUDF};

I/O Contract

Inputs (add_columns)

| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| transforms | NewColumnTransform | Yes | Specifies how new column values are produced (UDF, SQL, stream, reader, or all-nulls). |
| read_columns | Option<Vec<String>> | No | Columns to read from existing data as input to the transform. If None, all columns are available. |
| batch_size | Option<u32> | No | Number of rows to process per batch during the transform. |

Inputs (alter_columns)

| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| alterations | &[ColumnAlteration] | Yes | Array of column alteration descriptors specifying path, rename, nullability, and/or data type changes. |

Inputs (drop_columns)

| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| columns | &[&str] | Yes | Names of columns to remove from the schema. |

Outputs

| Name | Type | Description |
|---|---|---|
| Result | Result<()> | Success or error. The dataset is mutated in place to reflect the new schema version. |

Usage Examples

Add Columns via SQL Expressions

use lance::dataset::Dataset;
use lance::dataset::schema_evolution::NewColumnTransform;

async fn add_computed_column(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.add_columns(
        NewColumnTransform::SqlExpressions(vec![
            ("full_name".to_string(), "first_name || ' ' || last_name".to_string()),
            ("age_group".to_string(), "CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END".to_string()),
        ]),
        None,  // read all columns
        None,  // default batch size
    ).await?;
    Ok(())
}

Add All-Null Columns

use std::sync::Arc;
use arrow_schema::{Schema as ArrowSchema, Field, DataType};
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::NewColumnTransform;

async fn add_null_column(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    let new_schema = Arc::new(ArrowSchema::new(vec![
        Field::new("embedding", DataType::FixedSizeList(
            Arc::new(Field::new("item", DataType::Float32, true)), 128
        ), true),
    ]));
    dataset.add_columns(
        NewColumnTransform::AllNulls(new_schema),
        None,
        None,
    ).await?;
    Ok(())
}

Rename and Cast Columns

use arrow_schema::DataType;
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::ColumnAlteration;

async fn alter_columns(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.alter_columns(&[
        ColumnAlteration::new("old_name".to_string())
            .rename("new_name".to_string()),
        ColumnAlteration::new("price".to_string())
            .cast_to(DataType::Float64),
        ColumnAlteration::new("description".to_string())
            .set_nullable(true),
    ]).await?;
    Ok(())
}

Drop Columns

use lance::dataset::Dataset;

async fn drop_columns(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.drop_columns(&["deprecated_field", "temp_column"]).await?;
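    // The drop only changes metadata. The column data stays in the data
    // files until they are rewritten by compaction and the old versions
    // are cleaned up (arguments elided):
    // optimize::compact_files(&mut dataset, ...).await?;
    // cleanup::cleanup_old_versions(&dataset, ...).await?;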
    Ok(())
}
