Implementation:Lance format Lance Schema Evolution Ops
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Columnar_Storage |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Concrete tools for evolving the schema of a Lance dataset by adding, altering, or dropping columns, provided by the Lance library.
Description
The Dataset struct provides three schema evolution methods:
- add_columns: Appends one or more new columns using a NewColumnTransform that specifies how values are computed. Supports UDFs, SQL expressions, pre-computed streams, readers, and all-null initialization.
- alter_columns: Modifies existing columns by renaming, changing nullability, or casting data types. Accepts a slice of ColumnAlteration descriptors.
- drop_columns: Removes columns from the schema (metadata-only; data files are unchanged until compaction).
These methods mutably borrow the dataset and update it in place to reflect the new version after the commit.
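To make the "metadata-only" behaviour of drop_columns concrete, here is a minimal std-only sketch. ToyDataset, its fields, and compact are hypothetical stand-ins invented for illustration; they do not mirror Lance's internal structures. The sketch only shows the idea that a drop shrinks the visible schema and advances the version, while the stored column data survives until a later compaction step.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a dataset: a versioned schema plus stored column data.
/// Illustrative model only, not Lance's internal representation.
#[derive(Debug, Clone)]
struct ToyDataset {
    version: u64,
    /// Column names currently visible to readers.
    schema: Vec<String>,
    /// Physical column data, keyed by column name.
    files: BTreeMap<String, Vec<i64>>,
}

impl ToyDataset {
    fn new(columns: Vec<(&str, Vec<i64>)>) -> Self {
        let schema = columns.iter().map(|(name, _)| name.to_string()).collect();
        let files = columns
            .into_iter()
            .map(|(name, data)| (name.to_string(), data))
            .collect();
        Self { version: 1, schema, files }
    }

    /// Metadata-only drop: the schema shrinks and the version advances,
    /// but the stored column data is left in place.
    fn drop_columns(&mut self, columns: &[&str]) -> Result<(), String> {
        for name in columns {
            if !self.schema.iter().any(|c| c.as_str() == *name) {
                return Err(format!("column not found: {name}"));
            }
        }
        self.schema.retain(|c| !columns.contains(&c.as_str()));
        self.version += 1;
        Ok(())
    }

    /// Compaction (modelled here as a single step) discards stored data
    /// for columns that are no longer in the schema.
    fn compact(&mut self) {
        let schema = &self.schema;
        self.files.retain(|name, _| schema.contains(name));
        self.version += 1;
    }
}
```

In this model, readers at the new version no longer see the dropped column, yet storage is reclaimed only once compact runs.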
Usage
Use these methods when modifying the schema of an existing dataset. Note that schema evolution operations may conflict with concurrent writes and should be scheduled during low-activity periods.
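The conflict risk mentioned above can be pictured with a toy optimistic-commit model. Everything in this sketch (ToyTable, CommitError, try_commit, commit_with_retry) is hypothetical and says nothing about how Lance's commit protocol actually resolves conflicts; it only illustrates why a schema change prepared against a stale version can fail, and what a bounded retry might look like.

```rust
/// Toy stand-in for a table with optimistic commits: a commit succeeds
/// only if it was prepared against the latest version.
struct ToyTable {
    version: u64,
}

#[derive(Debug, PartialEq)]
enum CommitError {
    /// Another writer committed after the base version was read.
    Conflict { latest: u64 },
}

impl ToyTable {
    fn try_commit(&mut self, base_version: u64) -> Result<u64, CommitError> {
        if base_version != self.version {
            return Err(CommitError::Conflict { latest: self.version });
        }
        self.version += 1;
        Ok(self.version)
    }
}

/// Retry a change a bounded number of times, refreshing the base version
/// after each conflict. Returns the committed version on success.
fn commit_with_retry(
    table: &mut ToyTable,
    mut base_version: u64,
    max_attempts: u32,
) -> Result<u64, CommitError> {
    let mut last_err = CommitError::Conflict { latest: table.version };
    for _ in 0..max_attempts {
        match table.try_commit(base_version) {
            Ok(v) => return Ok(v),
            Err(CommitError::Conflict { latest }) => {
                base_version = latest; // refresh and try again
                last_err = CommitError::Conflict { latest };
            }
        }
    }
    Err(last_err)
}
```

A real retry would also need to re-validate the change against the refreshed schema, which this sketch omits.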
Code Reference
Source Location
- Repository: Lance
- Files:
  - rust/lance/src/dataset.rs (L2578-L2608)
  - rust/lance/src/dataset/schema_evolution.rs (L102-L153)
Signature
impl Dataset {
    /// Append new columns to the dataset.
    pub async fn add_columns(
        &mut self,
        transforms: NewColumnTransform,
        read_columns: Option<Vec<String>>,
        batch_size: Option<u32>,
    ) -> Result<()>;

    /// Modify columns: rename, change nullability, or cast data type.
    pub async fn alter_columns(
        &mut self,
        alterations: &[ColumnAlteration],
    ) -> Result<()>;

    /// Remove columns from the dataset (metadata-only).
    pub async fn drop_columns(
        &mut self,
        columns: &[&str],
    ) -> Result<()>;
}
Supporting Types
// rust/lance/src/dataset/schema_evolution.rs:L102-L115
pub enum NewColumnTransform {
    /// UDF that transforms input batches into new columns.
    BatchUDF(BatchUDF),
    /// SQL expressions defining new columns as (name, expression) pairs.
    SqlExpressions(Vec<(String, String)>),
    /// Pre-computed stream of RecordBatches with new columns.
    Stream(SendableRecordBatchStream),
    /// RecordBatchReader providing new column data.
    Reader(Box<dyn RecordBatchReader + Send>),
    /// Add columns initialized to null values.
    AllNulls(Arc<ArrowSchema>),
}

// rust/lance/src/dataset/schema_evolution.rs:L118-L127
pub struct ColumnAlteration {
    /// Path to the existing column to alter.
    pub path: String,
    /// New name for the column (None to keep current name).
    pub rename: Option<String>,
    /// New nullability (None to keep current setting).
    pub nullable: Option<bool>,
    /// New data type (None to keep current type; requires data rewrite).
    pub data_type: Option<DataType>,
}

impl ColumnAlteration {
    pub fn new(path: String) -> Self;
    pub fn rename(mut self, name: String) -> Self;
    pub fn set_nullable(mut self, nullable: bool) -> Self;
    pub fn cast_to(mut self, data_type: DataType) -> Self;
}
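The builder signatures above suggest that each method fills one optional field and returns the descriptor for chaining. The following std-only sketch reproduces that shape so it can be compiled in isolation. The method bodies are an assumption inferred from the signatures and field comments, DataType is a two-variant stand-in for arrow_schema::DataType, and needs_rewrite is a hypothetical helper added only to illustrate the "requires data rewrite" note on data_type.

```rust
/// Stand-in for `arrow_schema::DataType`, reduced to two variants
/// so the sketch compiles without external crates.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Float64,
}

/// Local mirror of the descriptor shown above (not the Lance type itself).
#[derive(Debug, Clone, PartialEq)]
struct ColumnAlteration {
    path: String,
    rename: Option<String>,
    nullable: Option<bool>,
    data_type: Option<DataType>,
}

impl ColumnAlteration {
    /// Start with no changes requested for the column at `path`.
    fn new(path: String) -> Self {
        Self { path, rename: None, nullable: None, data_type: None }
    }

    /// Request a new column name.
    fn rename(mut self, name: String) -> Self {
        self.rename = Some(name);
        self
    }

    /// Request a new nullability setting.
    fn set_nullable(mut self, nullable: bool) -> Self {
        self.nullable = Some(nullable);
        self
    }

    /// Request a cast to a new data type.
    fn cast_to(mut self, data_type: DataType) -> Self {
        self.data_type = Some(data_type);
        self
    }

    /// Hypothetical helper: only a type change is marked as needing a rewrite.
    fn needs_rewrite(&self) -> bool {
        self.data_type.is_some()
    }
}
```

Fields left as None signal "keep the current setting", so a single descriptor can combine a rename with a cast while leaving nullability untouched.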
Import
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::{NewColumnTransform, ColumnAlteration, BatchUDF};
I/O Contract
Inputs (add_columns)
| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| transforms | NewColumnTransform | Yes | Specifies how new column values are produced (UDF, SQL, stream, reader, or all-nulls). |
| read_columns | Option<Vec<String>> | No | Columns to read from existing data as input to the transform. If None, all columns are available. |
| batch_size | Option<u32> | No | Number of rows to process per batch during the transform. |
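As a rough picture of what batch_size controls, the sketch below applies a user function to an input column in chunks and concatenates the results into one new column. add_column_in_batches is a hypothetical std-only helper operating on plain i64 slices; Lance's own transform works on Arrow record batches and is not shown here.

```rust
/// Apply `udf` to `input` in chunks of at most `batch_size` rows and
/// concatenate the results into one new column.
/// The UDF must return exactly one output value per input row.
fn add_column_in_batches<F>(
    input: &[i64],
    batch_size: usize,
    mut udf: F,
) -> Result<Vec<i64>, String>
where
    F: FnMut(&[i64]) -> Vec<i64>,
{
    if batch_size == 0 {
        return Err("batch_size must be positive".to_string());
    }
    let mut out = Vec::with_capacity(input.len());
    for batch in input.chunks(batch_size) {
        let produced = udf(batch);
        // A new column must line up row-for-row with the existing data.
        if produced.len() != batch.len() {
            return Err(format!(
                "udf returned {} rows for a batch of {}",
                produced.len(),
                batch.len()
            ));
        }
        out.extend(produced);
    }
    Ok(out)
}
```

In this model, a smaller batch size means more UDF calls with less data held in memory per call; the last batch may be shorter than the others.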
Inputs (alter_columns)
| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| alterations | &[ColumnAlteration] | Yes | Slice of column alteration descriptors specifying path, rename, nullability, and/or data type changes. |
Inputs (drop_columns)
| Name | Type | Required | Description |
|---|---|---|---|
| &mut self | &mut Dataset | Yes | The dataset to modify. |
| columns | &[&str] | Yes | Names of columns to remove from the schema. |
Outputs
| Name | Type | Description |
|---|---|---|
| Result | Result<()> | Success or error. The dataset is mutated in place to reflect the new schema version. |
Usage Examples
Add Columns via SQL Expressions
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::NewColumnTransform;

async fn add_computed_column(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.add_columns(
        NewColumnTransform::SqlExpressions(vec![
            ("full_name".to_string(), "first_name || ' ' || last_name".to_string()),
            ("age_group".to_string(), "CASE WHEN age < 18 THEN 'minor' ELSE 'adult' END".to_string()),
        ]),
        None, // read all columns
        None, // default batch size
    ).await?;
    Ok(())
}
Add All-Null Columns
use std::sync::Arc;
use arrow_schema::{Schema as ArrowSchema, Field, DataType};
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::NewColumnTransform;

async fn add_null_column(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    // A nullable 128-dimensional Float32 vector column.
    let new_schema = Arc::new(ArrowSchema::new(vec![
        Field::new("embedding", DataType::FixedSizeList(
            Arc::new(Field::new("item", DataType::Float32, true)), 128
        ), true),
    ]));
    dataset.add_columns(
        NewColumnTransform::AllNulls(new_schema),
        None,
        None,
    ).await?;
    Ok(())
}
Rename and Cast Columns
use arrow_schema::DataType;
use lance::dataset::Dataset;
use lance::dataset::schema_evolution::ColumnAlteration;

async fn alter_columns(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.alter_columns(&[
        ColumnAlteration::new("old_name".to_string())
            .rename("new_name".to_string()),
        ColumnAlteration::new("price".to_string())
            .cast_to(DataType::Float64),
        ColumnAlteration::new("description".to_string())
            .set_nullable(true),
    ]).await?;
    Ok(())
}
Drop Columns
use lance::dataset::Dataset;

async fn drop_columns(uri: &str) -> lance::Result<()> {
    let mut dataset = Dataset::open(uri).await?;
    dataset.drop_columns(&["deprecated_field", "temp_column"]).await?;
    // Physical data remains until compaction and cleanup:
    // optimize::compact_files(&mut dataset, ...).await?;
    // cleanup::cleanup_old_versions(&dataset, ...).await?;
    Ok(())
}
Related Pages
Implements Principle