Workflow:Lance format Lance Version Management

Knowledge Sources	Lance Versioning Guide, Tags Guide, Transaction Spec
Domains	Data_Engineering, Version_Control, ML_Ops
Last Updated	2026-02-08 19:00 GMT

Overview

End-to-end process for managing dataset versions, tags, and time-travel queries in Lance using its built-in automatic versioning and transaction system.

Description

This workflow covers Lance's version control system, which automatically creates a new immutable version for every write operation. Each version is tracked by a manifest file that records the schema, fragment list, and transaction metadata. Users can navigate between versions using numeric version IDs or human-readable tags, enabling reproducible ML experiments, dataset rollback, and audit trails. The transaction system provides ACID guarantees with optimistic concurrency control and conflict resolution.
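
The optimistic concurrency control mentioned above can be illustrated with a generic toy model. This is a minimal sketch, not Lance's commit protocol: `ToyStore`, `try_commit`, and `commit_with_retry` are hypothetical names, and the retry simply rebases onto the newest version without the compatibility checks a real transaction system performs.

```python
class ConflictError(RuntimeError):
    """Raised when another writer committed first."""


class ToyStore:
    """Hypothetical in-memory commit log; not Lance's implementation."""

    def __init__(self) -> None:
        self.latest = 0   # 0 means "no version committed yet"
        self.log = {}     # version number -> operation label

    def try_commit(self, read_version: int, operation: str) -> int:
        # Optimistic check: succeed only if nobody committed since we read.
        if read_version != self.latest:
            raise ConflictError(f"read v{read_version}, latest is v{self.latest}")
        self.latest += 1
        self.log[self.latest] = operation
        return self.latest


def commit_with_retry(store: ToyStore, read_version: int, operation: str,
                      max_retries: int = 3) -> int:
    for _ in range(max_retries + 1):
        try:
            return store.try_commit(read_version, operation)
        except ConflictError:
            # Rebase onto the newest version and try again. A real system
            # would first check that the two operations are compatible.
            read_version = store.latest
    raise ConflictError("gave up after repeated conflicts")


store = ToyStore()
base = store.try_commit(0, "create")                    # first version
writer_a = commit_with_retry(store, base, "append A")   # commits first
writer_b = commit_with_retry(store, base, "append B")   # conflicts once, then retries
```

Both writers start from the same base version. The second one detects that the base is stale, rebases, and lands as the next version, so neither write is lost.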

Usage

Execute this workflow when you need to track dataset changes over time for reproducibility, tag specific versions for ML model training checkpoints, roll back to a previous dataset state after erroneous writes, or implement branching strategies for parallel experimentation on the same base dataset.

Execution Steps

Step 1: Understanding Automatic Versioning

Every write operation (append, overwrite, update, delete, schema change) automatically creates a new dataset version. Each version is assigned a monotonically increasing integer ID, and its manifest records the commit timestamp and metadata about the operation that produced it. Versions are immutable once committed; they are removed only when a cleanup pass deletes versions older than the retention period.

Key considerations:

Version numbers start at 1 and increment with each write
Each version stores a complete manifest (not a diff), so any version can be opened without replaying earlier ones
Versions are cheap to create since they share unchanged data fragments
The latest version is the default when opening a dataset
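
The considerations above can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not Lance's storage layout: `Manifest` and `ToyDataset` are hypothetical names, and fragments are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Manifest:
    """Immutable snapshot holding a full fragment list, not a diff."""
    version: int
    operation: str
    fragments: Tuple[str, ...]


class ToyDataset:
    """Hypothetical stand-in for a versioned dataset."""

    def __init__(self) -> None:
        self._manifests: Dict[int, Manifest] = {}

    @property
    def latest(self) -> int:
        return max(self._manifests, default=0)

    def _commit(self, operation: str, fragments: Tuple[str, ...]) -> int:
        version = self.latest + 1  # the first commit becomes version 1
        self._manifests[version] = Manifest(version, operation, fragments)
        return version

    def append(self, fragment: str) -> int:
        current = self._manifests[self.latest].fragments if self._manifests else ()
        # Existing fragments are referenced again, not copied.
        return self._commit("append", current + (fragment,))

    def overwrite(self, fragment: str) -> int:
        return self._commit("overwrite", (fragment,))

    def manifest(self, version: Optional[int] = None) -> Manifest:
        # With no argument, the latest version is returned.
        return self._manifests[self.latest if version is None else version]


ds = ToyDataset()
v1 = ds.append("frag-0")
v2 = ds.append("frag-1")
v3 = ds.overwrite("frag-2")
```

Each write yields the next integer ID. Version 2 still lists both of its fragments after the overwrite, because earlier manifests are never modified.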

Step 2: Version Listing and Inspection

List all available versions to understand the dataset's history. Each version entry includes its numeric ID, creation timestamp, and metadata describing the operation that created it. This provides an audit trail of all mutations applied to the dataset.

Key considerations:

Version listing reads only manifest metadata, not data files
Older versions may be unavailable if cleanup has removed their data files
Version metadata includes the operation type (append, overwrite, delete, etc.)
Use version timestamps to correlate dataset changes with external events
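
Correlating versions with external events usually means finding the newest version committed at or before a given moment. The helper below is a minimal sketch; `version_as_of` is a hypothetical name. It assumes each entry is a mapping with `version` and `timestamp` keys, which is the shape the Python bindings' `versions()` listing is expected to return; verify this against the installed release. The sample history and its `metadata` values are invented for illustration.

```python
from datetime import datetime
from typing import Iterable, Mapping, Optional


def version_as_of(versions: Iterable[Mapping], cutoff: datetime) -> Optional[int]:
    """Return the newest version ID committed at or before `cutoff`."""
    eligible = [v for v in versions if v["timestamp"] <= cutoff]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v["version"])["version"]


# Illustrative entries only; the metadata layout is an assumption.
history = [
    {"version": 1, "timestamp": datetime(2026, 1, 5, 9, 0), "metadata": {"operation": "append"}},
    {"version": 2, "timestamp": datetime(2026, 1, 12, 9, 0), "metadata": {"operation": "append"}},
    {"version": 3, "timestamp": datetime(2026, 2, 1, 9, 0), "metadata": {"operation": "delete"}},
]

training_run_started = datetime(2026, 1, 20, 0, 0)
version_used = version_as_of(history, training_run_started)
```

The timestamps here are naive `datetime` values. If the listing returns timezone-aware timestamps, the cutoff must be timezone-aware as well, or the comparison raises `TypeError`.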

Step 3: Time-Travel Queries

Open a specific historical version of the dataset by providing a version number or tag name. All subsequent read operations on this handle reflect the dataset state at that version. This enables reproducible reads for ML training, debugging data issues, and comparing dataset states across versions.

Key considerations:

Time-travel queries are read-only; you cannot write to a historical version
Cleanup may already have removed an old version's files, in which case opening that version fails
Historical reads use the same read path as the latest version; their speed depends on the fragment layout and indices recorded in that version
Multiple readers can access different versions concurrently
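
Resolving a reference before opening a handle can be sketched as follows. This is a toy helper, not part of Lance: `resolve_version` and `VersionNotAvailable` are hypothetical names, tags are modeled as a plain name-to-version mapping, and `available` stands for the set of versions that cleanup has not removed.

```python
from typing import Mapping, Set, Union


class VersionNotAvailable(LookupError):
    """The requested version is unknown or has been cleaned up."""


def resolve_version(ref: Union[int, str], tags: Mapping[str, int],
                    available: Set[int]) -> int:
    """Turn a version number or tag name into a readable version ID."""
    if isinstance(ref, str):
        if ref not in tags:
            raise KeyError(f"unknown tag {ref!r}")
        version = tags[ref]
    else:
        version = ref
    if version not in available:
        raise VersionNotAvailable(f"version {version} is not available")
    return version


tags = {"production": 3}
available = {2, 3, 4}          # version 1 was removed by cleanup

by_tag = resolve_version("production", tags, available)
by_number = resolve_version(4, tags, available)
```

With the Python `lance` package, the resolved reference would then typically be passed as `lance.dataset(uri, version=...)`; check the installed release for the exact signature.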

Step 4: Tag Creation and Management

Create human-readable tags that point to specific version numbers. Tags provide stable references like "production", "training_v2", or "pre_cleanup" that persist even as new versions are created. Tags can be listed, created, and deleted to manage important dataset milestones.

Key considerations:

Tag names must be unique within a dataset
Tags survive new writes; a tag keeps pointing to the same version unless it is explicitly moved (for example, by deleting and recreating it)
Deleting a tag does not affect the underlying version
Use tags to mark versions used for training specific ML models
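
The tag rules above can be modeled with a small registry. This is a sketch under stated assumptions, not Lance's tag storage: `TagRegistry` and its methods are hypothetical names, and versions are represented by a plain set of integers.

```python
from typing import Dict, Set


class TagRegistry:
    """Hypothetical name-to-version map with unique tag names."""

    def __init__(self, versions: Set[int]) -> None:
        self._versions = versions          # shared set of known version IDs
        self._tags: Dict[str, int] = {}

    def create(self, name: str, version: int) -> None:
        if name in self._tags:
            raise ValueError(f"tag {name!r} already exists")
        if version not in self._versions:
            raise ValueError(f"version {version} does not exist")
        self._tags[name] = version

    def delete(self, name: str) -> None:
        # Removes only the name; the version itself is untouched.
        del self._tags[name]

    def list_tags(self) -> Dict[str, int]:
        return dict(self._tags)


versions = {1, 2, 3}
tags = TagRegistry(versions)
tags.create("training_v2", 2)

versions.add(4)                    # a new write lands after tagging
snapshot = tags.list_tags()        # the tag still points at version 2
```

In the Python bindings, the corresponding operations are expected to be available through the dataset's `tags` attribute (`create`, `list`, `delete`); treat the exact method signatures as something to confirm in the Tags Guide.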

Step 5: Version Restoration

Restore the dataset to a previous version's state by creating a new version that copies the old version's manifest. This effectively "undoes" all changes made after the target version while preserving the full version history. Restoration is a metadata-only operation that does not copy data files.

Key considerations:

Restore creates a new version (it does not delete intermediate versions)
The restored version shares data fragments with the original
Ensure the target version's data files have not been garbage collected
Combine restoration with tagging: tag both the version you restore to and the last version before the rollback, so either state can be reopened by name
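
Restoration as a metadata-only operation can be sketched with the same kind of toy model. The function below is an illustration, not Lance's restore implementation: `restore` is a hypothetical helper, and each manifest is reduced to a tuple of fragment names keyed by version number.

```python
from typing import Dict, Tuple


def restore(manifests: Dict[int, Tuple[str, ...]], target: int) -> int:
    """Commit a new version that reuses the target version's fragment list."""
    if target not in manifests:
        raise KeyError(f"version {target} is not available")
    new_version = max(manifests) + 1
    # Metadata-only: the new entry references the same fragments,
    # and no intermediate version is deleted.
    manifests[new_version] = manifests[target]
    return new_version


manifests = {
    1: ("frag-0",),
    2: ("frag-0", "frag-1"),
    3: ("frag-2",),            # erroneous overwrite
}
restored = restore(manifests, 2)
```

After the call, the history contains versions 1 through 4, and version 4 lists exactly the fragments of version 2. In the Python bindings this is believed to correspond to checking out the target version and then calling `restore()` on that handle; confirm against the Versioning Guide.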
