Principle:Lance format Lance Time Travel Queries

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Version_Control
Last Updated 2026-02-08 19:00 GMT

Overview

Time-travel queries allow a user to read a Lance dataset as it existed at any previously committed version, without modifying the current state.

Description

Because Lance persists every version as an immutable manifest pointing to immutable data files, any historical state can be reconstructed by loading the corresponding manifest. Time-travel is the user-facing capability built on this property: given a version reference, Lance returns a read-only Dataset handle pinned to that version's schema, fragments, and indices.

Lance supports three kinds of version references, unified under the Ref enum:

  • VersionNumber(u64) -- a direct numeric version on the current branch.
  • Version(Option<String>, Option<u64>) -- a fully qualified reference consisting of an optional branch name and an optional version number. If the branch is None, the main branch is assumed. If the version number is None, the latest version on the branch is used.
  • Tag(String) -- a named alias that resolves to a specific (branch, version) pair via the tag store.

All three reference types are accepted by Dataset::checkout_version(), which takes any argument that converts into a Ref through Rust's Into<Ref> conversion, so callers can pass a bare u64, a string tag name, or a (branch, version) tuple.
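The conversion pattern can be illustrated with a standalone model. This is a minimal sketch, not the lance crate's source: the From implementations and the describe() helper are hypothetical and exist only to show how one entry point can accept a number, a tag name, or a tuple.

```rust
// Standalone model of the reference types described above (not the lance crate's code).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Ref {
    VersionNumber(u64),
    Version(Option<String>, Option<u64>),
    Tag(String),
}

// A bare u64 is read as a version number on the current branch.
impl From<u64> for Ref {
    fn from(version: u64) -> Self {
        Ref::VersionNumber(version)
    }
}

// A string slice is read as a tag name.
impl From<&str> for Ref {
    fn from(tag: &str) -> Self {
        Ref::Tag(tag.to_string())
    }
}

// A tuple is read as (optional branch, optional version).
impl From<(Option<&str>, Option<u64>)> for Ref {
    fn from((branch, version): (Option<&str>, Option<u64>)) -> Self {
        Ref::Version(branch.map(str::to_string), version)
    }
}

// Hypothetical stand-in for a checkout entry point: it only reports what the
// reference means, applying the defaults described above.
fn describe(reference: impl Into<Ref>) -> String {
    match reference.into() {
        Ref::VersionNumber(v) => format!("version {v} on the current branch"),
        Ref::Version(branch, version) => {
            let branch = branch.unwrap_or_else(|| "main".to_string());
            match version {
                Some(v) => format!("version {v} on branch {branch}"),
                None => format!("latest version on branch {branch}"),
            }
        }
        Ref::Tag(name) => format!("tag {name}"),
    }
}

fn main() {
    println!("{}", describe(3u64));
    println!("{}", describe("v1.0-release"));
    println!("{}", describe((Some("experiment"), None::<u64>)));
}
```

The tag name and branch name used in main() are made up for the example.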

Usage

Time-travel queries are used to:

  • Reproduce experiments -- load the exact dataset version used for a previous training run.
  • Compare schema evolution -- read two versions side by side to understand how the schema changed.
  • Debug data issues -- examine data before and after a problematic write.
  • Navigate branches -- switch between branches and their specific versions.

Theoretical Basis

Immutable Manifest Graph

Each version in Lance is an immutable manifest that references a set of data fragments. Fragments are themselves immutable; mutations (deletes, updates) produce new fragments rather than modifying existing ones. This forms a directed acyclic graph of manifests and fragments where checking out a version simply means selecting the correct manifest node and following its fragment pointers.
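A toy in-memory model makes the property concrete. This is a sketch under simplifying assumptions, not Lance's storage layout: the Store, Manifest, append, and delete_where names are hypothetical, fragments hold plain integers, and a delete is modeled as rewriting the affected fragment into a new one.

```rust
use std::collections::BTreeMap;

// Toy manifest: the list of fragment IDs that make up one version.
struct Manifest {
    fragment_ids: Vec<u32>,
}

// Toy append-only store. Entries in `fragments` and `manifests` are only
// ever inserted, never modified or removed.
#[derive(Default)]
struct Store {
    fragments: BTreeMap<u32, Vec<i64>>,
    manifests: BTreeMap<u64, Manifest>,
    next_fragment_id: u32,
}

impl Store {
    fn add_fragment(&mut self, rows: Vec<i64>) -> u32 {
        let id = self.next_fragment_id;
        self.next_fragment_id += 1;
        self.fragments.insert(id, rows);
        id
    }

    // Writing a manifest is what creates a new version.
    fn commit(&mut self, fragment_ids: Vec<u32>) -> u64 {
        let version = self.manifests.len() as u64 + 1;
        self.manifests.insert(version, Manifest { fragment_ids });
        version
    }

    fn latest_fragment_ids(&self) -> Vec<u32> {
        self.manifests
            .values()
            .next_back()
            .map(|m| m.fragment_ids.clone())
            .unwrap_or_default()
    }

    // Append: reuse every existing fragment and add one new fragment.
    fn append(&mut self, rows: Vec<i64>) -> u64 {
        let mut ids = self.latest_fragment_ids();
        ids.push(self.add_fragment(rows));
        self.commit(ids)
    }

    // Delete: fragments containing matching rows are rewritten into new
    // fragments; untouched fragments are shared with the previous version.
    fn delete_where(&mut self, pred: impl Fn(i64) -> bool) -> u64 {
        let mut ids = Vec::new();
        for id in self.latest_fragment_ids() {
            let rows = self.fragments[&id].clone();
            if rows.iter().any(|r| pred(*r)) {
                let kept: Vec<i64> = rows.into_iter().filter(|r| !pred(*r)).collect();
                ids.push(self.add_fragment(kept));
            } else {
                ids.push(id);
            }
        }
        self.commit(ids)
    }

    // Checkout and read: select the manifest, then follow its fragment pointers.
    fn read(&self, version: u64) -> Option<Vec<i64>> {
        let manifest = self.manifests.get(&version)?;
        Some(
            manifest
                .fragment_ids
                .iter()
                .flat_map(|id| self.fragments[id].iter().copied())
                .collect(),
        )
    }
}

fn main() {
    let mut store = Store::default();
    let v1 = store.append(vec![1, 2, 3]);
    let v2 = store.append(vec![4, 5]);
    let v3 = store.delete_where(|r| r == 2);
    for v in [v1, v2, v3] {
        println!("version {v}: {:?}", store.read(v));
    }
}
```

Because no write touches an existing fragment or manifest, reading version 1 after the delete still returns the original rows.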

Reference Resolution

The resolution algorithm proceeds in three stages:

  1. Ref dispatch: The Ref enum is pattern-matched. Tags are resolved first by loading the TagContents JSON file, which yields a branch name and version number. Version tuples and bare version numbers proceed directly.
  2. Branch resolution: If the reference names a branch other than main, the dataset locates the branch's data directory, which lives under the dataset root at {base}/tree/{branch_name}/.
  3. Manifest loading: The commit handler resolves the manifest location for the target version (or latest if no version was specified), loads the manifest from storage, and constructs a new Dataset handle pinned to that manifest.
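The three stages above can be sketched against an in-memory stand-in for the tag store and the per-branch version listing. Everything here is an assumption made for illustration: Catalog, Resolved, and sample_catalog are hypothetical names, the main branch is assumed to live at the dataset root, and a version counts as existing if it lies between 1 and the branch's latest version.

```rust
use std::collections::HashMap;

// Same shape as the Ref enum described earlier; redefined so the sketch stands alone.
enum Ref {
    VersionNumber(u64),
    Version(Option<String>, Option<u64>),
    Tag(String),
}

// Toy stand-in for the tag store and the latest version known per branch.
struct Catalog {
    base: String,
    current_branch: Option<String>, // None means main
    tags: HashMap<String, (Option<String>, u64)>,
    latest: HashMap<Option<String>, u64>,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    branch: Option<String>,
    version: u64,
    branch_root: String,
}

impl Catalog {
    fn resolve(&self, reference: Ref) -> Result<Resolved, String> {
        // Stage 1: Ref dispatch. Tags go through the tag store; the other
        // variants already carry their branch and version.
        let (branch, version) = match reference {
            Ref::Tag(name) => {
                let (branch, version) = self
                    .tags
                    .get(&name)
                    .cloned()
                    .ok_or_else(|| format!("unknown tag: {name}"))?;
                (branch, Some(version))
            }
            Ref::Version(branch, version) => (branch, version),
            Ref::VersionNumber(version) => (self.current_branch.clone(), Some(version)),
        };

        // Stage 2: branch resolution. Only non-main branches get a tree/ directory.
        let branch = branch.filter(|name| name.as_str() != "main");
        let branch_root = match &branch {
            Some(name) => format!("{}/tree/{}/", self.base, name),
            None => format!("{}/", self.base),
        };

        // Stage 3: pick the target version (latest if none was given) and
        // check that it exists before "loading" it.
        let latest = *self
            .latest
            .get(&branch)
            .ok_or_else(|| format!("unknown branch: {branch:?}"))?;
        let version = version.unwrap_or(latest);
        if version == 0 || version > latest {
            return Err(format!("version {version} not found"));
        }
        Ok(Resolved { branch, version, branch_root })
    }
}

// Made-up sample data: one tag pointing at version 4 of branch "exp".
fn sample_catalog() -> Catalog {
    let mut tags = HashMap::new();
    tags.insert("release".to_string(), (Some("exp".to_string()), 4));
    let mut latest = HashMap::new();
    latest.insert(None, 9);
    latest.insert(Some("exp".to_string()), 5);
    Catalog { base: "data.lance".to_string(), current_branch: None, tags, latest }
}

fn main() {
    let catalog = sample_catalog();
    println!("{:?}", catalog.resolve(Ref::Tag("release".to_string())));
    println!("{:?}", catalog.resolve(Ref::Version(Some("exp".to_string()), None)));
    println!("{:?}", catalog.resolve(Ref::VersionNumber(7)));
}
```

In the real system the last stage reads a manifest from storage through the commit handler; the sketch replaces that read with a bounds check so it can run without any files.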

An optimization short-circuits the process: if the dataset is already checked out at the requested version (same version, same branch, same e_tag), the current handle is returned without any I/O.
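The short-circuit amounts to an equality check before any load. The following is a minimal sketch with hypothetical types (ManifestLocation, Dataset, Storage); the counter on Storage exists only to make the skipped read observable.

```rust
// Hypothetical description of a manifest: which branch, which version,
// and the storage e_tag observed for it.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ManifestLocation {
    branch: Option<String>,
    version: u64,
    e_tag: Option<String>,
}

// Toy dataset handle pinned to one manifest.
#[derive(Debug, Clone)]
struct Dataset {
    location: ManifestLocation,
}

// Toy storage that counts simulated manifest reads.
#[derive(Default)]
struct Storage {
    manifest_reads: usize,
}

impl Storage {
    fn load(&mut self, location: &ManifestLocation) -> Dataset {
        self.manifest_reads += 1;
        Dataset { location: location.clone() }
    }
}

// Return the current handle if it already points at the target;
// otherwise load the target manifest.
fn checkout(current: &Dataset, target: &ManifestLocation, storage: &mut Storage) -> Dataset {
    if &current.location == target {
        return current.clone();
    }
    storage.load(target)
}

fn main() {
    let mut storage = Storage::default();
    let v3 = ManifestLocation { branch: None, version: 3, e_tag: Some("abc".to_string()) };
    let current = storage.load(&v3);

    // Same version, branch, and e_tag: no additional read.
    let same = checkout(&current, &v3, &mut storage);
    println!("reads after no-op checkout: {}", storage.manifest_reads);

    // A different e_tag means the stored manifest differs, so it is reloaded.
    let changed = ManifestLocation { e_tag: Some("def".to_string()), ..v3.clone() };
    let reloaded = checkout(&same, &changed, &mut storage);
    println!("reads after reload: {} ({:?})", storage.manifest_reads, reloaded.location.e_tag);
}
```

Including the e_tag in the comparison means a handle is reused only when the stored manifest is the same object, not merely one with the same version number.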
