Principle:Lance format Lance Time Travel Queries
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Version_Control |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Time-travel queries allow a user to read a Lance dataset as it existed at any previously committed version, without modifying the current state.
Description
Because Lance persists every version as an immutable manifest pointing to immutable data files, any historical state can be reconstructed by loading the corresponding manifest. Time-travel is the user-facing capability built on this property: given a version reference, Lance returns a read-only Dataset handle pinned to that version's schema, fragments, and indices.
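The pinning property can be illustrated with a toy in-memory model in Rust. Every name in it (ManifestStore, DatasetHandle, commit, checkout) is hypothetical and is not Lance's API. Manifests are stored behind Arc and never mutated, so a handle obtained for an old version keeps seeing that version's schema and fragments after later commits. Indices are omitted for brevity.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Hypothetical model for illustration only; not the Lance API.

/// Stand-in for a committed manifest. It is never mutated after commit.
#[derive(Debug)]
struct Manifest {
    version: u64,
    schema: Vec<String>, // column names only
    fragments: Vec<u64>, // ids of the fragments this version references
}

/// All committed versions, keyed by version number.
#[derive(Default)]
struct ManifestStore {
    manifests: BTreeMap<u64, Arc<Manifest>>,
}

/// Read-only view pinned to exactly one manifest.
struct DatasetHandle {
    manifest: Arc<Manifest>,
}

impl ManifestStore {
    /// Commits a new version after the latest one; older manifests are left untouched.
    fn commit(&mut self, schema: Vec<String>, fragments: Vec<u64>) -> u64 {
        let version = self.manifests.keys().next_back().map_or(1, |v| *v + 1);
        let manifest = Arc::new(Manifest { version, schema, fragments });
        self.manifests.insert(version, manifest);
        version
    }

    /// Time travel: a handle pinned to `version`, or None if it was never committed.
    fn checkout(&self, version: u64) -> Option<DatasetHandle> {
        self.manifests
            .get(&version)
            .map(|m| DatasetHandle { manifest: Arc::clone(m) })
    }
}
```

For example, a handle checked out at version 1 still reports the one-column schema and the single original fragment after version 2 adds a column and a fragment.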
Lance supports three kinds of version references, unified under the Ref enum:
- VersionNumber(u64) -- a direct numeric version on the current branch.
- Version(Option<String>, Option<u64>) -- a fully qualified reference consisting of an optional branch name and an optional version number. If the branch is None, the main branch is assumed. If the version number is None, the latest version on the branch is used.
- Tag(String) -- a named alias that resolves to a specific (branch, version) pair via the tag store.
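The defaulting rules above can be spelled out with a minimal Rust sketch. It uses a simplified copy of the enum whose variant names follow the text. The describe helper and its "branch@version" output format are hypothetical. Tags are left unresolved because resolving them requires the tag store.

```rust
// Simplified copy of the Ref enum for illustration; not the Lance definition.
#[derive(Debug, Clone, PartialEq)]
enum Ref {
    /// A version number on the handle's current branch.
    VersionNumber(u64),
    /// (branch, version): None branch = main, None version = latest on that branch.
    Version(Option<String>, Option<u64>),
    /// A named alias resolved through the tag store.
    Tag(String),
}

/// Hypothetical helper: states which (branch, version) a reference denotes.
fn describe(r: &Ref, current_branch: &str) -> String {
    match r {
        Ref::VersionNumber(v) => format!("{current_branch}@{v}"),
        Ref::Version(branch, version) => {
            let branch = branch.as_deref().unwrap_or("main");
            match version {
                Some(v) => format!("{branch}@{v}"),
                None => format!("{branch}@latest"),
            }
        }
        // Resolving a tag needs the tag store, which this sketch does not model.
        Ref::Tag(name) => format!("tag {name}"),
    }
}
```

Note the asymmetry. A bare VersionNumber stays on the current branch, while Version(None, _) always refers to main.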
Dataset::checkout_version() accepts any argument that implements Rust's Into<Ref> trait, so callers can pass a bare u64 (a version number), a string (a tag name), or a tuple (a branch and a version) without constructing a Ref by hand.
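The sketch below shows how an impl Into<Ref> parameter lets a single function accept all three call styles. The From impls and the normalize function are assumptions made for illustration. The exact set of conversions Lance provides may differ.

```rust
// Simplified Ref enum plus conversions, for illustration only.
#[derive(Debug, PartialEq)]
enum Ref {
    VersionNumber(u64),
    Version(Option<String>, Option<u64>),
    Tag(String),
}

// A bare number is a version on the current branch.
impl From<u64> for Ref {
    fn from(version: u64) -> Self {
        Ref::VersionNumber(version)
    }
}

// A string is treated as a tag name.
impl From<&str> for Ref {
    fn from(tag: &str) -> Self {
        Ref::Tag(tag.to_string())
    }
}

// A (branch, version) tuple is a fully qualified reference.
impl From<(&str, u64)> for Ref {
    fn from((branch, version): (&str, u64)) -> Self {
        Ref::Version(Some(branch.to_string()), Some(version))
    }
}

/// Hypothetical stand-in for checkout_version's argument handling:
/// any `impl Into<Ref>` is normalized to a Ref up front.
fn normalize(target: impl Into<Ref>) -> Ref {
    target.into()
}
```

With these conversions, normalize(42u64), normalize("v1.0"), and normalize(("exp", 3u64)) each yield a Ref. A Ref value itself also passes through unchanged via the standard reflexive From impl.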
Usage
Time-travel queries are used to:
- Reproduce experiments -- load the exact dataset version used for a previous training run.
- Compare schema evolution -- read two versions side by side to understand how the schema changed.
- Debug data issues -- examine data before and after a problematic write.
- Branch navigation -- switch between branches and their specific versions.
Theoretical Basis
Immutable Manifest Graph
Each version in Lance is an immutable manifest that references a set of data fragments. Fragments are themselves immutable; mutations (deletes, updates) produce new fragments rather than modifying existing ones. This forms a directed acyclic graph of manifests and fragments where checking out a version simply means selecting the correct manifest node and following its fragment pointers.
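The toy model below illustrates this structure:

- Fragments and manifests are immutable.
- A new version copies the previous manifest's fragment pointers and replaces only what changed.
- Reading a version simply follows its manifest's pointers.

All names are hypothetical. Deletes are modeled by rewriting each affected fragment into a new one, following the description above. Lance's actual on-disk handling of deletes may differ.

```rust
use std::collections::BTreeMap;
use std::sync::Arc;

// Toy model for illustration only; names are hypothetical, not Lance's API.

/// An immutable fragment: its rows never change once created.
#[derive(Debug)]
struct Fragment {
    rows: Vec<i64>,
}

/// An immutable manifest: a version number plus pointers to fragments.
#[derive(Debug)]
struct Manifest {
    version: u64,
    fragments: Vec<Arc<Fragment>>,
}

#[derive(Default)]
struct Store {
    manifests: BTreeMap<u64, Manifest>,
}

impl Store {
    fn latest_fragments(&self) -> Vec<Arc<Fragment>> {
        self.manifests
            .values()
            .next_back()
            .map(|m| m.fragments.clone()) // clones pointers, not row data
            .unwrap_or_default()
    }

    fn commit(&mut self, fragments: Vec<Arc<Fragment>>) -> u64 {
        let version = self.manifests.keys().next_back().map_or(1, |v| *v + 1);
        self.manifests.insert(version, Manifest { version, fragments });
        version
    }

    /// Append: reuse every existing fragment and add one new fragment.
    fn append(&mut self, rows: Vec<i64>) -> u64 {
        let mut fragments = self.latest_fragments();
        fragments.push(Arc::new(Fragment { rows }));
        self.commit(fragments)
    }

    /// Delete: each affected fragment is replaced by a new one; the rest are shared.
    fn delete_where(&mut self, pred: impl Fn(i64) -> bool) -> u64 {
        let fragments = self
            .latest_fragments()
            .into_iter()
            .map(|f| {
                if f.rows.iter().any(|&r| pred(r)) {
                    let kept = f.rows.iter().copied().filter(|&r| !pred(r)).collect();
                    Arc::new(Fragment { rows: kept })
                } else {
                    f
                }
            })
            .collect();
        self.commit(fragments)
    }

    /// Checkout + scan: select the version's manifest and follow its fragment pointers.
    fn scan(&self, version: u64) -> Option<Vec<i64>> {
        let manifest = self.manifests.get(&version)?;
        Some(manifest.fragments.iter().flat_map(|f| f.rows.iter().copied()).collect())
    }
}
```

Untouched fragments are shared through Arc. As a result, a delete creates only one new fragment, and the older and newer manifests point at the same unaffected fragments. That sharing is what turns the versions into a graph rather than a series of full copies.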
Reference Resolution
The resolution algorithm proceeds in three stages:
- Ref dispatch: The Ref enum is pattern-matched. Tags are resolved first by loading the TagContents JSON file, which yields a branch name and version number. Version tuples and bare version numbers proceed directly.
- Branch resolution: If the reference names a branch other than main, the dataset locates that branch's data directory under the dataset root, at {base}/tree/{branch_name}/.
- Manifest loading: The commit handler resolves the manifest location for the target version (or the latest version if none was specified), loads the manifest from storage, and constructs a new Dataset handle pinned to that manifest.
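The three stages can be sketched in memory as follows. Storage, branch_dir, Resolved, and resolve are hypothetical. The tag store is modeled as a map, and manifests as strings keyed by branch directory and version. The directory rule (main at the dataset root, other branches under {base}/tree/{branch_name}) follows the text. In Lance itself, stage 3 goes through the commit handler and object storage, which this sketch does not model.

```rust
use std::collections::{BTreeMap, HashMap};

// In-memory sketch of the three resolution stages; every name here is hypothetical.

#[derive(Debug, Clone, PartialEq)]
enum Ref {
    VersionNumber(u64),
    Version(Option<String>, Option<u64>),
    Tag(String),
}

struct Storage {
    base: String,
    /// Tag store: tag name -> (branch, version).
    tags: HashMap<String, (Option<String>, u64)>,
    /// Branch directory -> (version -> manifest contents).
    manifests: HashMap<String, BTreeMap<u64, String>>,
}

#[derive(Debug, PartialEq)]
struct Resolved {
    dir: String,
    version: u64,
    manifest: String,
}

/// main lives at the dataset root; other branches under {base}/tree/{branch_name}.
fn branch_dir(base: &str, branch: Option<&str>) -> String {
    match branch {
        None | Some("main") => base.to_string(),
        Some(name) => format!("{base}/tree/{name}"),
    }
}

fn resolve(store: &Storage, r: &Ref, current_branch: Option<&str>) -> Result<Resolved, String> {
    // Stage 1: Ref dispatch. Tags are looked up; the other variants carry their target.
    let (branch, version) = match r {
        Ref::Tag(name) => {
            let (b, v) = store
                .tags
                .get(name)
                .ok_or_else(|| format!("unknown tag {name}"))?;
            (b.clone(), Some(*v))
        }
        Ref::Version(b, v) => (b.clone(), *v),
        Ref::VersionNumber(v) => (current_branch.map(String::from), Some(*v)),
    };

    // Stage 2: branch resolution.
    let dir = branch_dir(&store.base, branch.as_deref());

    // Stage 3: manifest loading; no version means the latest on that branch.
    let versions = store
        .manifests
        .get(&dir)
        .ok_or_else(|| format!("no branch data at {dir}"))?;
    let (v, manifest) = match version {
        Some(v) => versions
            .get_key_value(&v)
            .ok_or_else(|| format!("version {v} not found in {dir}"))?,
        None => versions
            .last_key_value()
            .ok_or_else(|| format!("no versions in {dir}"))?,
    };
    Ok(Resolved { dir, version: *v, manifest: manifest.clone() })
}
```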
An optimization short-circuits the process: if the dataset is already checked out at the requested version (same version, same branch, same e_tag), the current handle is returned without any I/O.
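The fast path can be sketched as an equality check on the (version, branch, e_tag) triple named above. ManifestId, Handle, checkout, and the load closure are all hypothetical. The load closure stands in for the real manifest-loading I/O. How the target's e_tag is obtained is not modeled.

```rust
// Hypothetical model of the checkout fast path; names are illustrative, not Lance's API.

/// What identifies a checked-out manifest for the fast-path comparison.
#[derive(Debug, Clone, PartialEq)]
struct ManifestId {
    version: u64,
    branch: Option<String>, // None = main
    e_tag: Option<String>,
}

#[derive(Debug, Clone)]
struct Handle {
    id: ManifestId,
}

/// Returns a clone of `current` when it already matches `target`;
/// otherwise calls `load`, which stands in for the real manifest I/O.
fn checkout(current: &Handle, target: &ManifestId, load: impl FnOnce(&ManifestId) -> Handle) -> Handle {
    if current.id == *target {
        return current.clone(); // fast path: nothing is read from storage
    }
    load(target)
}
```

A difference in any of the three fields, including the e_tag alone, falls through to the load path.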