Principle:Lance format Lance Tag Management
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Version_Control |
| Last Updated | 2026-02-08 19:00 GMT |
Overview
Tag management is the ability to create, read, update, delete, and list human-readable named references (tags) that point to specific versions of a Lance dataset.
Description
While version numbers are stable and unambiguous, they are not always convenient for human communication. Tags provide a way to assign meaningful names -- such as release-v2.1, training-baseline, or pre-migration -- to specific dataset versions. A tag is a lightweight JSON file stored under the dataset's _refs/tags/ directory that records the branch, version number, and manifest size.
Tags differ from branches in that they are simple pointers rather than mutable lines of development. At any given time a tag refers to exactly one (branch, version) pair. A tag can be repointed to a different version, but only through an explicit update, unlike a branch, which advances automatically with each commit.
Lance enforces validation rules on tag names to prevent ambiguity and filesystem issues:
- Characters must be alphanumeric, `.`, `-`, or `_`.
- Names cannot start or end with a dot.
- Names cannot contain `..` (double-dot).
- Names cannot end with `.lock`.
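The rules above can be sketched as a small predicate. This is an illustrative check written from the four listed rules only, not Lance's own validator; the function name `is_valid_tag_name` and the restriction to ASCII letters and digits are assumptions.

```python
import re

# Assumption: "alphanumeric" is read as ASCII letters and digits.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_tag_name(name: str) -> bool:
    """Return True if `name` satisfies the four naming rules listed above."""
    if not _ALLOWED_CHARS.fullmatch(name):
        return False  # empty, or contains a character outside the allowed set
    if name.startswith(".") or name.endswith("."):
        return False  # leading or trailing dot
    if ".." in name:
        return False  # double-dot
    if name.endswith(".lock"):
        return False  # reserved suffix
    return True
```

Names such as release-v2.1 and pre-migration pass this check, while .hidden, a..b, and data.lock are rejected.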
Usage
Use tags when:
- Marking release points -- tag the dataset version used for a production model deployment.
- Creating stable references -- give collaborators a name to check out rather than a version number.
- Annotating milestones -- mark versions that passed quality checks or represent significant data changes.
- Enabling time-travel by name -- `dataset.checkout_version("my-tag")` resolves the tag and loads the corresponding version.
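A minimal sketch of the tag-then-checkout flow, assuming the Python bindings expose tag operations as `dataset.tags.create(name, version)` alongside the `checkout_version` call mentioned above. The helper name `tag_and_checkout` and the dataset path in the usage comment are hypothetical.

```python
def tag_and_checkout(dataset, tag_name: str, version: int):
    """Tag `version` on `dataset`, then reopen the dataset through the tag name.

    Assumption: `dataset` behaves like a Lance dataset object, i.e. it offers
    `dataset.tags.create(name, version)` and `dataset.checkout_version(ref)`.
    """
    dataset.tags.create(tag_name, version)
    return dataset.checkout_version(tag_name)


# Usage sketch (needs the `lance` package and an existing dataset;
# the path below is a placeholder):
#
#   import lance
#   ds = lance.dataset("/path/to/dataset.lance")
#   pinned = tag_and_checkout(ds, "training-baseline", ds.version)
```

Collaborators can then open the same data by name without knowing the underlying version number.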
Theoretical Basis
Storage Model
Tags are stored as individual JSON files at the path {base}/_refs/tags/{tag_name}.json. Each file contains a TagContents object:
{
"branch": "feature-branch", // null for main branch
"version": 42,
"manifestSize": 8192
}
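To make the file layout concrete, the following sketch models the tag file in Python. It is a stand-in written from the JSON shown above, not the Rust TagContents type itself; the helper `tag_path` and the snake_case field `manifest_size` are illustrative names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagContents:
    branch: Optional[str]  # None corresponds to JSON null, i.e. the main branch
    version: int
    manifest_size: int

    def to_json(self) -> str:
        # Field names follow the JSON example above (note camelCase manifestSize).
        return json.dumps(
            {
                "branch": self.branch,
                "version": self.version,
                "manifestSize": self.manifest_size,
            }
        )

    @classmethod
    def from_json(cls, text: str) -> "TagContents":
        raw = json.loads(text)
        return cls(
            branch=raw.get("branch"),
            version=raw["version"],
            manifest_size=raw["manifestSize"],
        )


def tag_path(base: str, tag_name: str) -> str:
    """Build {base}/_refs/tags/{tag_name}.json for a given dataset base path."""
    return f"{base.rstrip('/')}/_refs/tags/{tag_name}.json"
```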
This design has several properties:
- Atomicity via object-store put -- creating or updating a tag is a single `PUT` operation, which is atomic on all supported object stores.
- No coordination needed -- tags do not participate in the commit protocol; they are metadata-only references.
- Cheap enumeration -- listing tags requires only a directory listing of the `_refs/tags/` prefix, followed by parallel reads of each JSON file.
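The enumeration pattern (one listing, then parallel reads) can be sketched as follows. This uses a local directory as a stand-in for the object store and a thread pool for the parallel reads; the function name `list_tags` and the worker count are assumptions, and the real implementation works against an object-store client rather than the filesystem.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def list_tags(base: str) -> dict:
    """Return {tag_name: parsed JSON contents} for every tag under `base`."""
    tags_dir = Path(base) / "_refs" / "tags"
    if not tags_dir.is_dir():
        return {}  # no tags have been created yet

    # Step 1: a single listing of the tags prefix.
    files = sorted(tags_dir.glob("*.json"))

    def read_one(path: Path):
        # Strip only the trailing ".json"; tag names may contain dots.
        name = path.name[: -len(".json")]
        return name, json.loads(path.read_text())

    # Step 2: read the tag files concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(pool.map(read_one, files))
```

Because each tag is its own small file, the cost of listing grows with the number of tags rather than with the size of the dataset.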
CRUD Operations
The Tags struct exposes five primary operations:
- create(tag, ref) -- writes a new tag file; fails if the tag already exists.
- get(tag) -- reads and deserializes a single tag file.
- list() -- enumerates all tag files and returns a `HashMap<String, TagContents>`.
- update(tag, ref) -- overwrites an existing tag file; fails if the tag does not exist.
- delete(tag) -- removes the tag file; fails if the tag does not exist.
Operations that take a tag name validate it before proceeding, and operations that take a reference (create and update) verify that the referenced version actually exists in storage.
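The existence preconditions of the five operations can be illustrated with an in-memory model. This is a sketch of the semantics described above, not the Rust Tags struct: the class name `TagStore`, the choice of exception types, and the simplification of a reference to a bare version number on the main branch are all assumptions, and tag-name validation is omitted for brevity.

```python
class TagStore:
    """In-memory stand-in for the tag files; maps tag name -> version."""

    def __init__(self, existing_versions):
        self._versions = set(existing_versions)  # versions present in storage
        self._tags = {}

    def _require_version(self, version: int) -> None:
        if version not in self._versions:
            raise ValueError(f"version {version} does not exist")

    def create(self, tag: str, version: int) -> None:
        if tag in self._tags:
            raise FileExistsError(f"tag {tag!r} already exists")
        self._require_version(version)
        self._tags[tag] = version

    def get(self, tag: str) -> int:
        if tag not in self._tags:
            raise FileNotFoundError(f"tag {tag!r} does not exist")
        return self._tags[tag]

    def list(self) -> dict:
        return dict(self._tags)  # copy, so callers cannot mutate the store

    def update(self, tag: str, version: int) -> None:
        if tag not in self._tags:
            raise FileNotFoundError(f"tag {tag!r} does not exist")
        self._require_version(version)
        self._tags[tag] = version

    def delete(self, tag: str) -> None:
        if tag not in self._tags:
            raise FileNotFoundError(f"tag {tag!r} does not exist")
        del self._tags[tag]
```

Keeping create and update as separate operations with opposite preconditions means a caller cannot silently overwrite a tag it believed was new, nor create one it believed already existed.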