Principle:Lance format Lance Tag Management

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Version_Control
Last Updated: 2026-02-08 19:00 GMT

Overview

Tag management is the ability to create, read, update, delete, and list human-readable named references (tags) that point to specific versions of a Lance dataset.

Description

While version numbers are stable and unambiguous, they are not always convenient for human communication. Tags provide a way to assign meaningful names -- such as release-v2.1, training-baseline, or pre-migration -- to specific dataset versions. A tag is a lightweight JSON file stored under the dataset's _refs/tags/ directory that records the branch, version number, and manifest size.

Tags differ from branches in that they are simple pointers rather than mutable lines of development. At any given time a tag refers to exactly one (branch, version) pair. A tag can be repointed to a different version, but only through an explicit update, whereas a branch head advances automatically with each commit.

Lance enforces validation rules on tag names to prevent ambiguity and filesystem issues:

  • Characters must be alphanumeric, ., -, or _.
  • Names cannot start or end with a dot.
  • Names cannot contain .. (double-dot).
  • Names cannot end with .lock.
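
The four rules above can be sketched as a small validator. This is an illustrative sketch, not Lance's actual implementation: the function name is_valid_tag_name is hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the empty string is assumed to be invalid.

```python
import re

# Assumption: "alphanumeric" means ASCII letters and digits only.
_ALLOWED_CHARS = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_tag_name(name: str) -> bool:
    """Return True if `name` satisfies the four tag-name rules (hypothetical helper)."""
    # Rule 1: only alphanumerics, '.', '-', or '_'.
    # fullmatch also rejects the empty string (an assumption, not stated in the rules).
    if not _ALLOWED_CHARS.fullmatch(name):
        return False
    # Rule 2: no leading or trailing dot.
    if name.startswith(".") or name.endswith("."):
        return False
    # Rule 3: no double-dot anywhere in the name.
    if ".." in name:
        return False
    # Rule 4: must not end with ".lock".
    if name.endswith(".lock"):
        return False
    return True
```

Under these assumptions, release-v2.1 and training-baseline pass, while names such as .hidden, a..b, build.lock, and feature/x are rejected.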

Usage

Use tags when:

  • Marking release points -- tag the dataset version used for a production model deployment.
  • Creating stable references -- give collaborators a name to check out rather than a version number.
  • Annotating milestones -- mark versions that passed quality checks or represent significant data changes.
  • Enabling time-travel by name -- dataset.checkout_version("my-tag") resolves the tag and loads the corresponding version.

Theoretical Basis

Storage Model

Tags are stored as individual JSON files at the path {base}/_refs/tags/{tag_name}.json. Each file contains a TagContents object, in which branch is null for the main branch:

{
    "branch": "feature-branch",
    "version": 42,
    "manifestSize": 8192
}

This design has several properties:

  • Atomicity via object-store put -- creating or updating a tag is a single PUT operation, which is atomic on all supported object stores.
  • No coordination needed -- tags do not participate in the commit protocol; they are metadata-only references.
  • Cheap enumeration -- listing tags requires only a directory listing of the _refs/tags/ prefix, followed by parallel reads of each JSON file.
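
To make the storage layout concrete, the following sketch models the tag file in Python. It mirrors only the path template and JSON field names given above. The dataclass, its method names, and the tag_path helper are hypothetical and are not part of the Lance API.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagContents:
    """Illustrative model of the JSON object stored in a tag file."""
    branch: Optional[str]  # None is serialized as null, meaning the main branch
    version: int
    manifest_size: int

    def to_json(self) -> str:
        # Field names follow the JSON shown in the storage model.
        return json.dumps({
            "branch": self.branch,
            "version": self.version,
            "manifestSize": self.manifest_size,
        })

    @classmethod
    def from_json(cls, text: str) -> "TagContents":
        raw = json.loads(text)
        return cls(
            branch=raw.get("branch"),
            version=raw["version"],
            manifest_size=raw["manifestSize"],
        )


def tag_path(base: str, tag_name: str) -> str:
    """Build {base}/_refs/tags/{tag_name}.json (hypothetical helper)."""
    return f"{base.rstrip('/')}/_refs/tags/{tag_name}.json"
```

Because each tag is one small self-describing object, reading or writing a tag never touches the dataset's manifests. That is the property that keeps tags outside the commit protocol.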

CRUD Operations

The Tags struct exposes five primary operations:

  1. create(tag, ref) -- writes a new tag file; fails if the tag already exists.
  2. get(tag) -- reads and deserializes a single tag file.
  3. list() -- enumerates all tag files and returns a HashMap<String, TagContents>.
  4. update(tag, ref) -- overwrites an existing tag file; fails if the tag does not exist.
  5. delete(tag) -- removes the tag file; fails if the tag does not exist.

Operations that take a tag name validate it before proceeding, and create and update additionally verify that the referenced version actually exists in storage.
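
The semantics of the five operations can be illustrated with a toy store backed by a local directory. This is a sketch under stated assumptions, not the Tags struct itself: the LocalTags class is hypothetical, a known_versions set stands in for checking that a manifest exists in storage, built-in exceptions stand in for Lance's error types, and tag-name validation is omitted for brevity.

```python
import json
import tempfile
from pathlib import Path


class LocalTags:
    """Hypothetical stand-in for a tag store, using a local directory."""

    def __init__(self, base, known_versions):
        self.dir = Path(base) / "_refs" / "tags"
        self.dir.mkdir(parents=True, exist_ok=True)
        # Assumption: this set replaces a lookup of manifests in storage.
        self.known_versions = set(known_versions)

    def _path(self, tag):
        return self.dir / f"{tag}.json"

    def _body(self, version, branch, manifest_size):
        if version not in self.known_versions:
            raise ValueError(f"version {version} does not exist")
        return json.dumps(
            {"branch": branch, "version": version, "manifestSize": manifest_size}
        )

    def create(self, tag, version, branch=None, manifest_size=0):
        body = self._body(version, branch, manifest_size)
        # Mode "x" raises FileExistsError if the tag file already exists.
        with open(self._path(tag), "x") as f:
            f.write(body)

    def get(self, tag):
        # Raises FileNotFoundError if the tag does not exist.
        return json.loads(self._path(tag).read_text())

    def list(self):
        # One directory listing, then one read per tag file.
        return {p.stem: json.loads(p.read_text()) for p in self.dir.glob("*.json")}

    def update(self, tag, version, branch=None, manifest_size=0):
        if not self._path(tag).exists():
            raise FileNotFoundError(f"tag {tag!r} does not exist")
        self._path(tag).write_text(self._body(version, branch, manifest_size))

    def delete(self, tag):
        # unlink raises FileNotFoundError if the tag does not exist.
        self._path(tag).unlink()


# Usage: create a tag, repoint it explicitly, then remove it.
demo = LocalTags(tempfile.mkdtemp(), known_versions={1, 2})
demo.create("release-v1", 1)
demo.update("release-v1", 2)
demo.delete("release-v1")
```

The sketch keeps create and update as separate operations with opposite preconditions, which mirrors the list above: an accidental overwrite or an accidental creation both surface as errors rather than succeeding silently.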
