Implementation:Treeverse LakeFS CreateTag
| Knowledge Sources | |
|---|---|
| Domains | Data_Version_Control, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the lakeFS REST API, for creating an immutable tag on a specific commit in a lakeFS repository.
Description
The createTag endpoint creates a named, immutable reference (tag) pointing to a specific commit in a repository. Tags provide human-readable labels for important data milestones such as releases, training data snapshots, or audit checkpoints. Once created, a tag keeps pointing to the same commit unless it is explicitly overwritten with the force option, which gives reproducible access to that exact data state. A tag can be used anywhere the lakeFS API accepts a commit reference.
Usage
Use this API when:
- Marking a production data release with a versioned label.
- Tagging the training data commit for a machine learning model version.
- Creating audit-ready immutable references for regulatory compliance.
- Establishing named checkpoints after successful pipeline completions.
Code Reference
Source Location
- Repository: lakeFS
- File: api/swagger.yml (lines 3999-4031)
Signature
/repositories/{repository}/tags:
  post:
    operationId: createTag
    summary: create tag
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/TagCreation"
    responses:
      201:
        description: tag
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Ref"
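As a minimal sketch of what this signature implies on the wire, the request can be assembled with only the Python standard library. The helper name build_create_tag_request is hypothetical, and the /api/v1 prefix and HTTP Basic credentials are assumptions carried over from the curl examples on this page rather than from the swagger excerpt. The request is built but deliberately not sent.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def build_create_tag_request(base_url, repository, body, access_key, secret_key):
    """Build (but do not send) a POST request for the createTag endpoint.

    Hypothetical helper for illustration; not part of any lakeFS SDK.
    """
    # Assumption: the API is served under /api/v1, as in the curl examples.
    url = f"{base_url.rstrip('/')}/api/v1/repositories/{quote(repository, safe='')}/tags"
    # HTTP Basic auth, mirroring curl's -u "access_key:secret_key".
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_create_tag_request(
    "http://localhost:8000",
    "my-data-repo",
    {"id": "v1.0-release", "ref": "main"},
    "AKIAIOSFODNN7EXAMPLE",
    "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it. Per the signature above, a successful call returns status 201 with a Ref body.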
Import
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key",
)
repo = lakefs.Repository("my-repo", client=client)
tag = repo.tag("v1.0").create(source_ref="main")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository (path param) | string | Yes | Repository name. |
| id | string | Yes | Tag name (the human-readable label for this tag). |
| ref | string | Yes | The commit reference to tag (commit ID, branch name, or existing tag). |
| force | boolean | No | If true, overwrite an existing tag with the same name. Defaults to false. |
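The body fields in the table above can be assembled and checked before a request is made. The following is a sketch only: tag_creation_body is a hypothetical helper, and the choice to omit force when it is false is an assumption based on the documented default, not a requirement of the API.

```python
import json


def tag_creation_body(tag_id, ref, force=False):
    """Return a dict shaped like the TagCreation request body.

    Hypothetical helper for illustration; not part of any lakeFS SDK.
    """
    # Both id and ref are marked as required in the inputs table.
    if not tag_id:
        raise ValueError("id (tag name) is required")
    if not ref:
        raise ValueError("ref (commit reference) is required")
    body = {"id": tag_id, "ref": ref}
    # Assumption: force defaults to false, so it is only sent when set.
    if force:
        body["force"] = True
    return body


print(json.dumps(tag_creation_body("v1.0-release", "main")))
print(json.dumps(tag_creation_body("latest-release", "main", force=True)))
```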
Outputs
| Name | Type | Description |
|---|---|---|
| id | string | The tag name. |
| commit_id | string | The commit ID that this tag points to. |
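A response with these two fields could be read into a small typed record, for example to log which commit a tag resolved to. This is a sketch under assumptions: the TagRef class and parse_ref function are hypothetical names, and the sample JSON (including its commit ID) is made up for illustration rather than captured from a real server.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRef:
    """Mirror of the two documented output fields (hypothetical type)."""
    id: str
    commit_id: str


def parse_ref(response_text):
    """Parse a createTag response body into a TagRef."""
    data = json.loads(response_text)
    # A KeyError here means the body lacks one of the documented fields.
    return TagRef(id=data["id"], commit_id=data["commit_id"])


# Illustrative payload only; the commit ID is a placeholder value.
sample = '{"id": "v1.0-release", "commit_id": "a1b2c3d4"}'
tag = parse_ref(sample)
print(f"Tag {tag.id} points to commit {tag.commit_id}")
```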
Usage Examples
Create a Tag Using the Python SDK
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
repo = lakefs.Repository("my-data-repo", client=client)
# Tag the current state of main as a release
tag = repo.tag("v1.0-release").create(source_ref="main")
print(f"Tag created: {tag.id}")
Create a Tag Using curl
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/tags \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "id": "v1.0-release",
    "ref": "main"
  }'
Tag a Specific Commit for ML Training Data
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
repo = lakefs.Repository("ml-data-repo", client=client)
# Get the latest commit on the training-data branch
branch = repo.branch("training-data")
log = list(branch.log(max_amount=1))
latest_commit_id = log[0].id
# Tag the specific commit used for model training
tag = repo.tag("model-v3-training-data").create(source_ref=latest_commit_id)
print(f"Tagged commit {latest_commit_id} as 'model-v3-training-data'")
Force-Overwrite an Existing Tag
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/tags \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "id": "latest-release",
    "ref": "main",
    "force": true
  }'