Implementation:Treeverse LakeFS Commit Operation
| Knowledge Sources | |
|---|---|
| Domains | Data_Version_Control, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the lakeFS REST API, for committing staged changes on a branch in a lakeFS repository.
Description
The commit endpoint creates an immutable snapshot of all staged (uncommitted) changes on a specified branch. The commit records a message, optional metadata, and the complete state of all objects on the branch at the time of the commit. Pre-commit hooks, if configured, are executed before the commit is finalized and can reject the operation. The resulting commit object includes a unique ID, parent references, and a meta-range ID that captures the full data state.
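The success and rejection paths described above can be sketched with the Python standard library alone. This is a minimal illustration, not the lakeFS client: `post_commit`, `CommitRejected`, and the injectable `opener` parameter are hypothetical names introduced here, and HTTP Basic authentication is assumed because the curl examples on this page use `-u`.

```python
import base64
import json
import urllib.error
import urllib.request


class CommitRejected(Exception):
    """Hypothetical error for a 412 response (pre-commit hook rejection)."""


def post_commit(url, access_key, secret_key, body, opener=urllib.request.urlopen):
    """POST a commit request and return the parsed Commit JSON on success.

    `opener` is injectable so the function can be exercised without a server.
    """
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    try:
        with opener(request) as response:
            # A 2xx status (201 for this endpoint) carries the Commit object.
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        if err.code == 412:
            # Precondition Failed: a pre-commit hook rejected the operation.
            raise CommitRejected(err.read().decode("utf-8", "replace")) from err
        raise
```

Raising a dedicated exception for 412 lets a pipeline distinguish a hook rejection, which usually needs a data fix, from transport or authorization errors.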
Usage
Use this API when:
- Finalizing a batch of data uploads or modifications on a branch.
- Creating a permanent, immutable checkpoint after a data pipeline stage completes.
- Recording audit metadata (pipeline run ID, author, source system) alongside the data snapshot.
- Triggering post-commit hooks for downstream notifications or automated processing.
Code Reference
Source Location
- Repository: lakeFS
- File: api/swagger.yml (lines 4252-4292)
Signature
/repositories/{repository}/branches/{branch}/commits:
  post:
    operationId: commit
    summary: create commit
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
      - in: path
        name: branch
        required: true
        schema:
          type: string
      - in: query
        name: source_metarange
        schema:
          type: string
        description: Use an existing metarange as the commit source
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/CommitCreation"
    responses:
      201:
        description: commit
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Commit"
      412:
        description: Precondition Failed (pre-commit hook rejection)
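To show how the path and query parameters in the signature combine into a request URL, here is a small sketch. The helper name `commit_url` is hypothetical, and the `/api/v1` base path is taken from the curl examples on this page rather than from the signature itself.

```python
from urllib.parse import quote, urlencode


def commit_url(base_url, repository, branch, source_metarange=None):
    """Build the commit endpoint URL from its path and query parameters."""
    # Percent-encode each path segment so names containing "/" stay one segment.
    path = (
        f"/repositories/{quote(repository, safe='')}"
        f"/branches/{quote(branch, safe='')}/commits"
    )
    url = base_url.rstrip("/") + path
    if source_metarange is not None:
        # Optional query parameter: commit from an existing metarange.
        url += "?" + urlencode({"source_metarange": source_metarange})
    return url


print(commit_url("http://localhost:8000/api/v1", "my-data-repo", "experiment-v2"))
```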
Import
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
commit = branch.commit(message="Add customer data for January 2026")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository (path param) | string | Yes | Repository name. |
| branch (path param) | string | Yes | Branch name to commit changes on. |
| message | string | Yes | Human-readable commit message describing the changes. |
| metadata | map[string]string | No | Optional key-value pairs for additional context (e.g., pipeline ID, source). |
| date | integer (int64) | No | Optional override for the commit creation date, as a Unix epoch timestamp in seconds. |
| allow_empty | boolean | No | If true, allow creating a commit with no staged changes. Defaults to false. |
| force | boolean | No | If true, bypass certain safety checks. Defaults to false. |
| source_metarange (query param) | string | No | Use an existing metarange as the commit source instead of the staging area. |
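The request-body fields in the table above can be assembled as follows. This is a sketch under stated assumptions: `build_commit_body` is a hypothetical helper, and coercing metadata values to strings is a choice made here to match the `map[string]string` type, not documented server behavior.

```python
def build_commit_body(message, metadata=None, date=None, allow_empty=False, force=False):
    """Assemble a CommitCreation JSON body from the documented input fields."""
    if not message:
        raise ValueError("message is required")
    body = {"message": message}
    if metadata:
        # metadata is map[string]string, so stringify keys and values.
        body["metadata"] = {str(k): str(v) for k, v in metadata.items()}
    if date is not None:
        # Unix epoch timestamp override for the commit creation date.
        body["date"] = int(date)
    # Optional flags default to false, so include them only when set.
    if allow_empty:
        body["allow_empty"] = True
    if force:
        body["force"] = True
    return body


body = build_commit_body(
    "Add experiment v2 results",
    metadata={"pipeline_run_id": "run-20260208-001", "attempt": 2},
)
```

Omitting unset optional fields keeps the payload identical to the hand-written curl bodies on this page.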
Outputs
| Name | Type | Description |
|---|---|---|
| id | string | Unique content-addressable commit identifier. |
| parents | list[string] | List of parent commit IDs (typically one for regular commits, two for merges). |
| committer | string | Identity of the user who created the commit. |
| message | string | The commit message. |
| creation_date | integer (int64) | Commit creation time, as a Unix epoch timestamp in seconds. |
| meta_range_id | string | Reference to the internal metarange capturing the full data state. |
| metadata | map[string]string | User-supplied metadata key-value pairs. |
| generation | integer | Position in the DAG for efficient traversal. |
| version | integer | Internal version number of the commit format. |
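A response with the fields listed above could be mapped onto a typed record along these lines. The `CommitInfo` class, the `parse_commit` helper, and the sample payload values are all hypothetical; the sketch also assumes `creation_date` is expressed in seconds.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CommitInfo:
    id: str
    parents: list
    committer: str
    message: str
    creation_date: datetime
    meta_range_id: str
    metadata: dict


def parse_commit(payload):
    """Convert a Commit JSON object (already decoded to a dict) to CommitInfo."""
    return CommitInfo(
        id=payload["id"],
        parents=list(payload.get("parents", [])),
        committer=payload["committer"],
        message=payload["message"],
        # Assumption: the timestamp is in seconds since the Unix epoch (UTC).
        creation_date=datetime.fromtimestamp(payload["creation_date"], tz=timezone.utc),
        meta_range_id=payload["meta_range_id"],
        metadata=dict(payload.get("metadata") or {}),
    )


# Hypothetical payload for illustration only; not real server output.
sample = {
    "id": "c0ffee",
    "parents": ["badc0de"],
    "committer": "data-team",
    "message": "Add experiment v2 results",
    "creation_date": 1770508800,
    "meta_range_id": "feedface",
    "metadata": {"source": "spark-etl-pipeline"},
}
info = parse_commit(sample)
```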
Usage Examples
Commit Changes Using the Python SDK
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)
repo = lakefs.Repository("my-data-repo", client=client)
branch = repo.branch("experiment-v2")
# Upload some data first; upload() takes str or bytes, so pass the file contents
with open("results.parquet", "rb") as f:
    branch.object("data/results.parquet").upload(data=f.read(), mode="wb")
# Commit the staged changes with metadata
commit = branch.commit(
    message="Add experiment v2 results",
    metadata={
        "pipeline_run_id": "run-20260208-001",
        "source": "spark-etl-pipeline",
        "author": "data-team",
    },
)
print(f"Commit ID: {commit.id}")
# branch.commit() returns a reference; get_commit() fetches the commit record
print(f"Creation date: {commit.get_commit().creation_date}")
Commit Using curl
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/experiment-v2/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Add experiment v2 results",
    "metadata": {
      "pipeline_run_id": "run-20260208-001",
      "source": "spark-etl-pipeline"
    }
  }'
Create an Empty Commit (Metadata-Only Event)
curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/main/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Mark pipeline validation complete",
    "allow_empty": true,
    "metadata": {
      "validation_status": "passed",
      "validator": "data-quality-service"
    }
  }'