Implementation:Treeverse LakeFS Commit Operation

Knowledge Sources	lakeFS, lakeFS API Reference
Domains	Data_Version_Control, REST_API
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the lakeFS REST API, for committing staged changes on a branch in a lakeFS repository.

Description

The commit endpoint creates an immutable snapshot of all staged (uncommitted) changes on a specified branch. The commit records a message, optional metadata, and the complete state of all objects on the branch at the time of the commit. Pre-commit hooks, if configured, run before the commit is finalized and can reject the operation, in which case the endpoint responds with 412 Precondition Failed. The resulting commit object includes a unique ID, parent references, and a metarange ID that captures the full data state.

Usage

Use this API when:

Finalizing a batch of data uploads or modifications on a branch.
Creating a permanent, immutable checkpoint after a data pipeline stage completes.
Recording audit metadata (pipeline run ID, author, source system) alongside the data snapshot.
Triggering post-commit hooks for downstream notifications or automated processing.
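
The audit-metadata use case above can be sketched as follows. Because commit metadata is a map of string to string, non-string values need converting before the commit call. The helper name build_audit_metadata and the row_count and reviewer keys are illustrative assumptions, not part of lakeFS.

```python
from typing import Any, Dict


def build_audit_metadata(**fields: Any) -> Dict[str, str]:
    """Convert arbitrary audit fields into a string-to-string map.

    Hypothetical helper: neither the name nor the keys come from lakeFS.
    """
    metadata: Dict[str, str] = {}
    for key, value in fields.items():
        if value is None:
            continue  # skip unset fields instead of storing the text "None"
        metadata[key] = str(value)
    return metadata


metadata = build_audit_metadata(
    pipeline_run_id="run-20260208-001",
    source="spark-etl-pipeline",
    row_count=12345,   # integer, stored as the string "12345"
    reviewer=None,     # unset, so it is left out of the map
)
print(metadata)
```

The resulting dictionary can be passed as the metadata argument of branch.commit(...) or placed under the "metadata" key of the JSON request body.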

Code Reference

Source Location

Repository: lakeFS
File: api/swagger.yml (lines 4252-4292)

Signature

/repositories/{repository}/branches/{branch}/commits:
  post:
    operationId: commit
    summary: create commit
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
      - in: path
        name: branch
        required: true
        schema:
          type: string
      - in: query
        name: source_metarange
        schema:
          type: string
        description: Use an existing metarange as the commit source
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/CommitCreation"
    responses:
      201:
        description: commit
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Commit"
      412:
        description: Precondition Failed (pre-commit hook rejection)
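
As a minimal sketch of how the signature above maps onto a request, the following builds the endpoint URL from the two path parameters and the optional source_metarange query parameter, and maps the two documented status codes to an outcome. The function names commit_url and describe_status are hypothetical, and the base URL is taken from the curl examples on this page; nothing here sends a request.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def commit_url(base_url: str, repository: str, branch: str,
               source_metarange: Optional[str] = None) -> str:
    """Build the commit endpoint URL from the path and query parameters."""
    # Escape each path parameter so unusual characters stay within one segment.
    path = "/repositories/{}/branches/{}/commits".format(
        quote(repository, safe=""), quote(branch, safe="")
    )
    url = base_url.rstrip("/") + path
    if source_metarange is not None:
        url += "?" + urlencode({"source_metarange": source_metarange})
    return url


def describe_status(status: int) -> str:
    """Map the status codes listed in the signature to a short outcome."""
    if status == 201:
        return "committed"
    if status == 412:
        return "rejected by pre-commit hook"
    return "unexpected status {}".format(status)


print(commit_url("http://localhost:8000/api/v1", "my-data-repo", "experiment-v2"))
print(describe_status(412))
```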

Import

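import lakefs

# Authenticate against the lakeFS server with an access key pair
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key"
)
# Resolve the repository and branch, then commit the branch's staged changes
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
commit = branch.commit(message="Add customer data for January 2026")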

I/O Contract

Inputs

Name	Type	Required	Description
repository (path param)	string	Yes	Repository name.
branch (path param)	string	Yes	Branch name to commit changes on.
message	string	Yes	Human-readable commit message describing the changes.
metadata	map[string]string	No	Optional key-value pairs for additional context (e.g., pipeline ID, source).
date	integer (int64)	No	Optional Unix epoch timestamp override for the commit creation date.
allow_empty	boolean	No	If true, allow creating a commit with no staged changes. Defaults to `false`.
force	boolean	No	If true, bypass certain safety checks. Defaults to `false`.
source_metarange (query param)	string	No	Use an existing metarange as the commit source instead of the staging area.
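
The body fields in the table above could be assembled as in the sketch below. The helper build_commit_body is hypothetical; it assumes that optional fields left at their defaults can be omitted from the JSON body, which is consistent with the curl examples on this page but is not stated by the API reference excerpt.

```python
import json
from typing import Any, Dict, Optional


def build_commit_body(message: str,
                      metadata: Optional[Dict[str, str]] = None,
                      date: Optional[int] = None,
                      allow_empty: bool = False,
                      force: bool = False) -> Dict[str, Any]:
    """Assemble a commit request body from the fields in the Inputs table."""
    if not message:
        raise ValueError("message is required")
    body: Dict[str, Any] = {"message": message}
    # Include optional fields only when they differ from their defaults.
    if metadata:
        body["metadata"] = dict(metadata)
    if date is not None:
        body["date"] = int(date)  # Unix epoch timestamp override
    if allow_empty:
        body["allow_empty"] = True
    if force:
        body["force"] = True
    return body


body = build_commit_body(
    "Mark pipeline validation complete",
    metadata={"validation_status": "passed"},
    allow_empty=True,
)
print(json.dumps(body, indent=2))
```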

Outputs

Name	Type	Description
id	string	Unique content-addressable commit identifier.
parents	list[string]	List of parent commit IDs (typically one for regular commits, two for merges).
committer	string	Identity of the user who created the commit.
message	string	The commit message.
creation_date	integer (int64)	Unix epoch timestamp of the commit creation.
meta_range_id	string	Reference to the internal metarange capturing the full data state.
metadata	map[string]string	User-supplied metadata key-value pairs.
generation	integer	Position in the DAG for efficient traversal.
version	integer	Internal version number of the commit format.
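
To illustrate how the output fields above might be consumed, the sketch below parses a commit response into a small dataclass. CommitInfo and parse_commit are hypothetical names, the sample payload values are made up for illustration, and creation_date is assumed to be expressed in seconds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class CommitInfo:
    """Hypothetical container mirroring fields from the Outputs table."""
    id: str
    parents: List[str]
    committer: str
    message: str
    creation_date: int
    meta_range_id: str
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def created_at(self) -> datetime:
        # Assumes creation_date is a Unix epoch timestamp in seconds.
        return datetime.fromtimestamp(self.creation_date, tz=timezone.utc)

    @property
    def is_merge(self) -> bool:
        # Per the table, merge commits carry two parents.
        return len(self.parents) == 2


def parse_commit(payload: dict) -> CommitInfo:
    """Pick the documented fields out of a decoded JSON response."""
    return CommitInfo(
        id=payload["id"],
        parents=list(payload["parents"]),
        committer=payload["committer"],
        message=payload["message"],
        creation_date=int(payload["creation_date"]),
        meta_range_id=payload["meta_range_id"],
        metadata=dict(payload.get("metadata") or {}),
    )


# Made-up sample payload, for illustration only.
sample = {
    "id": "c1", "parents": ["c0"], "committer": "data-team",
    "message": "Add experiment v2 results", "creation_date": 1770508800,
    "meta_range_id": "mr1", "metadata": {"source": "spark-etl-pipeline"},
}
commit_info = parse_commit(sample)
print(commit_info.created_at.isoformat(), commit_info.is_merge)
```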

Usage Examples

Commit Changes Using the Python SDK

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

repo = lakefs.Repository("my-data-repo", client=client)
branch = repo.branch("experiment-v2")

# Upload some data first; upload() takes str or bytes, so read the file contents
with open("results.parquet", "rb") as f:
    branch.object("data/results.parquet").upload(data=f.read(), mode="wb")

# Commit the staged changes with metadata
commit = branch.commit(
    message="Add experiment v2 results",
    metadata={
        "pipeline_run_id": "run-20260208-001",
        "source": "spark-etl-pipeline",
        "author": "data-team"
    }
)
print(f"Commit ID: {commit.id}")
# branch.commit() returns a reference; fetch the commit record for its details
print(f"Creation date: {commit.get_commit().creation_date}")

Commit Using curl

curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/experiment-v2/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Add experiment v2 results",
    "metadata": {
      "pipeline_run_id": "run-20260208-001",
      "source": "spark-etl-pipeline"
    }
  }'

Create an Empty Commit (Metadata-Only Event)

curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/main/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Mark pipeline validation complete",
    "allow_empty": true,
    "metadata": {
      "validation_status": "passed",
      "validator": "data-quality-service"
    }
  }'
