Implementation:Treeverse LakeFS Commit Operation

From Leeroopedia


Knowledge Sources
Domains Data_Version_Control, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the lakeFS REST API, for committing staged changes on a branch in a lakeFS repository.

Description

The commit endpoint creates an immutable snapshot of all staged (uncommitted) changes on a specified branch. The commit records a message, optional metadata, and the complete state of all objects on the branch at the time of the commit. Pre-commit hooks, if configured, are executed before the commit is finalized and can reject the operation. The resulting commit object includes a unique ID, parent references, and a meta-range ID that captures the full data state.

Usage

Use this API when:

  • Finalizing a batch of data uploads or modifications on a branch.
  • Creating a permanent, immutable checkpoint after a data pipeline stage completes.
  • Recording audit metadata (pipeline run ID, author, source system) alongside the data snapshot.
  • Triggering post-commit hooks for downstream notifications or automated processing.

Code Reference

Source Location

  • Repository: lakeFS
  • File: api/swagger.yml (lines 4252-4292)

Signature

/repositories/{repository}/branches/{branch}/commits:
  post:
    operationId: commit
    summary: create commit
    parameters:
      - in: path
        name: repository
        required: true
        schema:
          type: string
      - in: path
        name: branch
        required: true
        schema:
          type: string
      - in: query
        name: source_metarange
        schema:
          type: string
        description: Use an existing metarange as the commit source
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/CommitCreation"
    responses:
      201:
        description: commit
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Commit"
      412:
        description: Precondition Failed (pre-commit hook rejection)
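
The path, query parameter, and request body above can be assembled into an HTTP request along the following lines. This is a minimal sketch using only the Python standard library: `build_commit_request` and `describe_status` are hypothetical helper names, the `/api/v1` prefix is taken from the curl examples on this page, and authentication is left out. The request is built but not sent.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_commit_request(
    base_url: str,
    repository: str,
    branch: str,
    body: dict,
    source_metarange: Optional[str] = None,
) -> Request:
    """Assemble (but do not send) a POST request for the commit endpoint."""
    # Percent-encode path parameters so unusual characters cannot alter the path.
    path = "/repositories/{}/branches/{}/commits".format(
        quote(repository, safe=""), quote(branch, safe="")
    )
    url = base_url.rstrip("/") + path
    # source_metarange is the only query parameter listed in the signature.
    if source_metarange is not None:
        url += "?" + urlencode({"source_metarange": source_metarange})
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def describe_status(status: int) -> str:
    """Map the two status codes shown in the signature to a short label."""
    if status == 201:
        return "created"
    if status == 412:
        return "rejected by pre-commit hook"
    # Other codes are not covered by the excerpt above.
    return "unexpected status {}".format(status)


req = build_commit_request(
    "http://localhost:8000/api/v1",
    "my-data-repo",
    "experiment-v2",
    {"message": "Add experiment v2 results"},
)
print(req.get_method(), req.full_url)
```

Keeping request construction separate from sending makes the URL and body easy to inspect or unit-test before any network call is made.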

Import

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key_id",
    password="secret_access_key"
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
commit = branch.commit(message="Add customer data for January 2026")

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| repository (path param) | string | Yes | Repository name. |
| branch (path param) | string | Yes | Branch name to commit changes on. |
| message | string | Yes | Human-readable commit message describing the changes. |
| metadata | map[string]string | No | Optional key-value pairs for additional context (e.g., pipeline ID, source). |
| date | integer (int64) | No | Optional Unix epoch timestamp override for the commit creation date. |
| allow_empty | boolean | No | If true, allow creating a commit with no staged changes. Defaults to false. |
| force | boolean | No | If true, bypass certain safety checks. Defaults to false. |
| source_metarange (query param) | string | No | Use an existing metarange as the commit source instead of the staging area. |
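
The request-body fields listed above can be turned into a JSON-ready dictionary along these lines. This is an illustrative sketch: `commit_creation_body` is a hypothetical helper, and rejecting an empty message or non-string metadata on the client side is an assumption made here for early feedback, not documented server behavior.

```python
from typing import Dict, Optional


def commit_creation_body(
    message: str,
    metadata: Optional[Dict[str, str]] = None,
    date: Optional[int] = None,
    allow_empty: bool = False,
    force: bool = False,
) -> dict:
    """Build a CommitCreation-style body, leaving out optional fields that are unset."""
    # Assumption: treat an empty message as missing, since the field is required.
    if not message:
        raise ValueError("message is required")
    body: dict = {"message": message}
    if metadata:
        # metadata is map[string]string, so flag non-string keys or values early.
        bad = [
            k for k, v in metadata.items()
            if not isinstance(k, str) or not isinstance(v, str)
        ]
        if bad:
            raise TypeError("metadata keys and values must be strings: {!r}".format(bad))
        body["metadata"] = dict(metadata)
    if date is not None:
        body["date"] = int(date)
    # Both flags default to false, so they are only sent when enabled.
    if allow_empty:
        body["allow_empty"] = True
    if force:
        body["force"] = True
    return body


body = commit_creation_body(
    "Mark pipeline validation complete",
    metadata={"validation_status": "passed"},
    allow_empty=True,
)
print(body)
```

Omitting fields that match their defaults keeps the payload small and mirrors the curl examples on this page, which send only the fields they need.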

Outputs

| Name | Type | Description |
|------|------|-------------|
| id | string | Unique content-addressable commit identifier. |
| parents | list[string] | List of parent commit IDs (typically one for regular commits, two for merges). |
| committer | string | Identity of the user who created the commit. |
| message | string | The commit message. |
| creation_date | integer (int64) | Unix epoch timestamp of the commit creation. |
| meta_range_id | string | Reference to the internal metarange capturing the full data state. |
| metadata | map[string]string | User-supplied metadata key-value pairs. |
| generation | integer | Position in the DAG for efficient traversal. |
| version | integer | Internal version number of the commit format. |
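
A response with the fields above could be read into a typed structure roughly as follows. This is a sketch only: `CommitInfo` is a hypothetical class, the sample payload values are invented for illustration, and the timestamp is assumed to be in seconds.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CommitInfo:
    """Typed view over the output fields listed above."""
    id: str
    parents: List[str]
    committer: str
    message: str
    creation_date: int
    meta_range_id: str
    metadata: Dict[str, str] = field(default_factory=dict)
    generation: Optional[int] = None
    version: Optional[int] = None

    @classmethod
    def from_response(cls, payload: dict) -> "CommitInfo":
        # Fields that may be absent from a payload fall back to neutral defaults.
        return cls(
            id=payload["id"],
            parents=list(payload.get("parents", [])),
            committer=payload["committer"],
            message=payload["message"],
            creation_date=int(payload["creation_date"]),
            meta_range_id=payload["meta_range_id"],
            metadata=dict(payload.get("metadata") or {}),
            generation=payload.get("generation"),
            version=payload.get("version"),
        )

    @property
    def created_at(self) -> datetime:
        # Assumption: creation_date is a Unix epoch timestamp in seconds.
        return datetime.fromtimestamp(self.creation_date, tz=timezone.utc)

    @property
    def is_merge(self) -> bool:
        # More than one parent indicates a merge commit.
        return len(self.parents) > 1


# Hypothetical payload; every value here is made up for illustration.
sample = {
    "id": "c1",
    "parents": ["p0"],
    "committer": "data-team",
    "message": "Add experiment v2 results",
    "creation_date": 1770508800,
    "meta_range_id": "mr1",
    "metadata": {"source": "spark-etl-pipeline"},
    "generation": 5,
    "version": 1,
}
info = CommitInfo.from_response(sample)
print(info.id, info.created_at.isoformat(), info.is_merge)
```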

Usage Examples

Commit Changes Using the Python SDK

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
)

repo = lakefs.Repository("my-data-repo", client=client)
branch = repo.branch("experiment-v2")

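# Upload some data first. upload() takes the payload itself (bytes or str),
# so read the file contents instead of passing the open file handle.
with open("results.parquet", "rb") as f:
    branch.object("data/results.parquet").upload(data=f.read(), mode="wb")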

# Commit the staged changes with metadata
commit = branch.commit(
    message="Add experiment v2 results",
    metadata={
        "pipeline_run_id": "run-20260208-001",
        "source": "spark-etl-pipeline",
        "author": "data-team"
    }
)
print(f"Commit ID: {commit.id}")
# branch.commit() returns a reference; fetch the full commit record for its fields
print(f"Creation date: {commit.get_commit().creation_date}")

Commit Using curl

curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/experiment-v2/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Add experiment v2 results",
    "metadata": {
      "pipeline_run_id": "run-20260208-001",
      "source": "spark-etl-pipeline"
    }
  }'

Create an Empty Commit (Metadata-Only Event)

curl -X POST http://localhost:8000/api/v1/repositories/my-data-repo/branches/main/commits \
  -H "Content-Type: application/json" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY" \
  -d '{
    "message": "Mark pipeline validation complete",
    "allow_empty": true,
    "metadata": {
      "validation_status": "passed",
      "validator": "data-quality-service"
    }
  }'
