Implementation:Treeverse LakeFS Commit Via S3 Workflow

Knowledge Sources	lakeFS, lakeFS S3 Gateway Docs
Domains	S3_Compatibility, REST_API
Last Updated	2026-02-08 00:00 GMT

Overview

API endpoint for committing changes that were staged via the S3 gateway, bridging the S3 write protocol with lakeFS version control.

Description

This implementation documents the lakeFS REST API commit endpoint as used in the S3 gateway integration workflow. After writing objects through the S3 gateway (PutObject, CopyObject, DeleteObject), the commit endpoint is called to create an atomic version snapshot of all staged changes.

The endpoint is:

POST /api/v1/repositories/{repository}/branches/{branch}/commits

Key behavior: Every change made through the S3 gateway (PutObject, CopyObject, DeleteObject) is automatically recorded as a staged, uncommitted change on the target branch. No separate "add" or "stage" step is needed before committing. The commit operation packages all staged changes on the branch into a single immutable snapshot.
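To illustrate how an S3 write maps onto the commit endpoint, here is a minimal sketch. It assumes the gateway addressing used in this page's examples (Bucket is the repository, and the first segment of Key is the branch). The helper name `commit_url_for_s3_target` is hypothetical and not part of lakeFS.

```python
from urllib.parse import quote


def commit_url_for_s3_target(endpoint: str, bucket: str, key: str) -> str:
    """Derive the commit endpoint URL for an object written via the gateway.

    Assumption: Bucket = repository, Key = "<branch>/<path inside the branch>",
    as in the boto3 examples on this page.
    """
    branch, sep, path = key.partition("/")
    if not branch or not sep or not path:
        raise ValueError(f"expected '<branch>/<path>', got {key!r}")
    # Percent-encode the path parameters before placing them in the URL.
    return (
        f"{endpoint.rstrip('/')}/api/v1/repositories/{quote(bucket, safe='')}"
        f"/branches/{quote(branch, safe='')}/commits"
    )


url = commit_url_for_s3_target(
    "http://localhost:8000", "my-repo", "main/data/sales_2026.csv"
)
print(url)
```

Deriving the URL from the same bucket and key used for the write helps ensure the commit targets the branch that actually received the staged objects.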

Usage

Use this implementation when:

Completing an S3-based data ingestion pipeline with a version commit
Building automation that writes data via S3 tools and commits via REST API
Creating atomic snapshots of data that was written through Spark, pandas, or AWS CLI via the S3 gateway

Code Reference

Source Location

File: api/swagger.yml
Lines: L4252-4292 (commit endpoint definition)
Schemas: L651-673 (CommitCreation), L600-630 (Commit)
Operation ID: commit

Signature

# api/swagger.yml - commit endpoint
/repositories/{repository}/branches/{branch}/commits:
  parameters:
    - in: path
      name: repository
      required: true
      schema:
        type: string
    - in: path
      name: branch
      required: true
      schema:
        type: string
  post:
    parameters:
      - in: query
        name: source_metarange
        required: false
        description: >
          The source metarange to commit.
          Branch must not have uncommitted changes.
        schema:
          type: string
    tags:
      - commits
    operationId: commit
    summary: create commit
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/CommitCreation"
    responses:
      201:
        description: commit
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/Commit"
      400:
        $ref: "#/components/responses/ValidationError"
      401:
        $ref: "#/components/responses/Unauthorized"
      403:
        $ref: "#/components/responses/Forbidden"
      404:
        $ref: "#/components/responses/NotFound"
      409:
        $ref: "#/components/responses/Conflict"
      412:
        $ref: "#/components/responses/PreconditionFailed"
      429:
        description: too many requests

Import

# Python requests library for calling the lakeFS REST API
import requests

# Placeholder endpoint and credentials; replace with your own
LAKEFS_ENDPOINT = 'http://localhost:8000'
LAKEFS_ACCESS_KEY = 'AKIAIOSFDNN7EXAMPLEQ'
LAKEFS_SECRET_KEY = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

I/O Contract

Inputs

Parameter	Location	Type	Required	Description
`repository`	path	string	Yes	lakeFS repository name
`branch`	path	string	Yes	Branch name to commit on
`message`	body	string	Yes	Human-readable commit message
`metadata`	body	map[string]string	No	Arbitrary key-value pairs for automation and auditing
`date`	body	integer (int64)	No	Override creation date (Unix Epoch in seconds)
`allow_empty`	body	boolean	No	Allow commits with no changes (default: `false`)
`force`	body	boolean	No	Force commit (default: `false`)
`source_metarange`	query	string	No	Source metarange to commit (branch must have no uncommitted changes)
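The body parameters above can be assembled with a small helper. This is a sketch only: the function name `build_commit_body` is hypothetical, and rejecting an empty message is an assumption of this example (the table only states that `message` is required).

```python
from typing import Any, Dict, Optional


def build_commit_body(
    message: str,
    metadata: Optional[Dict[str, Any]] = None,
    date: Optional[int] = None,
    allow_empty: bool = False,
    force: bool = False,
) -> Dict[str, Any]:
    """Build a JSON-serializable body with the CommitCreation fields listed above."""
    # Assumption: treat a blank message as a client-side error.
    if not isinstance(message, str) or not message.strip():
        raise ValueError("message is required and must be a non-empty string")
    body: Dict[str, Any] = {"message": message}
    if metadata:
        # metadata is map[string]string, so coerce keys and values to str.
        body["metadata"] = {str(k): str(v) for k, v in metadata.items()}
    if date is not None:
        body["date"] = int(date)  # Unix Epoch in seconds
    # Only send the flags when they differ from their documented defaults.
    if allow_empty:
        body["allow_empty"] = True
    if force:
        body["force"] = True
    return body


body = build_commit_body("Add 2026 sales data", metadata={"batch_id": 42})
print(body)
```

The resulting dictionary can be passed as the `json=` argument of `requests.post`. `source_metarange` is a query parameter, so it is deliberately not part of this body.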

Outputs

Field	Type	Description
`id`	string	Unique commit identifier (SHA-256 hash)
`parents`	[]string	Parent commit IDs (single parent for normal commits)
`committer`	string	The user who created the commit
`message`	string	The commit message
`creation_date`	integer (int64)	Unix Epoch in seconds
`meta_range_id`	string	Internal reference to the committed data range
`metadata`	map[string]string	User-provided metadata key-value pairs
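As a sketch of consuming the output fields above, the response JSON can be mapped onto a typed record, with `creation_date` converted from epoch seconds to a timezone-aware datetime. The names `CommitInfo` and `parse_commit` are hypothetical, and the sample payload reuses the illustrative values from this page's cURL example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class CommitInfo:
    id: str
    parents: List[str]
    committer: str
    message: str
    creation_date: datetime
    meta_range_id: str
    metadata: Dict[str, str] = field(default_factory=dict)


def parse_commit(payload: dict) -> CommitInfo:
    """Convert a commit response dict into a CommitInfo record."""
    return CommitInfo(
        id=payload["id"],
        parents=list(payload.get("parents", [])),
        committer=payload.get("committer", ""),
        message=payload.get("message", ""),
        # creation_date is Unix Epoch in seconds; interpret it as UTC.
        creation_date=datetime.fromtimestamp(payload["creation_date"], tz=timezone.utc),
        meta_range_id=payload.get("meta_range_id", ""),
        metadata=dict(payload.get("metadata") or {}),
    )


sample = {
    "id": "a1b2c3d4e5f6...",
    "parents": ["f6e5d4c3b2a1..."],
    "committer": "admin",
    "message": "Daily data ingestion complete",
    "creation_date": 1770508800,
    "meta_range_id": "...",
    "metadata": {"pipeline": "daily-etl", "run_id": "2026-02-08-001"},
}
info = parse_commit(sample)
print(info.creation_date.isoformat())
```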

Error responses:

HTTP Status	Description
400	Validation error (invalid request body)
401	Unauthorized (invalid or missing credentials)
403	Forbidden (insufficient permissions)
404	Repository or branch not found
409	Conflict (concurrent commit on same branch)
412	Precondition failed
429	Too many requests (rate limited)
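One possible way to act on these status codes in automation is sketched below. The policy is an assumption of this example, not documented lakeFS behavior: only 429 is treated as retryable, every other documented error is treated as a failure, and the helper names are hypothetical.

```python
from typing import List

# Status codes documented for the commit endpoint in the table above.
DOCUMENTED_ERRORS = {400, 401, 403, 404, 409, 412, 429}
# Assumption: only rate limiting is worth retrying automatically.
RETRYABLE = {429}


def classify_commit_status(status: int) -> str:
    """Map an HTTP status from the commit endpoint to a coarse action."""
    if status == 201:
        return "committed"
    if status in RETRYABLE:
        return "retry"
    if status in DOCUMENTED_ERRORS:
        return "fail"
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff delays in seconds, capped at `cap`."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


for code in (201, 409, 429):
    print(code, classify_commit_status(code))
print(backoff_delays(4))
```

Whether a 409 should also be retried depends on the pipeline. It is left as a failure here so that concurrent writers are surfaced rather than silently retried.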

Usage Examples

Python: Full S3 write + commit workflow

import boto3
import requests
from requests.auth import HTTPBasicAuth

LAKEFS_ENDPOINT = 'http://localhost:8000'
LAKEFS_ACCESS_KEY = 'AKIAIOSFDNN7EXAMPLEQ'
LAKEFS_SECRET_KEY = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

# Step 1: Write data via S3 gateway
s3 = boto3.client('s3',
    endpoint_url=LAKEFS_ENDPOINT,
    aws_access_key_id=LAKEFS_ACCESS_KEY,
    aws_secret_access_key=LAKEFS_SECRET_KEY,
)

s3.put_object(
    Bucket='my-repo',
    Key='main/data/sales_2026.csv',
    Body=b'date,amount\n2026-01-01,100\n2026-01-02,200\n',
    ContentType='text/csv',
    Metadata={'source': 'etl-pipeline', 'batch_id': '42'}
)

s3.put_object(
    Bucket='my-repo',
    Key='main/data/customers_2026.csv',
    Body=b'id,name\n1,Alice\n2,Bob\n',
    ContentType='text/csv'
)

# Step 2: Commit all staged changes via lakeFS REST API
response = requests.post(
    f'{LAKEFS_ENDPOINT}/api/v1/repositories/my-repo/branches/main/commits',
    json={
        'message': 'Add 2026 sales and customer data',
        'metadata': {
            'pipeline': 'daily-etl',
            'batch_id': '42',
            'source': 's3-gateway'
        }
    },
    auth=HTTPBasicAuth(LAKEFS_ACCESS_KEY, LAKEFS_SECRET_KEY)
)

# Fail fast on 4xx/5xx so an error body is not parsed as a commit
response.raise_for_status()
commit = response.json()
print(f"Commit ID: {commit['id']}")
print(f"Timestamp: {commit['creation_date']}")
print(f"Message:   {commit['message']}")

Python: Spark write + commit workflow

from pyspark.sql import SparkSession
import requests
from requests.auth import HTTPBasicAuth

LAKEFS_ENDPOINT = 'http://localhost:8000'
LAKEFS_ACCESS_KEY = 'AKIAIOSFDNN7EXAMPLEQ'
LAKEFS_SECRET_KEY = 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'

# Step 1: Write data via Spark using S3A
spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", LAKEFS_ENDPOINT) \
    .config("spark.hadoop.fs.s3a.access.key", LAKEFS_ACCESS_KEY) \
    .config("spark.hadoop.fs.s3a.secret.key", LAKEFS_SECRET_KEY) \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

df = spark.createDataFrame([
    (1, "Alice", 100.0),
    (2, "Bob", 200.0),
], ["id", "name", "amount"])

df.write.mode("overwrite").parquet("s3a://my-repo/main/data/output/")

# Step 2: Commit via lakeFS REST API
response = requests.post(
    f'{LAKEFS_ENDPOINT}/api/v1/repositories/my-repo/branches/main/commits',
    json={
        'message': 'Spark job: write output dataset',
        'metadata': {'job_name': 'daily_aggregation'}
    },
    auth=HTTPBasicAuth(LAKEFS_ACCESS_KEY, LAKEFS_SECRET_KEY)
)

# Surface 4xx/5xx errors instead of a KeyError on 'id'
response.raise_for_status()
print(f"Committed: {response.json()['id']}")

cURL: Commit via REST API

# Commit staged changes on the main branch
curl -X POST \
  'http://localhost:8000/api/v1/repositories/my-repo/branches/main/commits' \
  -H 'Content-Type: application/json' \
  -u 'AKIAIOSFDNN7EXAMPLEQ:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY' \
  -d '{
    "message": "Daily data ingestion complete",
    "metadata": {
      "pipeline": "daily-etl",
      "run_id": "2026-02-08-001"
    }
  }'

# Example response:
# {
#   "id": "a1b2c3d4e5f6...",
#   "parents": ["f6e5d4c3b2a1..."],
#   "committer": "admin",
#   "message": "Daily data ingestion complete",
#   "creation_date": 1770508800,
#   "meta_range_id": "...",
#   "metadata": {"pipeline": "daily-etl", "run_id": "2026-02-08-001"}
# }

Related Pages

Implements Principle

Principle:Treeverse_LakeFS_S3_Commit_Management
