Principle:Treeverse LakeFS S3 Commit Management

Knowledge Sources	lakeFS, lakeFS Documentation
Domains	S3_Compatibility, Data_Integration
Last Updated	2026-02-08 00:00 GMT

Overview

Committing S3-staged changes through the lakeFS REST API to create versioned snapshots.

Description

Objects written, copied, or deleted through the S3 gateway are staged on the target branch as uncommitted changes; the S3 protocol itself has no commit operation. To create a versioned snapshot, a client must call the commit endpoint of the lakeFS REST API. This separation of "write" (S3 protocol) and "version" (lakeFS API) operations is a fundamental design principle of the lakeFS S3 gateway integration.

The commit operation:

Takes all staged changes on a branch (uploads, deletes, copies made via S3 or lakeFS API)
Creates a single atomic commit with a message and optional metadata
Returns a commit object with a unique ID, timestamp, and parent references
Makes the committed state visible as the new head of the branch

The commit is therefore the point at which data written over the S3 protocol enters lakeFS version history.
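
A minimal sketch of the commit call, using only the Python standard library. The base URL (including an `/api/v1` prefix), HTTP Basic authentication with an access key pair, and the function names `build_commit_request` and `send_commit` are assumptions made for illustration; they do not come from this page.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_commit_request(base_url, repo, branch, message,
                         access_key, secret_key, metadata=None):
    """Build (but do not send) a POST request for the commit endpoint."""
    # Percent-encode the path segments so unusual names cannot break the URL.
    repo_q = urllib.parse.quote(repo, safe="")
    branch_q = urllib.parse.quote(branch, safe="")
    url = f"{base_url.rstrip('/')}/repositories/{repo_q}/branches/{branch_q}/commits"

    body = {"message": message}
    if metadata:
        body["metadata"] = metadata

    # Assumption: the server accepts HTTP Basic auth with an access key pair.
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send_commit(request):
    """Send the request and return the decoded commit object (network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the URL and payload easy to inspect or log before any staged changes are committed.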

Usage

Use this principle when:

Completing an S3-based data ingestion workflow with a commit
Understanding the two-phase nature of lakeFS writes (stage via S3, commit via API)
Designing ETL pipelines that write via S3 and need version control
Building automation that writes data through S3 tools and then commits via the REST API

Theoretical Basis

The commit model enforces a clear separation of concerns:

S3 Protocol Layer          lakeFS Versioning Layer
==================         =======================
PutObject    ----\
CopyObject   -----+--->  Staging Area  ---> Commit (REST API)  ---> Branch History
DeleteObject ----/             |                    |
                               |                    v
                        (uncommitted)        (immutable snapshot)

Why this separation matters:

Atomicity: Multiple S3 writes can be grouped into a single atomic commit, ensuring consumers see a consistent state
Isolation: Uncommitted changes on one branch do not affect other branches or consumers reading committed data
Auditability: Every commit has a message, timestamp, committer, and optional metadata, creating a full audit trail
Tool compatibility: S3-compatible tools do not need to know about commits; they write data using standard S3 operations

The commit workflow:

Step	Protocol	Operation	Description
1	S3	PutObject / CopyObject / DeleteObject	Write changes; they are automatically staged
2	REST API	POST /repositories/{repo}/branches/{branch}/commits	Commit all staged changes atomically
3	S3 or REST	GetObject or list commits	Read the committed data or inspect the commit history
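
The first two steps of the workflow can be sketched as one helper that stages several objects and then issues a single commit. This is an illustrative outline, not the lakeFS client API: `stage_and_commit` and `commit_fn` are hypothetical names, and the S3 client is passed in so that any object with a boto3-style `put_object(Bucket=..., Key=..., Body=...)` method can be used, such as a boto3 client configured with the lakeFS endpoint URL. The sketch also assumes the gateway's addressing scheme, in which the bucket is the repository name and the key begins with the branch name.

```python
def stage_and_commit(s3_client, commit_fn, repo, branch, objects, message):
    """Stage objects through an S3-compatible client, then commit once.

    s3_client: object exposing put_object(Bucket=..., Key=..., Body=...).
    commit_fn: hypothetical callable(repo, branch, message) that wraps the
               REST commit call and returns the commit object.
    objects:   mapping of path (relative to the branch root) to bytes.
    """
    for path, body in objects.items():
        # Step 1: each write is staged on the branch, not yet committed.
        s3_client.put_object(Bucket=repo, Key=f"{branch}/{path}", Body=body)

    # Step 2: one commit captures every staged change atomically.
    return commit_fn(repo, branch, message)
```

Issuing the commit only after every write has returned is what lets a batch of S3 operations appear in branch history as a single snapshot.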

Commit request schema (CommitCreation):

CommitCreation {
    message:     string       (required) -- Human-readable commit message
    metadata:    map[string]string (optional) -- Arbitrary key-value pairs for automation
    date:        integer      (optional) -- Override creation date (Unix Epoch in seconds)
    allow_empty: boolean      (optional, default: false) -- Allow commits with no changes
    force:       boolean      (optional, default: false) -- Force commit
}
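
The request schema maps naturally onto a small data class. The following sketch mirrors the fields listed above and serializes only those that differ from their defaults; the class layout, the `to_json` method, and the client-side check for an empty message are choices made for this example rather than part of the lakeFS API.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommitCreation:
    message: str                                   # required
    metadata: Dict[str, str] = field(default_factory=dict)
    date: Optional[int] = None                     # Unix epoch, in seconds
    allow_empty: bool = False
    force: bool = False

    def __post_init__(self):
        # Client-side guard chosen for this sketch, not a documented rule.
        if not self.message:
            raise ValueError("message is required")

    def to_json(self) -> str:
        """Serialize, omitting optional fields left at their defaults."""
        payload = {"message": self.message}
        if self.metadata:
            payload["metadata"] = self.metadata
        if self.date is not None:
            payload["date"] = self.date
        if self.allow_empty:
            payload["allow_empty"] = True
        if self.force:
            payload["force"] = True
        return json.dumps(payload)
```

For example, `CommitCreation("nightly load", metadata={"job": "etl-42"}).to_json()` produces a body containing only `message` and `metadata`.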

Commit response schema (Commit):

Commit {
    id:             string          -- Unique commit identifier
    parents:        []string        -- Parent commit IDs
    committer:      string          -- Who created the commit
    message:        string          -- Commit message
    creation_date:  integer         -- Unix Epoch in seconds
    meta_range_id:  string          -- Internal reference to committed data
    metadata:       map[string]string -- User-provided metadata
}
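
A client will usually want to turn the response into a typed value. The sketch below parses a decoded JSON object with the fields listed above and converts `creation_date` to a timezone-aware datetime. The class design, the `from_dict` and `created_at` names, and the tolerance for a missing `parents` or `metadata` key are assumptions of this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class Commit:
    id: str
    parents: List[str]
    committer: str
    message: str
    creation_date: int          # Unix epoch, in seconds
    meta_range_id: str
    metadata: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_dict(cls, data):
        """Build a Commit from a decoded JSON response body."""
        return cls(
            id=data["id"],
            parents=list(data.get("parents") or []),
            committer=data["committer"],
            message=data["message"],
            creation_date=int(data["creation_date"]),
            meta_range_id=data["meta_range_id"],
            metadata=dict(data.get("metadata") or {}),
        )

    @property
    def created_at(self) -> datetime:
        """The creation date as a UTC datetime."""
        return datetime.fromtimestamp(self.creation_date, tz=timezone.utc)
```

Marking the data class as frozen reflects that a commit is an immutable snapshot: once parsed, its fields are not meant to be reassigned.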

Related Pages

Implemented By

Implementation:Treeverse_LakeFS_Commit_Via_S3_Workflow
