Principle:Treeverse LakeFS S3 Commit Management
| Knowledge Sources | |
|---|---|
| Domains | S3_Compatibility, Data_Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Committing S3-staged changes through the lakeFS REST API to create versioned snapshots.
Description
Changes written through the S3 gateway are automatically staged but not committed. To create a version snapshot, users must call the lakeFS REST API commit endpoint. This separation of "write" (S3 protocol) and "version" (lakeFS API) operations is a fundamental design principle of the lakeFS S3 gateway integration.
The commit operation:
- Takes all staged changes on a branch (uploads, deletes, copies made via S3 or lakeFS API)
- Creates a single atomic commit with a message and optional metadata
- Returns a commit object with a unique ID, timestamp, and parent references
- Makes the committed state visible as the new head of the branch
The commit endpoint is therefore the bridge between the S3 protocol, which handles writes, and lakeFS versioning, which records them as history.
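As a rough illustration of the commit call described above, the following sketch builds the HTTP request without sending it. The `/api/v1` base path, the use of HTTP Basic authentication with an access key pair, and the server address, repository name, and credentials are all assumptions made for the example; `build_commit_request` is a hypothetical helper, not part of any lakeFS client.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_commit_request(base_url, repo, branch, message,
                         access_key, secret_key, metadata=None):
    """Build (but do not send) a POST request for the commit endpoint."""
    # Assumption: the REST API is served under /api/v1 on the lakeFS server.
    path = "/api/v1/repositories/{}/branches/{}/commits".format(
        urllib.parse.quote(repo, safe=""),    # percent-encode defensively
        urllib.parse.quote(branch, safe=""),
    )
    payload = {"message": message}
    if metadata:
        payload["metadata"] = metadata
    # Assumption: HTTP Basic auth with an access key ID and secret key.
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )


# Hypothetical server, repository, and credentials.
req = build_commit_request(
    "http://localhost:8000", "example-repo", "main",
    "Load daily export", "example-key-id", "example-secret",
    metadata={"pipeline": "daily-export"},
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body would then be the commit object.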
Usage
Use this principle when:
- Completing an S3-based data ingestion workflow with a commit
- Understanding the two-phase nature of lakeFS writes (stage via S3, commit via API)
- Designing ETL pipelines that write via S3 and need version control
- Building automation that writes data through S3 tools and then commits via the REST API
Theoretical Basis
The commit model enforces a clear separation of concerns:
S3 Protocol Layer                         lakeFS Versioning Layer
==================                        =======================
PutObject    ----\
CopyObject   -----+---> Staging Area ---> Commit (REST API) ---> Branch History
DeleteObject ----/           |                                         |
                             |                                         v
                       (uncommitted)                         (immutable snapshot)
Why this separation matters:
- Atomicity: Multiple S3 writes can be grouped into a single atomic commit, ensuring consumers see a consistent state
- Isolation: Uncommitted changes on one branch do not affect other branches or consumers reading committed data
- Auditability: Every commit has a message, timestamp, committer, and optional metadata, creating a full audit trail
- Tool compatibility: S3-compatible tools do not need to know about commits; they write data using standard S3 operations
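The atomicity and isolation properties listed above can be illustrated with a toy in-memory model. This is only a sketch of the idea, not how lakeFS is implemented; `ToyBranch` and all of its methods are hypothetical names.

```python
class ToyBranch:
    """Toy model of a branch with a staging area (illustration only)."""

    def __init__(self):
        self.committed = {}  # path -> content at the branch head
        self.staged = {}     # path -> content, or None for a staged delete
        self.history = []    # list of (message, snapshot) tuples

    def put(self, path, content):
        # Analogous to an S3 PutObject: the write is staged, not committed.
        self.staged[path] = content

    def delete(self, path):
        # Analogous to an S3 DeleteObject: the delete is staged.
        self.staged[path] = None

    def read_committed(self):
        # What a consumer pinned to committed data would see.
        return dict(self.committed)

    def commit(self, message):
        if not self.staged:
            raise ValueError("nothing to commit")
        snapshot = dict(self.committed)
        for path, content in self.staged.items():
            if content is None:
                snapshot.pop(path, None)
            else:
                snapshot[path] = content
        # All staged changes become visible together, in one step.
        self.committed = snapshot
        self.staged = {}
        self.history.append((message, dict(snapshot)))
        return len(self.history)


branch = ToyBranch()
branch.put("data/a.csv", "1,2")
branch.put("data/b.csv", "3,4")
before = branch.read_committed()  # staged writes are not visible yet
branch.commit("Add a and b")
after = branch.read_committed()   # both files appear together
print(before, after)
```

In this model a reader of committed data sees either neither file or both, never a partial state, which is the guarantee that grouping S3 writes into one commit is meant to give.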
The commit workflow:
| Step | Protocol | Operation | Description |
|---|---|---|---|
| 1 | S3 | PutObject / CopyObject / DeleteObject | Write changes; they are automatically staged |
| 2 | REST API | POST /repositories/{repo}/branches/{branch}/commits | Commit all staged changes atomically |
| 3 | S3 or REST | GetObject or list commits | Read the committed data or inspect the commit history |
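Steps 1 and 2 of the table can be sketched as a single function that stages several objects and then commits once. The S3 client and the commit function are passed in, so the sketch runs without a server; the stand-ins below are hypothetical. It assumes a boto3-style `put_object(Bucket=..., Key=..., Body=...)` method and assumes the gateway addresses objects with the repository as the bucket and the branch as the first key segment.

```python
def ingest_and_commit(s3_client, commit_fn, repo, branch, objects, message):
    """Stage objects through an S3-style client, then commit them once."""
    for path, body in objects.items():
        # Step 1: each write is staged on the branch.
        # Assumption: bucket = repository, key = "<branch>/<path>".
        s3_client.put_object(Bucket=repo, Key=f"{branch}/{path}", Body=body)
    # Step 2: a single commit covers every staged write.
    return commit_fn(repo, branch, message)


class RecordingS3:
    """Hypothetical stand-in for an S3 client; records calls only."""

    def __init__(self):
        self.calls = []

    def put_object(self, Bucket, Key, Body):
        self.calls.append((Bucket, Key, Body))


def fake_commit(repo, branch, message):
    # Hypothetical stand-in for the REST API commit call.
    return {"id": "hypothetical-commit-id", "message": message}


s3 = RecordingS3()
result = ingest_and_commit(
    s3, fake_commit, "example-repo", "main",
    {"raw/a.csv": b"1,2", "raw/b.csv": b"3,4"},
    "Ingest raw files",
)
print(s3.calls, result)
```

In a real pipeline `s3_client` would be an S3 client pointed at the lakeFS endpoint, and `commit_fn` would issue the POST request from step 2.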
Commit request schema (CommitCreation):
CommitCreation {
    message:     string             (required)                 -- Human-readable commit message
    metadata:    map[string]string  (optional)                 -- Arbitrary key-value pairs for automation
    date:        integer            (optional)                 -- Override creation date (Unix Epoch in seconds)
    allow_empty: boolean            (optional, default: false) -- Allow commits with no changes
    force:       boolean            (optional, default: false) -- Force commit
}
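One way to model this request body in client code is a small dataclass that serializes only the fields that were set. This is a minimal sketch based on the schema above; the class and its `to_json` method are hypothetical helpers rather than part of an official client.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CommitCreation:
    message: str
    metadata: Dict[str, str] = field(default_factory=dict)
    date: Optional[int] = None  # Unix Epoch in seconds
    allow_empty: bool = False
    force: bool = False

    def to_json(self) -> str:
        """Serialize to a JSON body, omitting optional fields left at defaults."""
        if not self.message:
            raise ValueError("message is required")
        body = {"message": self.message}
        if self.metadata:
            body["metadata"] = self.metadata
        if self.date is not None:
            body["date"] = self.date
        if self.allow_empty:
            body["allow_empty"] = True
        if self.force:
            body["force"] = True
        return json.dumps(body, sort_keys=True)


payload = CommitCreation(
    message="Load daily export",
    metadata={"pipeline": "daily-export"},
).to_json()
print(payload)
```

Omitting fields left at their defaults keeps the request minimal and leaves the server-side defaults in effect.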
Commit response schema (Commit):
Commit {
    id:            string             -- Unique commit identifier
    parents:       []string           -- Parent commit IDs
    committer:     string             -- Who created the commit
    message:       string             -- Commit message
    creation_date: integer            -- Unix Epoch in seconds
    meta_range_id: string             -- Internal reference to committed data
    metadata:      map[string]string  -- User-provided metadata
}
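A client might parse the response into a typed object and convert `creation_date` into a timezone-aware datetime. The sketch below assumes the response is a JSON object with the fields listed above; every value in the sample response is invented for illustration.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List


@dataclass
class Commit:
    id: str
    parents: List[str]
    committer: str
    message: str
    creation_date: int  # Unix Epoch in seconds
    meta_range_id: str
    metadata: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_json(cls, text: str) -> "Commit":
        raw = json.loads(text)
        return cls(
            id=raw["id"],
            parents=list(raw.get("parents", [])),
            committer=raw["committer"],
            message=raw["message"],
            creation_date=int(raw["creation_date"]),
            meta_range_id=raw["meta_range_id"],
            metadata=dict(raw.get("metadata") or {}),
        )

    @property
    def created_at(self) -> datetime:
        # Interpret the epoch seconds as a UTC timestamp.
        return datetime.fromtimestamp(self.creation_date, tz=timezone.utc)


# Invented sample response, for illustration only.
sample = json.dumps({
    "id": "hypothetical-commit-id",
    "parents": ["hypothetical-parent-id"],
    "committer": "example-user",
    "message": "Load daily export",
    "creation_date": 1700000000,
    "meta_range_id": "hypothetical-meta-range",
    "metadata": {"pipeline": "daily-export"},
})
commit = Commit.from_json(sample)
print(commit.id, commit.created_at.isoformat())
```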