Principle:Treeverse LakeFS S3 Path Mapping
| Knowledge Sources | |
|---|---|
| Domains | S3_Compatibility, Data_Integration |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Mapping versioned data references to S3-compatible addressing through a path convention.
Description
lakeFS maps its versioning concepts to S3 conventions using a deterministic, path-based scheme. The repository name becomes the S3 bucket name. The S3 object key starts with a reference (a branch name, tag, or commit ID), followed by a slash and the object path within the repository. Every versioned object therefore has a single S3 address of the form:
s3://{repository}/{branch}/{path/to/object}
This mapping is transparent to S3 clients: they see a standard S3 bucket and interact with objects using normal S3 operations. The versioning semantics are encoded entirely in the path structure, so basic read/write operations need no lakeFS-specific protocol extensions or custom headers.
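The hypothetical helper below is a minimal sketch of the convention. It joins a repository, a ref, and an object path into an S3 URI. The function name and its handling of a leading slash are illustrative assumptions, not part of any lakeFS API.

```python
def to_s3_uri(repository: str, ref: str, path: str = "") -> str:
    """Build s3://{repository}/{ref}/{path} (hypothetical helper, not a lakeFS API)."""
    if not repository or not ref:
        raise ValueError("repository and ref must be non-empty")
    # Strip leading slashes so the object path cannot create an empty key segment.
    return f"s3://{repository}/{ref}/{path.lstrip('/')}"
```

With an empty path the result is the branch-level prefix used for listing, for example `to_s3_uri("my-repo", "main")` gives `s3://my-repo/main/`.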
Usage
Use this principle when:
- Constructing S3 URIs for lakeFS-managed data
- Understanding how lakeFS paths map to S3 bucket/key pairs
- Configuring tools like Spark, Hive, or pandas to read from specific branches
- Designing ETL pipelines that need to address data on different branches
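Spark commonly reaches S3-compatible storage through the Hadoop S3A connector. To use this mapping with Spark, you can point `fs.s3a.endpoint` at the lakeFS server, enable path-style access, and read `s3a://` URIs that follow the repository/ref/path layout. The sketch below only assembles those settings as a dictionary. The helper name is hypothetical, and the endpoint URL and credentials are placeholders you would supply.

```python
def lakefs_s3a_conf(endpoint: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Spark settings that route the S3A connector to a lakeFS S3 endpoint (sketch)."""
    return {
        # The "spark.hadoop." prefix forwards each setting to the Hadoop configuration.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # Put the bucket (repository) in the URL path instead of a subdomain.
        "spark.hadoop.fs.s3a.path.style.access": "true",
    }
```

You can pass each pair to `SparkSession.builder.config(name, value)`. After that, `spark.read.csv("s3a://my-repo/main/data/file.csv")` would address `data/file.csv` on the `main` branch of `my-repo`.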
Theoretical Basis
The path mapping follows a strict convention:
| S3 Concept | lakeFS Concept | Example |
|---|---|---|
| Bucket name | Repository name | my-repo |
| Key prefix (first segment) | Branch name, tag, or commit ID | main, feature-branch, a1b2c3d4 |
| Key remainder | Object path within repository | data/file.csv |
Examples of the mapping:
| S3 URI | Repository | Ref | Object Path |
|---|---|---|---|
| s3://my-repo/main/data/file.csv | my-repo | main | data/file.csv |
| s3://my-repo/feature-branch/models/v1/model.pkl | my-repo | feature-branch | models/v1/model.pkl |
| s3://my-repo/a1b2c3d4/raw/events.parquet | my-repo | a1b2c3d4 (commit) | raw/events.parquet |
| s3://my-repo/main/ | my-repo | main | (list prefix) |
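The rows of the table can be reproduced mechanically. The hypothetical `parse_s3_uri` below strips the `s3://` scheme and takes the bucket up to the first slash. It then splits the remaining key once more to separate the ref from the object path. An empty object path stands for the list-prefix case. It uses plain string splitting rather than a URL parser so that keys containing `?` or `#` are kept intact.

```python
def parse_s3_uri(uri: str) -> tuple[str, str, str]:
    """Return (repository, ref, object_path) for an s3://repo/ref/path URI (sketch)."""
    scheme = "s3://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an s3:// URI: {uri!r}")
    # Bucket (repository) is everything up to the first slash after the scheme.
    bucket, _, key = uri[len(scheme):].partition("/")
    # First key segment is the ref; the rest is the object path ("" for a prefix).
    ref, _, object_path = key.partition("/")
    if not bucket or not ref:
        raise ValueError(f"URI needs both a bucket and a ref: {uri!r}")
    return bucket, ref, object_path


for example in (
    "s3://my-repo/main/data/file.csv",
    "s3://my-repo/feature-branch/models/v1/model.pkl",
    "s3://my-repo/a1b2c3d4/raw/events.parquet",
    "s3://my-repo/main/",
):
    print(example, "->", parse_s3_uri(example))
```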
Key rules:
- The bucket name must match an existing lakeFS repository name exactly
- The first path segment of the key must be a valid lakeFS reference (branch name, tag, or commit ID)
- The remainder of the key is the object path within the lakeFS repository
- Path-style addressing is mandatory; virtual-hosted-style is not supported
- Listing with the prefix {branch}/ returns all objects on that branch
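You can exercise the listing rule with a boto3 S3 client whose `endpoint_url` points at the lakeFS server. The helper below is a hypothetical sketch. It pages through `list_objects_v2` with `Prefix` set to the branch name plus a slash, and collects the returned keys. The client is passed in as a parameter so the helper does not depend on how that client is configured.

```python
from typing import Any


def list_branch_keys(client: Any, repository: str, branch: str) -> list[str]:
    """Collect every object key under '{branch}/' in the repository's bucket (sketch)."""
    keys: list[str] = []
    paginator = client.get_paginator("list_objects_v2")
    # The bucket is the repository; the branch name plus "/" is the key prefix.
    for page in paginator.paginate(Bucket=repository, Prefix=f"{branch}/"):
        # A page with no matching objects has no "Contents" field.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys
```

The returned keys keep their ref prefix (for example `main/data/file.csv`), so you still need to split each key to recover the object path. To match the path-style rule above, you could create the client with `boto3.client("s3", endpoint_url="https://lakefs.example.com", config=botocore.config.Config(s3={"addressing_style": "path"}))`, where the endpoint is a placeholder.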
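Path decomposition, written as runnable Python:

```python
def parse_s3_path(bucket: str, key: str) -> tuple[str, str, str]:
    """Split an S3 bucket/key pair into (repository, ref, object path)."""
    repository = bucket  # the bucket name is the repository name
    # First key segment: branch, tag, or commit ID. partition() never raises,
    # so a key such as "main" or "main/" yields an empty object path.
    ref, _, path = key.partition("/")
    if not ref:
        raise ValueError(f"key {key!r} does not start with a lakeFS ref")
    return repository, ref, path  # path: object path within the repo
```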