Principle:Treeverse LakeFS S3 Path Mapping

Knowledge Sources	lakeFS lakeFS Documentation
Domains	S3_Compatibility, Data_Integration
Last Updated	2026-02-08 00:00 GMT

Overview

Mapping versioned data references to S3-compatible addressing through a path convention.

Description

lakeFS maps its versioning concepts to S3 conventions using a deterministic path-based scheme. The repository name becomes the S3 bucket, and the S3 object key consists of a reference (a branch name, tag, or commit ID) followed by the object path. This yields a predictable addressing scheme, shown here with a branch as the reference:

s3://{repository}/{branch}/{path/to/object}

This mapping is transparent to S3 clients -- they see a standard S3 bucket and interact with objects using normal S3 operations. The versioning semantics are encoded entirely in the path structure. No lakeFS-specific protocol extensions or custom headers are required for basic read/write operations.
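
As a minimal sketch of the scheme above, the following Python helpers assemble the bucket/key pair and the `s3://` URI from the three lakeFS components. The function names `to_bucket_key` and `to_s3_uri` are hypothetical and chosen only for illustration; they are not part of any lakeFS client.

```python
from typing import Tuple


def to_bucket_key(repository: str, ref: str, path: str) -> Tuple[str, str]:
    """Return the (bucket, key) pair an S3 client would use.

    Hypothetical helper: the repository maps to the bucket and the
    ref (branch, tag, or commit ID) becomes the first key segment.
    """
    return repository, f"{ref}/{path.lstrip('/')}"


def to_s3_uri(repository: str, ref: str, path: str) -> str:
    """Return the s3:// URI for an object on the given ref."""
    bucket, key = to_bucket_key(repository, ref, path)
    return f"s3://{bucket}/{key}"


print(to_s3_uri("my-repo", "main", "data/file.csv"))
```

Keeping the bucket/key form separate from the URI form is a convenience: some S3 clients take a bucket and key as separate arguments, while others take a single URI.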

Usage

Use this principle when:

Constructing S3 URIs for lakeFS-managed data
Understanding how lakeFS paths map to S3 bucket/key pairs
Configuring tools like Spark, Hive, or pandas to read from specific branches
Designing ETL pipelines that need to address data on different branches
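
For pipelines that read the same object from different branches, one possible approach is to rewrite only the reference segment of an existing URI. The sketch below assumes a well-formed `s3://{repository}/{ref}/{path}` URI; the helper name `with_ref` is hypothetical.

```python
def with_ref(uri: str, new_ref: str) -> str:
    """Return the same object address on a different branch, tag, or commit.

    Hypothetical helper: only the first key segment (the ref) is replaced;
    the repository and object path are left unchanged.
    """
    scheme, sep, rest = uri.partition("://")
    if scheme != "s3" or not sep:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    bucket, _, key = rest.partition("/")
    _old_ref, _, path = key.partition("/")
    return f"s3://{bucket}/{new_ref}/{path}"


print(with_ref("s3://my-repo/main/data/file.csv", "feature-branch"))
```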

Theoretical Basis

The path mapping follows a strict convention:

S3 Concept	lakeFS Concept	Example
Bucket name	Repository name	`my-repo`
Key prefix (first segment)	Branch name, tag, or commit ID	`main`, `feature-branch`, `a1b2c3d4`
Key remainder	Object path within repository	`data/file.csv`

Examples of the mapping:

S3 URI	Repository	Ref	Object Path
`s3://my-repo/main/data/file.csv`	my-repo	main	data/file.csv
`s3://my-repo/feature-branch/models/v1/model.pkl`	my-repo	feature-branch	models/v1/model.pkl
`s3://my-repo/a1b2c3d4/raw/events.parquet`	my-repo	a1b2c3d4 (commit)	raw/events.parquet
`s3://my-repo/main/`	my-repo	main	(list prefix)
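
The rows above can be reproduced with a short parser. This is a sketch using only the Python standard library; the `LakeFSAddress` type and `parse_s3_uri` function are hypothetical names introduced for illustration.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class LakeFSAddress(NamedTuple):
    repository: str
    ref: str
    path: str  # empty string when the URI is only a list prefix


def parse_s3_uri(uri: str) -> LakeFSAddress:
    """Split an s3:// URI into repository, ref, and object path.

    Caveat: urlsplit treats "?" and "#" as query/fragment delimiters,
    so keys containing those characters need separate handling.
    """
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    key = parts.path[1:]  # drop the "/" that separates bucket and key
    ref, _, path = key.partition("/")
    if not ref:
        raise ValueError(f"URI has no ref segment: {uri!r}")
    return LakeFSAddress(parts.netloc, ref, path)


for uri in (
    "s3://my-repo/main/data/file.csv",
    "s3://my-repo/feature-branch/models/v1/model.pkl",
    "s3://my-repo/a1b2c3d4/raw/events.parquet",
    "s3://my-repo/main/",
):
    print(parse_s3_uri(uri))
```

Note that the parser cannot tell a branch from a tag or commit ID by syntax alone; resolving the reference is left to lakeFS.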

Key rules:

The bucket name must match an existing lakeFS repository name exactly
The first path segment of the key must be a valid lakeFS reference (branch name, tag, or commit ID)
The remainder of the key is the object path within the lakeFS repository
Path-style addressing is the simplest option; virtual-hosted-style addressing additionally depends on the S3 gateway's domain name and wildcard DNS being configured
Listing with prefix {branch}/ returns all objects on that branch
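
To illustrate the listing rule, the sketch below builds the `Bucket` and `Prefix` arguments that an S3 ListObjectsV2 request would carry. The helper name `list_request` is hypothetical, and the boto3 call mentioned in the comment is an assumption about the client in use; any S3-compatible client configured for the lakeFS endpoint should accept equivalent parameters.

```python
def list_request(repository: str, ref: str, prefix: str = "") -> dict:
    """Build Bucket/Prefix arguments for listing objects under a ref.

    Hypothetical helper. The "/" after the ref matters: under plain S3
    prefix semantics, a bare "main" would also match keys that begin
    with "main-v2/".
    """
    return {"Bucket": repository, "Prefix": f"{ref}/{prefix.lstrip('/')}"}


request = list_request("my-repo", "main", "data/")
print(request)

# Assumption: with a boto3 S3 client `s3` pointed at the lakeFS endpoint,
# the dict could be passed as s3.list_objects_v2(**request).
```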

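Path decomposition, sketched in Python:

def parse_s3_path(bucket, key):
    """Split an S3 bucket/key pair into (repository, ref, path)."""
    repository = bucket
    # Split on the first "/" only: the ref is the first key segment
    ref, _, path = key.partition("/")   # ref: branch, tag, or commit ID
    if not ref:
        raise ValueError("key must start with a lakeFS reference")
    # path is "" when the key is just "{ref}" or "{ref}/" (a list prefix)
    return (repository, ref, path)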

Related Pages

Implemented By

Implementation:Treeverse_LakeFS_S3_Path_Convention
