
Principle:Treeverse LakeFS S3 Path Mapping

From Leeroopedia


Knowledge Sources
Domains: S3_Compatibility, Data_Integration
Last Updated: 2026-02-08 00:00 GMT

Overview

Mapping versioned data references to S3-compatible addressing through a path convention.

Description

lakeFS maps its versioning concepts onto S3 conventions with a deterministic, path-based scheme. The repository name becomes the S3 bucket, and the S3 object key is a reference (a branch name, tag, or commit ID) followed by the object path within the repository. For a branch, the address looks like this:

s3://{repository}/{branch}/{path/to/object}

The mapping is transparent to S3 clients: they see a standard S3 bucket and use normal S3 operations on its objects, with the versioning semantics encoded entirely in the key. No lakeFS-specific protocol extensions or custom headers are required for basic reads and writes. The client only needs to be pointed at the lakeFS server's S3-compatible endpoint and to authenticate with lakeFS access keys instead of AWS ones.
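With path-style addressing the bucket is the first segment of the HTTP request path, so an ordinary S3 GET for a lakeFS object is a request to {endpoint}/{repository}/{ref}/{path}. The sketch below builds such a URL. It is a minimal illustration under stated assumptions: path_style_url is a hypothetical helper (not part of lakeFS or any AWS SDK), http://localhost:8000 is an assumed local lakeFS endpoint, and request signing, which real clients perform, is omitted.

```python
from urllib.parse import quote


def path_style_url(endpoint: str, repository: str, ref: str, path: str) -> str:
    """Build the path-style URL an S3 client would request for a lakeFS object.

    Hypothetical helper for illustration: bucket = repository, key = ref + "/" + path.
    """
    key = f"{ref}/{path}"
    # quote() percent-encodes unsafe characters but keeps "/" as a path separator.
    return f"{endpoint.rstrip('/')}/{quote(repository)}/{quote(key)}"


# Assumed local lakeFS endpoint; replace with your server's address.
url = path_style_url("http://localhost:8000", "my-repo", "main", "data/file.csv")
print(url)  # http://localhost:8000/my-repo/main/data/file.csv
```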

Usage

Use this principle when:

  • Constructing S3 URIs for lakeFS-managed data
  • Understanding how lakeFS paths map to S3 bucket/key pairs
  • Configuring tools like Spark, Hive, or pandas to read from specific branches
  • Designing ETL pipelines that need to address data on different branches
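For Spark, pointing the S3A connector at lakeFS is a matter of configuration: the endpoint is the lakeFS server, path-style access is switched on, and the credentials are lakeFS access keys. A minimal sketch, assuming a hypothetical endpoint and placeholder keys; the fs.s3a.* property names are standard Hadoop S3A settings, while lakefs_s3a_conf is an illustrative helper, not a lakeFS API.

```python
def lakefs_s3a_conf(endpoint: str, access_key: str, secret_key: str) -> dict[str, str]:
    """Return Spark properties that route s3a:// paths to a lakeFS endpoint (illustrative)."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.path.style.access": "true",  # bucket in the URL path, not the hostname
        "spark.hadoop.fs.s3a.access.key": access_key,     # lakeFS access key ID
        "spark.hadoop.fs.s3a.secret.key": secret_key,     # lakeFS secret access key
    }


# Placeholder values for illustration only.
conf = lakefs_s3a_conf("http://localhost:8000", "AKIA-PLACEHOLDER", "SECRET-PLACEHOLDER")
for name, value in conf.items():
    print(f"--conf {name}={value}")
# A job reading a feature branch would then use a path such as:
# s3a://my-repo/feature-branch/models/v1/
```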

Theoretical Basis

The path mapping follows a strict convention:

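S3 concept                 | lakeFS concept                         | Example
-------------------------- | -------------------------------------- | ----------------------------
Bucket name                | Repository name                        | my-repo
Key prefix (first segment) | Branch name, tag, or commit ID         | main, feature-branch, a1b2c3d4
Key remainder              | Object path within the repository      | data/file.csv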
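The table can be read as a pure data mapping from lakeFS coordinates to S3 coordinates. A minimal sketch; the LakeFSObject class is hypothetical and exists only to make the correspondence explicit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakeFSObject:
    """A lakeFS object address and its S3 bucket/key equivalent (illustrative)."""

    repository: str  # maps to the S3 bucket name
    ref: str         # maps to the first key segment: branch, tag, or commit ID
    path: str        # maps to the remainder of the key

    @property
    def bucket(self) -> str:
        return self.repository

    @property
    def key(self) -> str:
        return f"{self.ref}/{self.path}"

    @property
    def uri(self) -> str:
        return f"s3://{self.bucket}/{self.key}"


obj = LakeFSObject("my-repo", "feature-branch", "models/v1/model.pkl")
print(obj.bucket)  # my-repo
print(obj.key)     # feature-branch/models/v1/model.pkl
print(obj.uri)     # s3://my-repo/feature-branch/models/v1/model.pkl
```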

Examples of the mapping:

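S3 URI                                          | Repository | Ref                  | Object path
----------------------------------------------- | ---------- | -------------------- | ------------------------------
s3://my-repo/main/data/file.csv                 | my-repo    | main                 | data/file.csv
s3://my-repo/feature-branch/models/v1/model.pkl | my-repo    | feature-branch       | models/v1/model.pkl
s3://my-repo/a1b2c3d4/raw/events.parquet        | my-repo    | a1b2c3d4 (commit ID) | raw/events.parquet
s3://my-repo/main/                              | my-repo    | main                 | (empty; used as a list prefix)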
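Going the other way, a full s3:// URI can be split into repository, ref, and object path by taking the host part as the bucket and the first path segment as the ref. A minimal sketch that checks itself against the example rows; split_lakefs_uri is a hypothetical helper. Because it relies on urllib.parse.urlsplit, keys containing "?" or "#" would be misparsed as query or fragment, a case this sketch ignores.

```python
from urllib.parse import urlsplit


def split_lakefs_uri(uri: str) -> tuple[str, str, str]:
    """Split s3://{repository}/{ref}/{path} into its three parts (illustrative)."""
    parts = urlsplit(uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri!r}")
    repository = parts.netloc                               # bucket name is the repository
    ref, _, path = parts.path.lstrip("/").partition("/")    # first key segment is the ref
    return repository, ref, path


examples = {
    "s3://my-repo/main/data/file.csv": ("my-repo", "main", "data/file.csv"),
    "s3://my-repo/feature-branch/models/v1/model.pkl": ("my-repo", "feature-branch", "models/v1/model.pkl"),
    "s3://my-repo/a1b2c3d4/raw/events.parquet": ("my-repo", "a1b2c3d4", "raw/events.parquet"),
    "s3://my-repo/main/": ("my-repo", "main", ""),
}
for uri, expected in examples.items():
    assert split_lakefs_uri(uri) == expected, uri
print("all examples match")
```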

Key rules:

  1. The bucket name must match an existing lakeFS repository name exactly
  2. The first path segment of the key must be a valid lakeFS reference (branch name, tag, or commit ID)
  3. The remainder of the key is the object path within the lakeFS repository
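  4. Use path-style addressing (bucket in the URL path); many clients must be told so explicitly, e.g. fs.s3a.path.style.access=true for Spark. Virtual-hosted-style addressing only works if the lakeFS S3 gateway is configured with its own domain name and wildcard DNS resolves bucket subdomains to it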
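  5. Listing with prefix {branch}/ and no delimiter returns every object on that branch; adding the delimiter / returns only the branch's top-level objects plus common prefixes for its subdirectories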
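Rule 5 follows from ordinary S3 listing semantics applied to keys whose first segment is the ref. The sketch below simulates S3 prefix/delimiter listing over an in-memory list of keys to show the difference between listing with and without a delimiter. It is an illustration of S3 semantics only, not how lakeFS implements listing, and it ignores pagination (MaxKeys, continuation tokens); list_objects and the sample keys are hypothetical.

```python
def list_objects(keys: list[str], prefix: str, delimiter: str = "") -> tuple[list[str], list[str]]:
    """Simulate S3 prefix/delimiter listing over in-memory keys (illustrative).

    Returns (contents, common_prefixes) as S3 would for a single bucket.
    """
    contents: list[str] = []
    common: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Roll the key up into a common prefix ending at the next delimiter.
            common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            contents.append(key)
    return contents, sorted(common)


# Keys of a hypothetical bucket "my-repo"; the first segment is the ref.
keys = ["main/data/file.csv", "main/README.md", "feature-branch/data/file.csv"]
print(list_objects(keys, "main/"))                 # (['main/README.md', 'main/data/file.csv'], [])
print(list_objects(keys, "main/", delimiter="/"))  # (['main/README.md'], ['main/data/'])
```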

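Path decomposition, as runnable Python (a key with no "/" yields an empty object path instead of an error):

def parse_s3_path(bucket: str, key: str) -> tuple[str, str, str]:
    repository = bucket                # the bucket name is the repository name
    ref, _, path = key.partition("/")  # ref: branch, tag, or commit ID
    # path is the object path within the repo; it is "" for keys like "main/" or "main"
    return repository, ref, path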
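The decomposition above does not check anything. Rules 1 and 2 add validation: the bucket must be an existing repository and the first key segment a reference in it. A minimal sketch under a loud assumption: a hypothetical in-memory catalog of repositories and their refs stands in for the lookups a real lakeFS server performs (which can also resolve commit IDs that are not listed anywhere); resolve is an illustrative helper.

```python
def resolve(bucket: str, key: str, refs_by_repo: dict[str, set[str]]) -> tuple[str, str, str]:
    """Apply key rules 1-3 against a hypothetical catalog (illustrative)."""
    if bucket not in refs_by_repo:
        # Rule 1: the bucket must name an existing repository.
        raise LookupError(f"no such repository: {bucket!r}")
    ref, _, path = key.partition("/")
    if ref not in refs_by_repo[bucket]:
        # Rule 2: the first key segment must be a known reference.
        raise LookupError(f"no such ref {ref!r} in repository {bucket!r}")
    # Rule 3: the remainder is the object path.
    return bucket, ref, path


# Hypothetical catalog: one repository with two branches, a tag and a commit ID.
catalog = {"my-repo": {"main", "feature-branch", "v1.0", "a1b2c3d4"}}
print(resolve("my-repo", "v1.0/data/file.csv", catalog))  # ('my-repo', 'v1.0', 'data/file.csv')
```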

Related Pages

Implemented By
