Implementation:Treeverse LakeFS S3 Path Convention
| Knowledge Sources | |
|---|---|
| Domains | S3_Compatibility, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Pattern for S3 path addressing where the S3 bucket maps to a lakeFS repository and the object key encodes the branch and object path.
Description
This implementation documents the S3 path convention used throughout the lakeFS S3 gateway integration tests and client code. The pattern is:
- Bucket = lakeFS repository name
- Key = `{branch}/{path}`
The test `TestS3UploadAndDownload` (lines 126-197) demonstrates this pattern by uploading objects with keys prefixed by `main/data/` and then downloading them using the same key structure.
Usage
Use this pattern when:
- Constructing S3 object keys for lakeFS operations
- Writing data pipeline code that addresses versioned objects
- Testing S3 gateway compatibility with lakeFS
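As a minimal sketch of the convention, a hypothetical helper (the `lakefs_key` name is illustrative, not part of any lakeFS client library) can assemble keys from a branch and an in-repo path:

```python
def lakefs_key(branch: str, path: str) -> str:
    """Compose an S3 object key for the lakeFS gateway: {branch}/{path}."""
    return f"{branch}/{path.lstrip('/')}"

# The bucket is the repository name; the key encodes branch and path.
key = lakefs_key("main", "data/file.csv")
print(key)  # main/data/file.csv
```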
Code Reference
Source Location
- File: `esti/s3_gateway_test.go`
- Lines: L126-197 (`TestS3UploadAndDownload`)
- Constants: `gatewayTestPrefix = mainBranch + "/data/"` (L44)
Signature
```go
// The path convention is defined by constants and usage patterns:
const (
	mainBranch        = "main"
	gatewayTestPrefix = mainBranch + "/data/"
)

// Upload: bucket = repo, key = "main/data/{random_path}"
_, err := clt.PutObject(ctx, repo, gatewayTestPrefix+randomPath, reader, size, minio.PutObjectOptions{})

// Download: same bucket and key
download, err := clt.GetObject(ctx, repo, gatewayTestPrefix+randomPath, minio.GetObjectOptions{})
```
Import
```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `bucket` | string | The lakeFS repository name (e.g., `my-repo`) |
| `key` | string | Composite key in the format `{branch}/{path}` (e.g., `main/data/file.csv`) |
Outputs
| Component | Resolved Value | Description |
|---|---|---|
| Repository | Value of `bucket` | The target lakeFS repository |
| Ref | First segment of `key` | Branch name, tag, or commit ID |
| Path | Remainder of `key` | Object path within the repository |
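The resolution above amounts to splitting the key on its first slash. A hedged sketch (illustrative only; the actual parsing lives inside the lakeFS gateway, and the `resolve` helper here is not a real API):

```python
def resolve(bucket: str, key: str) -> dict:
    """Split a gateway address into repository, ref, and in-repo path."""
    ref, _, path = key.partition('/')
    return {"repository": bucket, "ref": ref, "path": path}

print(resolve("my-repo", "main/data/file.csv"))
# {'repository': 'my-repo', 'ref': 'main', 'path': 'data/file.csv'}
```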
Usage Examples
Python boto3: Upload and Download
```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Upload: bucket = repository, key = branch/path
s3.put_object(
    Bucket='my-repo',
    Key='main/data/file.csv',
    Body=b'col1,col2\nval1,val2\n'
)

# Download: same addressing
response = s3.get_object(Bucket='my-repo', Key='main/data/file.csv')
content = response['Body'].read()
```
AWS CLI: List and Copy
```shell
# List all objects on the main branch
aws --endpoint-url http://localhost:8000 s3 ls s3://my-repo/main/

# Copy a file to a feature branch
aws --endpoint-url http://localhost:8000 s3 cp \
  s3://my-repo/main/data/file.csv \
  s3://my-repo/feature-branch/data/file.csv
```
Spark: Read from a specific branch
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:8000") \
    .config("spark.hadoop.fs.s3a.access.key", "AKIAIOSFDNN7EXAMPLEQ") \
    .config("spark.hadoop.fs.s3a.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Read from the main branch: s3a://{repo}/{branch}/{path}
df = spark.read.parquet("s3a://my-repo/main/data/")

# Read from a feature branch
df_feature = spark.read.parquet("s3a://my-repo/feature-branch/data/")
```
Related Pages
Implements Principle
Requires Environment