Implementation:Treeverse LakeFS S3 Path Convention
| Knowledge Sources | |
|---|---|
| Domains | S3_Compatibility, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Pattern for S3 path addressing where the S3 bucket maps to a lakeFS repository and the object key encodes the branch and object path.
Description
This implementation documents the S3 path convention used throughout the lakeFS S3 gateway integration tests and client code. The pattern is:
- Bucket = lakeFS repository name
- Key = `{branch}/{path}`
The test `TestS3UploadAndDownload` (lines 126-197) demonstrates this pattern by uploading objects with keys prefixed by `main/data/` and then downloading them using the same key structure.
Usage
Use this pattern when:
- Constructing S3 object keys for lakeFS operations
- Writing data pipeline code that addresses versioned objects
- Testing S3 gateway compatibility with lakeFS
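As a minimal sketch of the convention, a hypothetical helper (the `lakefs_key` name is illustrative, not part of any lakeFS client library) can assemble keys from a branch and an in-repo path:

```python
def lakefs_key(branch: str, path: str) -> str:
    """Compose an S3 object key for the lakeFS gateway: {branch}/{path}."""
    return f"{branch}/{path.lstrip('/')}"

# The bucket is the repository name; the key encodes branch and path.
key = lakefs_key("main", "data/file.csv")
print(key)  # main/data/file.csv
```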
Code Reference
Source Location
- File: `esti/s3_gateway_test.go`
- Lines: L126-197 (`TestS3UploadAndDownload`)
- Constants: `gatewayTestPrefix = mainBranch + "/data/"` (L44)
Signature
```go
// The path convention is defined by constants and usage patterns:
const (
	mainBranch        = "main"
	gatewayTestPrefix = mainBranch + "/data/"
)

// Upload: bucket = repo, key = "main/data/{random_path}"
_, err := clt.PutObject(ctx, repo, gatewayTestPrefix+randomPath, reader, size, minio.PutObjectOptions{})

// Download: same bucket and key
download, err := clt.GetObject(ctx, repo, gatewayTestPrefix+randomPath, minio.GetObjectOptions{})
```
Import
```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)
```
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| `bucket` | string | The lakeFS repository name (e.g., `my-repo`) |
| `key` | string | Composite key in the format `{branch}/{path}` (e.g., `main/data/file.csv`) |
Outputs
| Component | Resolved Value | Description |
|---|---|---|
| Repository | Value of `bucket` | The target lakeFS repository |
| Ref | First segment of `key` | Branch name, tag, or commit ID |
| Path | Remainder of `key` | Object path within the repository |
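The resolution above amounts to splitting the key on its first slash. A hedged sketch (illustrative only; the actual parsing lives inside the lakeFS gateway, and the `resolve` helper here is not a real API):

```python
def resolve(bucket: str, key: str) -> dict:
    """Split a gateway address into repository, ref, and in-repo path."""
    ref, _, path = key.partition('/')
    return {"repository": bucket, "ref": ref, "path": path}

print(resolve("my-repo", "main/data/file.csv"))
# {'repository': 'my-repo', 'ref': 'main', 'path': 'data/file.csv'}
```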
Usage Examples
Python boto3: Upload and Download
```python
import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Upload: bucket = repository, key = branch/path
s3.put_object(
    Bucket='my-repo',
    Key='main/data/file.csv',
    Body=b'col1,col2\nval1,val2\n'
)

# Download: same addressing
response = s3.get_object(Bucket='my-repo', Key='main/data/file.csv')
content = response['Body'].read()
```
AWS CLI: List and Copy
```shell
# List all objects on the main branch
aws --endpoint-url http://localhost:8000 s3 ls s3://my-repo/main/

# Copy a file to a feature branch
aws --endpoint-url http://localhost:8000 s3 cp \
  s3://my-repo/main/data/file.csv \
  s3://my-repo/feature-branch/data/file.csv
```
Spark: Read from a specific branch
```python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:8000") \
    .config("spark.hadoop.fs.s3a.access.key", "AKIAIOSFDNN7EXAMPLEQ") \
    .config("spark.hadoop.fs.s3a.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Read from the main branch: s3a://{repo}/{branch}/{path}
df = spark.read.parquet("s3a://my-repo/main/data/")

# Read from a feature branch
df_feature = spark.read.parquet("s3a://my-repo/feature-branch/data/")
```
Related Pages
Implements Principle
Requires Environment