
Implementation:Treeverse LakeFS S3 Path Convention

From Leeroopedia


Knowledge Sources
Domains: S3_Compatibility, REST_API
Last Updated: 2026-02-08 00:00 GMT

Overview

A pattern for addressing lakeFS objects through the S3 API: the S3 bucket name maps to a lakeFS repository, and the object key encodes both the branch and the object path.

Description

This implementation documents the S3 path convention used throughout the lakeFS S3 gateway integration tests and client code. The pattern is:

  • Bucket = lakeFS repository name
  • Key = {branch}/{path}

The test TestS3UploadAndDownload (lines 126-197) demonstrates this pattern by uploading objects with keys prefixed by main/data/ and then downloading them using the same key structure.
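The mapping above can be sketched as a small helper. This is a hypothetical illustration of the convention, not part of lakeFS or its client libraries:

```python
def lakefs_s3_key(branch: str, path: str) -> str:
    """Build an S3 gateway object key following the {branch}/{path} convention.

    Hypothetical helper for illustration only: it prefixes the object
    path with the branch name, stripping any leading slash from the path
    so the key never contains an empty segment.
    """
    return f"{branch}/{path.lstrip('/')}"

# bucket = repository name, key = branch-prefixed path
bucket = "my-repo"                            # lakeFS repository
key = lakefs_s3_key("main", "data/file.csv")  # -> "main/data/file.csv"
```

With this convention, switching a pipeline from one branch to another is just a change of key prefix; the bucket (repository) stays the same.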

Usage

Use this pattern when:

  • Constructing S3 object keys for lakeFS operations
  • Writing data pipeline code that addresses versioned objects
  • Testing S3 gateway compatibility with lakeFS

Code Reference

Source Location

  • File: esti/s3_gateway_test.go
  • Lines: L126-197 (TestS3UploadAndDownload)
  • Constants: gatewayTestPrefix = mainBranch + "/data/" (L44)

Signature

// The path convention is defined by constant and usage patterns:
const (
    mainBranch        = "main"
    gatewayTestPrefix = mainBranch + "/data/"
)

// Upload: bucket = repo, key = "main/data/{random_path}"
_, err := clt.PutObject(ctx, repo, gatewayTestPrefix + randomPath, reader, size, minio.PutObjectOptions{})

// Download: same bucket and key
download, err := clt.GetObject(ctx, repo, gatewayTestPrefix + randomPath, minio.GetObjectOptions{})

Import

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

I/O Contract

Inputs

Parameter  Type    Description
bucket     string  The lakeFS repository name (e.g., my-repo)
key        string  Composite key in the format {branch}/{path} (e.g., main/data/file.csv)

Outputs

Component   Resolved Value        Description
Repository  Value of bucket       The target lakeFS repository
Ref         First segment of key  Branch name, tag, or commit ID
Path        Remainder of key      Object path within the repository
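The resolution rules above can be sketched as a parser. This is a hypothetical illustration of how the gateway splits an address; the real resolution logic lives in the lakeFS gateway code:

```python
def resolve_s3_address(bucket: str, key: str) -> dict:
    """Split a lakeFS S3 gateway address into its components.

    Hypothetical sketch of the resolution rules: the first key segment
    is the ref (branch, tag, or commit ID) and the remainder is the
    object path within the repository.
    """
    ref, _, path = key.partition("/")
    return {"repository": bucket, "ref": ref, "path": path}

resolve_s3_address("my-repo", "main/data/file.csv")
# -> {"repository": "my-repo", "ref": "main", "path": "data/file.csv"}
```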

Usage Examples

Python boto3: Upload and Download

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Upload: bucket = repository, key = branch/path
s3.put_object(
    Bucket='my-repo',
    Key='main/data/file.csv',
    Body=b'col1,col2\nval1,val2\n'
)

# Download: same addressing
response = s3.get_object(Bucket='my-repo', Key='main/data/file.csv')
content = response['Body'].read()

AWS CLI: List and Copy

# List all objects on the main branch
aws --endpoint-url http://localhost:8000 s3 ls s3://my-repo/main/

# Copy a file to a feature branch
aws --endpoint-url http://localhost:8000 s3 cp \
    s3://my-repo/main/data/file.csv \
    s3://my-repo/feature-branch/data/file.csv

Spark: Read from a specific branch

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:8000") \
    .config("spark.hadoop.fs.s3a.access.key", "AKIAIOSFDNN7EXAMPLEQ") \
    .config("spark.hadoop.fs.s3a.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Read from main branch: s3a://{repo}/{branch}/{path}
df = spark.read.parquet("s3a://my-repo/main/data/")

# Read from feature branch
df_feature = spark.read.parquet("s3a://my-repo/feature-branch/data/")

Related Pages

Implements Principle

Requires Environment
