
Implementation:Treeverse LakeFS S3 Client Setup

From Leeroopedia


Knowledge Sources
Domains S3_Compatibility, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

Wrappers around AWS SDK and Minio client initialization, configured to communicate with the lakeFS S3 gateway.

Description

This implementation wraps the initialization of standard S3 client libraries (AWS SDK for Go v2, Minio Go client) so that they point at the lakeFS S3 gateway instead of AWS S3. The critical configuration parameters are:

  • Endpoint: The lakeFS S3 gateway URL (same host as the lakeFS API, typically port 8000)
  • Force path-style: Must be true (lakeFS does not support virtual-hosted-style bucket addressing)
  • Credentials: lakeFS access key ID and secret access key

External dependencies:

  • github.com/minio/minio-go/v7 -- Minio Go client library
  • github.com/aws/aws-sdk-go-v2/service/s3 -- AWS SDK for Go v2

Usage

Use this implementation when:

  • Setting up an S3 client in application code (Python boto3, Go AWS SDK, Java AWS SDK)
  • Configuring a Minio client to interact with lakeFS
  • Initializing Spark, Hive, or Presto with S3A filesystem settings pointing at lakeFS

Code Reference

Source Location

  • File: esti/s3_gateway_test.go
  • Lines: L47-71
  • Functions: newMinioClient (L47), createS3Client (L65)

Signature

// newMinioClient creates a Minio client configured for the lakeFS S3 gateway.
// getCredentials selects the signing method (V2 or V4).
func newMinioClient(t *testing.T, getCredentials GetCredentials) *minio.Client {
    accessKeyID := viper.GetString("access_key_id")
    secretAccessKey := viper.GetString("secret_access_key")
    endpoint := viper.GetString("s3_endpoint")
    endpointSecure := viper.GetBool("s3_endpoint_secure")
    creds := getCredentials(accessKeyID, secretAccessKey, "")
    clt, err := minio.New(endpoint, &minio.Options{
        Creds:  creds,
        Secure: endpointSecure,
    })
    if err != nil {
        t.Fatalf("minio.New: %s", err)
    }
    return clt
}

// createS3Client creates an AWS SDK v2 S3 client configured for lakeFS.
func createS3Client(endpoint string, t *testing.T) *s3.Client {
    accessKeyID := viper.GetString("access_key_id")
    secretAccessKey := viper.GetString("secret_access_key")
    s3Client, err := testutil.SetupTestS3Client(endpoint, accessKeyID, secretAccessKey, true)
    require.NoError(t, err, "failed creating s3 client")
    return s3Client
}

Import

import boto3


I/O Contract

Inputs

Parameter | Type | Required | Description
endpoint_url | string | Yes | The lakeFS S3 gateway URL (e.g., http://localhost:8000)
aws_access_key_id | string | Yes | lakeFS access key ID
aws_secret_access_key | string | Yes | lakeFS secret access key
force_path_style | boolean | Yes | Must be true for lakeFS
region | string | No | Any valid region string (lakeFS ignores this); defaults to us-east-1
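To see why path-style addressing is required: lakeFS serves every repository from a single gateway host, so the bucket name must appear in the URL path rather than in the hostname. A minimal sketch of the two URL shapes (pure string construction, no S3 library involved):

```python
# Path-style: bucket appears in the URL path -- what lakeFS expects.
def path_style_url(endpoint: str, bucket: str, key: str) -> str:
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"

# Virtual-hosted-style: bucket appears in the hostname -- not supported by lakeFS.
def virtual_hosted_url(endpoint: str, bucket: str, key: str) -> str:
    scheme, host = endpoint.split("://", 1)
    return f"{scheme}://{bucket}.{host.rstrip('/')}/{key}"

print(path_style_url("http://localhost:8000", "my-repo", "main/data.csv"))
# http://localhost:8000/my-repo/main/data.csv
```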

Outputs

Output | Type | Description
S3 client instance | Client object | Configured S3 client ready for use with lakeFS

Usage Examples

Python boto3: Client setup

import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Verify the connection by listing repositories (buckets)
response = s3.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])

Python boto3: Resource-style setup

import boto3

s3_resource = boto3.resource(
    's3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

bucket = s3_resource.Bucket('my-repo')
for obj in bucket.objects.filter(Prefix='main/'):
    print(obj.key)
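Since the gateway exposes the repository as the bucket and the branch as the leading key component, a small hypothetical helper (not part of any lakeFS SDK) can make that convention explicit:

```python
# Hypothetical helper: build an S3 object key that addresses `path` on
# `branch` through the lakeFS S3 gateway (repository = bucket,
# branch = first key component).
def lakefs_key(branch: str, path: str) -> str:
    return f"{branch}/{path.lstrip('/')}"

key = lakefs_key("main", "data/2024/records.parquet")
print(key)  # main/data/2024/records.parquet
# s3_resource.Object("my-repo", key) would then address that path on branch 'main'
```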

Spark: S3A filesystem configuration

from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("spark.hadoop.fs.s3a.endpoint", "http://localhost:8000") \
    .config("spark.hadoop.fs.s3a.access.key", "AKIAIOSFDNN7EXAMPLEQ") \
    .config("spark.hadoop.fs.s3a.secret.key", "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .getOrCreate()

# Now Spark can read from lakeFS branches
df = spark.read.parquet("s3a://my-repo/main/data/")

AWS CLI: Configuration

# Configure AWS CLI profile for lakeFS
aws configure --profile lakefs
# AWS Access Key ID: AKIAIOSFDNN7EXAMPLEQ
# AWS Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
# Default region name: us-east-1

# Use the profile with the lakeFS endpoint
aws --endpoint-url http://localhost:8000 --profile lakefs s3 ls s3://my-repo/main/

Related Pages

Implements Principle

Requires Environment
