Implementation:Treeverse LakeFS S3 GetObject

From Leeroopedia


Knowledge Sources
Domains S3_Compatibility, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

Wrapper for standard S3 read operations (GetObject, HeadObject, ListObjectsV2, presigned URLs) via the lakeFS S3 gateway.

Description

This implementation wraps S3 read operations that are translated by the lakeFS S3 gateway into lakeFS object retrieval operations. It covers the complete set of read-path interactions:

  • GetObject -- Retrieve object content and metadata
  • HeadObject / StatObject -- Retrieve metadata without body
  • ListObjectsV2 -- List objects under a branch prefix
  • Presigned URL generation -- Create time-limited download URLs

The gateway supports conditional requests (If-Match, If-None-Match based on ETag), range requests for partial reads, and presigned URLs for delegated access.
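As a minimal sketch of how range and conditional reads could be expressed from boto3, the helper below assembles the keyword arguments for a `get_object` call. The function name `build_get_object_kwargs` and its parameters are hypothetical and introduced only for illustration; `Range` and `IfNoneMatch` are standard boto3 `get_object` parameters.

```python
from typing import Optional


def build_get_object_kwargs(
    repo: str,
    branch: str,
    path: str,
    first_byte: Optional[int] = None,
    last_byte: Optional[int] = None,
    if_none_match: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for a boto3 get_object call (hypothetical helper)."""
    # The gateway addresses objects as {branch}/{path} inside the repository "bucket".
    kwargs = {"Bucket": repo, "Key": f"{branch}/{path}"}
    if first_byte is not None:
        # HTTP byte ranges are inclusive; an open end means "to the end of the object".
        end = "" if last_byte is None else str(last_byte)
        kwargs["Range"] = f"bytes={first_byte}-{end}"
    if if_none_match is not None:
        # Ask the server to skip the body when the stored ETag still matches.
        kwargs["IfNoneMatch"] = if_none_match
    return kwargs
```

The result can be passed as `s3.get_object(**kwargs)`. When the `IfNoneMatch` value matches the stored ETag, the expected response is HTTP 304, which boto3 normally surfaces as a `ClientError` rather than a regular response.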

Usage

Use this implementation when:

  • Reading data from lakeFS through S3-compatible tools
  • Checking object existence or metadata without downloading content
  • Listing objects on a specific branch
  • Generating presigned URLs for temporary read access

Code Reference

Source Location

  • File: esti/s3_gateway_test.go
  • Lines: L516-691 (TestS3ReadObject)
  • File: esti/presign_test.go
  • Lines: L1-209 (presigned URL tests)

Signature

// Minio client: GetObject
res, err := minioClient.GetObject(ctx, repo, "main/exists", minio.GetObjectOptions{})
if err != nil {
    return err // check the error before deferring Close
}
defer res.Close()
info, err := res.Stat()
content, err := io.ReadAll(res)

// Minio client: StatObject (HeadObject)
info, err := minioClient.StatObject(ctx, repo, "main/exists", minio.StatObjectOptions{})
// info.Size, info.ETag, info.ContentType, info.UserMetadata

// Minio client: Presigned GET URL
preSignedURL, err := minioClient.Presign(ctx, http.MethodGet, repo, "main/exists",
    time.Second*60, url.Values{})

// AWS SDK v2: GetObject
output, err := s3Client.GetObject(ctx, &s3.GetObjectInput{
    Bucket: aws.String(repo),
    Key:    aws.String("main/data/file.csv"),
})
if err != nil {
    return err // output is nil on error, so check before touching Body
}
defer output.Body.Close()
content, err := io.ReadAll(output.Body)

// AWS SDK v2: HeadObject
head, err := s3Client.HeadObject(ctx, &s3.HeadObjectInput{
    Bucket: aws.String(repo),
    Key:    aws.String("main/data/file.csv"),
})

// AWS SDK v2: ListObjectsV2
list, err := s3Client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
    Bucket: aws.String(repo),
    Prefix: aws.String("main/data/"),
})

Import

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

I/O Contract

Inputs

Parameter | Type | Required | Description
Bucket | string | Yes | lakeFS repository name
Key | string | Yes | Object key in format {branch}/{path}
If-Match | string | No | Return object only if ETag matches (conditional read)
If-None-Match | string | No | Return object only if ETag does not match (conditional read)
Range | string | No | Byte range for partial read (e.g., bytes=0-1023)
Prefix | string | No | Key prefix for ListObjectsV2 (e.g., main/data/)
MaxKeys | integer | No | Maximum number of keys to return in list (default 1000)
ContinuationToken | string | No | Pagination token for ListObjectsV2
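The `{branch}/{path}` key format can be illustrated with a pair of small helpers. This is only a sketch: `gateway_key` and `split_gateway_key` are hypothetical names, and the split assumes the branch (or other ref) name itself contains no `/`.

```python
from typing import Tuple


def gateway_key(ref: str, path: str) -> str:
    """Join a branch (or other ref) and an object path into a gateway key."""
    return f"{ref}/{path.lstrip('/')}"


def split_gateway_key(key: str) -> Tuple[str, str]:
    """Split a gateway key into (ref, path); assumes the ref contains no '/'."""
    ref, sep, path = key.partition("/")
    if not sep or not ref:
        raise ValueError(f"expected '<ref>/<path>', got {key!r}")
    return ref, path
```

For example, `gateway_key("main", "data/file.csv")` produces the key used throughout this page, and `split_gateway_key` recovers the branch from keys returned by a listing.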

Outputs

Output | Type | Description
Body | byte stream | Object content (GetObject only)
ETag | string | Entity tag (content checksum, typically an MD5 hash; not guaranteed for multipart uploads)
Content-Type | string | MIME type of the object
Content-Length | integer | Size in bytes
User metadata | map[string]string | User-defined metadata key-value pairs
Contents (list) | array | Array of object summaries (ListObjectsV2)
Presigned URL | string | Time-limited URL for direct download
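S3-style ETag values are returned as quoted strings, which matters when comparing an ETag from one response with a value stored elsewhere. The following sketch shows one way to normalize them before comparison; `normalize_etag` and `same_etag` are hypothetical helper names, not part of boto3 or lakeFS.

```python
def normalize_etag(etag: str) -> str:
    """Strip the weak-validator prefix and surrounding quotes from an ETag."""
    etag = etag.strip()
    if etag.startswith("W/"):
        # Weak validators are written as W/"value".
        etag = etag[2:]
    return etag.strip('"')


def same_etag(a: str, b: str) -> bool:
    """Compare two ETag strings regardless of quoting."""
    return normalize_etag(a) == normalize_etag(b)
```

When sending the value back in `If-Match` or `If-None-Match`, the quoted form returned by the server can be passed through unchanged.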

Usage Examples

Python boto3: Get object content

import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Download an object from the main branch
response = s3.get_object(Bucket='my-repo', Key='main/data/file.csv')
content = response['Body'].read().decode('utf-8')
print(content)

Python boto3: Head object (check existence and metadata)

# Check if an object exists and get its metadata
try:
    head = s3.head_object(Bucket='my-repo', Key='main/data/file.csv')
    print(f"Size: {head['ContentLength']} bytes")
    print(f"ETag: {head['ETag']}")
    print(f"Content-Type: {head['ContentType']}")
    if 'Metadata' in head:
        print(f"User metadata: {head['Metadata']}")
except s3.exceptions.ClientError as e:
    if e.response['Error']['Code'] == '404':
        print("Object does not exist")
    else:
        raise  # surface permission, network, and other errors

Python boto3: List objects on a branch

# List all objects under main/data/
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-repo', Prefix='main/data/'):
    for obj in page.get('Contents', []):
        print(f"{obj['Key']}  ({obj['Size']} bytes)")
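The paginator hides the `MaxKeys` and `ContinuationToken` parameters listed in the I/O contract. As a sketch of what the equivalent manual loop might look like, the generator below accepts any callable with the shape of `s3.list_objects_v2`; the name `iter_keys` is hypothetical.

```python
from typing import Callable, Iterator, Optional


def iter_keys(
    list_page: Callable[..., dict],
    bucket: str,
    prefix: str,
    max_keys: int = 1000,
) -> Iterator[str]:
    """Yield every key under a prefix, following continuation tokens."""
    token: Optional[str] = None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": max_keys}
        if token:
            kwargs["ContinuationToken"] = token
        page = list_page(**kwargs)
        for obj in page.get("Contents", []):
            yield obj["Key"]
        token = page.get("NextContinuationToken")
        # Stop when the listing is complete or no token was returned.
        if not page.get("IsTruncated") or not token:
            return
```

With a real client this would be called as `iter_keys(s3.list_objects_v2, 'my-repo', 'main/data/')`. Taking the callable as an argument keeps the loop easy to exercise with a stub.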

Python boto3: Generate presigned URL

# Generate a presigned URL valid for 1 hour
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-repo', 'Key': 'main/data/file.csv'},
    ExpiresIn=3600
)
print(f"Download URL: {url}")
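Because a presigned URL is time-limited, it can be useful to check when a given URL stops working. The sketch below reads the expiry from the query string, assuming the URL follows one of the two common AWS signing layouts (`X-Amz-Date` plus `X-Amz-Expires` for Signature Version 4, or an epoch `Expires` value for the older scheme). Which layout a client produces depends on its configuration, and `presigned_expiry` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry time encoded in a presigned URL, if recognizable."""
    query = parse_qs(urlsplit(url).query)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # SigV4 layout: signing timestamp plus a lifetime in seconds.
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    if "Expires" in query:
        # Older layout: absolute expiry as seconds since the Unix epoch.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    return None
```

Comparing the result with `datetime.now(timezone.utc)` indicates whether a stored URL is still worth trying; the server remains the final authority on validity.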

AWS CLI: Download and list

# Download a file
aws --endpoint-url http://localhost:8000 s3 cp \
    s3://my-repo/main/data/file.csv ./file.csv

# List objects on a branch
aws --endpoint-url http://localhost:8000 s3 ls s3://my-repo/main/data/

# Get object metadata
aws --endpoint-url http://localhost:8000 s3api head-object \
    --bucket my-repo --key main/data/file.csv

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
