Implementation:Treeverse LakeFS S3 GetObject
| Knowledge Sources | |
|---|---|
| Domains | S3_Compatibility, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Wrapper for standard S3 read operations (GetObject, HeadObject, ListObjectsV2, presigned URLs) via the lakeFS S3 gateway.
Description
This implementation wraps S3 read operations that are translated by the lakeFS S3 gateway into lakeFS object retrieval operations. It covers the complete set of read-path interactions:
- GetObject -- Retrieve object content and metadata
- HeadObject / StatObject -- Retrieve metadata without body
- ListObjectsV2 -- List objects under a branch prefix
- Presigned URL generation -- Create time-limited download URLs
The gateway supports conditional requests (If-Match, If-None-Match based on ETag), range requests for partial reads, and presigned URLs for delegated access.
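As a minimal sketch of how the conditional and range options map onto a client call, the helper below assembles keyword arguments for boto3's `get_object` (`IfMatch`, `IfNoneMatch`, `Range`). The helper name `build_get_kwargs` and its `byte_range` tuple are hypothetical conveniences for illustration, not part of lakeFS or boto3.

```python
from typing import Any, Dict, Optional, Tuple


def build_get_kwargs(
    bucket: str,
    key: str,
    *,
    if_match: Optional[str] = None,
    if_none_match: Optional[str] = None,
    byte_range: Optional[Tuple[int, int]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a boto3 get_object call (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Key": key}
    if if_match is not None:
        # Return the object only if its ETag equals this value.
        kwargs["IfMatch"] = if_match
    if if_none_match is not None:
        # Return the object only if its ETag differs from this value.
        kwargs["IfNoneMatch"] = if_none_match
    if byte_range is not None:
        start, end = byte_range
        if start < 0 or end < start:
            raise ValueError("byte_range must satisfy 0 <= start <= end")
        # HTTP byte ranges are inclusive on both ends.
        kwargs["Range"] = f"bytes={start}-{end}"
    return kwargs
```

With a configured boto3 client `s3`, the first KiB of an object could then be requested as `s3.get_object(**build_get_kwargs('my-repo', 'main/data/file.csv', byte_range=(0, 1023)))`.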
Usage
Use this implementation when:
- Reading data from lakeFS through S3-compatible tools
- Checking object existence or metadata without downloading content
- Listing objects on a specific branch
- Generating presigned URLs for temporary read access
Code Reference
Source Location
- File: esti/s3_gateway_test.go - Lines: L516-691 (TestS3ReadObject)
- File: esti/presign_test.go - Lines: L1-209 (presigned URL tests)
Signature
// Minio client: GetObject
res, err := minioClient.GetObject(ctx, repo, "main/exists", minio.GetObjectOptions{})
defer res.Close()
info, err := res.Stat()
content, err := io.ReadAll(res)
// Minio client: StatObject (HeadObject)
info, err := minioClient.StatObject(ctx, repo, "main/exists", minio.StatObjectOptions{})
// info.Size, info.ETag, info.ContentType, info.UserMetadata
// Minio client: Presigned GET URL
preSignedURL, err := minioClient.Presign(ctx, http.MethodGet, repo, "main/exists",
	time.Second*60, url.Values{})
// AWS SDK v2: GetObject
output, err := s3Client.GetObject(ctx, &s3.GetObjectInput{
	Bucket: aws.String(repo),
	Key:    aws.String("main/data/file.csv"),
})
defer output.Body.Close()
content, err := io.ReadAll(output.Body)
// AWS SDK v2: HeadObject
head, err := s3Client.HeadObject(ctx, &s3.HeadObjectInput{
	Bucket: aws.String(repo),
	Key:    aws.String("main/data/file.csv"),
})
// AWS SDK v2: ListObjectsV2
list, err := s3Client.ListObjectsV2(ctx, &s3.ListObjectsV2Input{
	Bucket: aws.String(repo),
	Prefix: aws.String("main/data/"),
})
Import
import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| Bucket | string | Yes | lakeFS repository name |
| Key | string | Yes | Object key in format {branch}/{path} |
| If-Match | string | No | Return object only if ETag matches (conditional read) |
| If-None-Match | string | No | Return object only if ETag does not match (conditional read) |
| Range | string | No | Byte range for partial read (e.g., bytes=0-1023) |
| Prefix | string | No | Key prefix for ListObjectsV2 (e.g., main/data/) |
| MaxKeys | integer | No | Maximum number of keys to return in list (default 1000) |
| ContinuationToken | string | No | Pagination token for ListObjectsV2 |
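Because the Key parameter packs the branch and the object path into one string, small helpers for composing and splitting keys can prevent malformed requests. The sketch below is illustrative only: `make_key` and `split_key` are hypothetical names, and it assumes the branch name itself contains no "/".

```python
from typing import Tuple


def make_key(branch: str, path: str) -> str:
    """Build an S3 gateway key of the form {branch}/{path} (hypothetical helper)."""
    if not branch or "/" in branch:
        # Assumption: branch names are non-empty and contain no slash.
        raise ValueError("branch must be non-empty and must not contain '/'")
    return f"{branch}/{path.lstrip('/')}"


def split_key(key: str) -> Tuple[str, str]:
    """Split a {branch}/{path} key into its branch and path parts."""
    branch, sep, path = key.partition("/")
    if not sep or not branch:
        raise ValueError(f"key {key!r} is not in {{branch}}/{{path}} format")
    return branch, path
```

For example, `make_key('main', 'data/file.csv')` yields the key used throughout this page, and `split_key` reverses it when processing keys returned by a listing.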
Outputs
| Output | Type | Description |
|---|---|---|
| Body | byte stream | Object content (GetObject only) |
| ETag | string | Entity tag (typically an MD5 hash of the content; best treated as an opaque value) |
| Content-Type | string | MIME type of the object |
| Content-Length | integer | Size in bytes |
| User metadata | map[string]string | User-defined metadata key-value pairs |
| Contents (list) | array | Array of object summaries (ListObjectsV2) |
| Presigned URL | string | Time-limited URL for direct download |
Usage Examples
Python boto3: Get object content
import boto3

s3 = boto3.client('s3',
    endpoint_url='http://localhost:8000',
    aws_access_key_id='AKIAIOSFDNN7EXAMPLEQ',
    aws_secret_access_key='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
)

# Download an object from the main branch
response = s3.get_object(Bucket='my-repo', Key='main/data/file.csv')
content = response['Body'].read().decode('utf-8')
print(content)
Python boto3: Head object (check existence and metadata)
# Check if an object exists and get its metadata
try:
    head = s3.head_object(Bucket='my-repo', Key='main/data/file.csv')
    print(f"Size: {head['ContentLength']} bytes")
    print(f"ETag: {head['ETag']}")
    print(f"Content-Type: {head['ContentType']}")
    if 'Metadata' in head:
        print(f"User metadata: {head['Metadata']}")
except s3.exceptions.ClientError as e:
    if e.response['Error']['Code'] == '404':
        print("Object does not exist")
    else:
        # Re-raise anything other than "not found" (e.g. permission errors)
        raise
Python boto3: List objects on a branch
# List all objects under main/data/
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='my-repo', Prefix='main/data/'):
    for obj in page.get('Contents', []):
        print(f"{obj['Key']} ({obj['Size']} bytes)")
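The paginator hides the MaxKeys and ContinuationToken parameters listed in the Inputs table. As a rough sketch of the same loop written by hand, the generator below follows `IsTruncated` and `NextContinuationToken` from each ListObjectsV2 response. The function name `iter_objects` and the `page_size` parameter are hypothetical; `client` is assumed to be any object exposing a boto3-style `list_objects_v2` method.

```python
from typing import Any, Dict, Iterator


def iter_objects(
    client: Any, bucket: str, prefix: str, page_size: int = 1000
) -> Iterator[Dict[str, Any]]:
    """Yield object summaries under a prefix, following continuation tokens."""
    kwargs: Dict[str, Any] = {"Bucket": bucket, "Prefix": prefix, "MaxKeys": page_size}
    while True:
        page = client.list_objects_v2(**kwargs)
        # "Contents" is absent when a page holds no objects.
        yield from page.get("Contents", [])
        if not page.get("IsTruncated"):
            break
        # Resume the listing where the previous page stopped.
        kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Passing the boto3 client from the earlier examples, `for obj in iter_objects(s3, 'my-repo', 'main/data/'): ...` would visit the same objects as the paginator version.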
Python boto3: Generate presigned URL
# Generate a presigned URL valid for 1 hour
url = s3.generate_presigned_url(
    'get_object',
    Params={'Bucket': 'my-repo', 'Key': 'main/data/file.csv'},
    ExpiresIn=3600
)
print(f"Download URL: {url}")
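Since a presigned URL is only valid for a limited time, it can be useful to check when a given URL expires before handing it to another party. The sketch below reads the expiry from the query string, assuming the URL uses either the SigV4 layout (`X-Amz-Date` plus `X-Amz-Expires`) or the legacy SigV2 layout (`Expires` as epoch seconds). Which layout a client emits depends on its signing configuration, and the function name `presigned_expiry` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_expiry(url: str) -> datetime:
    """Return the UTC expiry time encoded in a presigned S3 URL's query string."""
    query = parse_qs(urlsplit(url).query)
    if "X-Amz-Date" in query and "X-Amz-Expires" in query:
        # SigV4: signing timestamp plus a lifetime in seconds.
        signed_at = datetime.strptime(
            query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
        ).replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))
    if "Expires" in query:
        # Legacy SigV2: absolute expiry as seconds since the Unix epoch.
        return datetime.fromtimestamp(int(query["Expires"][0]), tz=timezone.utc)
    raise ValueError("URL does not look like a presigned S3 URL")
```

A caller could compare the result against `datetime.now(timezone.utc)` and regenerate the URL when it is about to lapse.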
AWS CLI: Download and list
# Download a file
aws --endpoint-url http://localhost:8000 s3 cp \
s3://my-repo/main/data/file.csv ./file.csv
# List objects on a branch
aws --endpoint-url http://localhost:8000 s3 ls s3://my-repo/main/data/
# Get object metadata
aws --endpoint-url http://localhost:8000 s3api head-object \
--bucket my-repo --key main/data/file.csv
Related Pages
Implements Principle
Requires Environment
Uses Heuristic