Implementation:Treeverse LakeFS ListObjects For GC
| Knowledge Sources | |
|---|---|
| Domains | Storage_Management, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The listObjects API endpoint, combined with presigned URL verification, serves as the primary mechanism for confirming garbage collection results by checking whether specific objects have been deleted from or retained in the underlying object storage.
Description
This implementation documents the use of the listObjects API from a GC verification angle. While listObjects is a general-purpose API for browsing repository contents, in the GC context it is used specifically to:
- List objects on a branch or commit reference with presign=true to obtain presigned physical storage URLs
- Issue HTTP GET requests against the presigned URLs to determine physical object existence
- Compare expected vs. actual state: objects that should be deleted should return HTTP 404; objects that should be retained should return HTTP 200
The verification helper function CheckFilesWereGarbageCollected in esti/esti_utils.go encapsulates this pattern for use in end-to-end tests.
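The three steps above can be sketched in Python as a small comparison routine. This is an illustration under stated assumptions, not the Go helper itself: the name find_gc_mismatches and the fetch_status callable are hypothetical, and the caller is assumed to have already collected one presigned URL per logical path from a listObjects response.

```python
from typing import Callable, Dict, Iterable, List


def find_gc_mismatches(
    presigned_urls: Dict[str, str],
    expected_deleted: Iterable[str],
    expected_retained: Iterable[str],
    fetch_status: Callable[[str], int],
) -> List[str]:
    """Return one message per path whose physical state is not as expected.

    presigned_urls maps logical path -> presigned URL (hypothetical input shape).
    fetch_status issues an HTTP GET and returns the status code (hypothetical).
    """
    mismatches = []
    # Objects that GC should have removed must answer 404.
    for path in expected_deleted:
        code = fetch_status(presigned_urls[path])
        if code != 404:
            mismatches.append(f"{path}: expected 404 (deleted), got {code}")
    # Objects that GC should have kept must answer 200.
    for path in expected_retained:
        code = fetch_status(presigned_urls[path])
        if code != 200:
            mismatches.append(f"{path}: expected 200 (retained), got {code}")
    return mismatches
```

An empty return value means the physical state matched expectations for every path. Passing the HTTP call in as a parameter keeps the comparison logic testable without a running lakeFS instance.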
Usage
Use this approach when:
- Verifying that a GC run successfully deleted expired objects
- Confirming that retained objects remain accessible after GC
- Building automated GC verification into a pipeline
- Debugging GC failures where objects were not deleted as expected
Code Reference
Source Location
- API specification: api/swagger.yml lines 6059-6080
- Operation ID: listObjects
- HTTP method: GET
- Path: /api/v1/repositories/{repository}/refs/{ref}/objects/ls
- Verification helper: esti/esti_utils.go lines 672-694 (CheckFilesWereGarbageCollected)
Signature
# Response Schema: ObjectStatsList
ObjectStatsList:
  type: object
  required:
    - pagination
    - results
  properties:
    pagination:
      $ref: '#/definitions/Pagination'
    results:
      type: array
      items:
        $ref: '#/definitions/ObjectStats'
ObjectStats:
  type: object
  properties:
    path:
      type: string
      description: Logical path of the object in the repository
    physical_address:
      type: string
      description: >
        Physical storage URL. When presign=true, this is a presigned URL
        that can be used for direct HTTP access to verify object existence.
    size_bytes:
      type: integer
    mtime:
      type: integer
      description: Modification time (Unix epoch seconds)
    content_type:
      type: string
Import
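# REST API: no import needed
# Use curl or any HTTP client for verification.
# Replace {repository} and {ref} with real values; the URL is quoted so the
# shell does not interpret the "?" in the query string.
curl -s "http://localhost:8000/api/v1/repositories/{repository}/refs/{ref}/objects/ls?presign=true" \
  -u "access_key:secret_key"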
I/O Contract
Inputs
| Parameter | Location | Type | Required | Description |
|---|---|---|---|---|
| repository | Path | string | Yes | The repository name |
| ref | Path | string | Yes | Branch name or commit ID to list objects from |
| presign | Query | boolean | No (default: false) | When true, physical_address fields contain presigned URLs for direct object storage access |
| prefix | Query | string | No | Filter results to objects whose path starts with this prefix |
| after | Query | string | No | Pagination cursor: return results after this path |
| amount | Query | integer | No (default: 100) | Maximum number of results to return per page |
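Because amount caps each page, a verification run over a large prefix has to follow the after cursor. The sketch below shows one way to do that in Python. It assumes the Pagination object carries has_more and next_offset fields, which are not shown in the schema excerpt above. The name iter_all_objects and the fetch_page callable are hypothetical.

```python
from typing import Any, Callable, Dict, Iterator


def iter_all_objects(
    fetch_page: Callable[[str, int], Dict[str, Any]],
    amount: int = 100,
) -> Iterator[Dict[str, Any]]:
    """Yield every entry of a paginated listObjects result.

    fetch_page(after, amount) is a hypothetical callable that performs one
    listObjects request and returns the decoded ObjectStatsList body.
    The has_more / next_offset field names are an assumption.
    """
    after = ""  # an empty cursor requests the first page
    while True:
        page = fetch_page(after, amount)
        yield from page["results"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            return
        # Continue from where the previous page ended.
        after = pagination["next_offset"]
```

Each yielded entry can then be checked against its presigned URL in the same way as a single-page listing.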
Outputs
| Status Code | Body | Description |
|---|---|---|
| 200 | ObjectStatsList | List of objects with their metadata and (optionally) presigned physical addresses |
| 401 | Error | Unauthorized |
| 404 | Error | Repository or ref not found |
Presigned URL verification responses:
| HTTP Status on Presigned URL | Meaning |
|---|---|
| 200 OK | Object exists in physical storage (retained — not garbage collected) |
| 404 Not Found | Object does not exist in physical storage (successfully garbage collected) |
| 403 Forbidden | Access was denied, typically because the presigned URL has expired; list the objects again to obtain a fresh presigned URL before drawing any conclusion |
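The status table above can be encoded directly, which keeps a verifier from treating every non-200 response as a deletion. The following is a minimal sketch. The enum and function names are hypothetical, and any status not listed in the table is deliberately reported as inconclusive rather than guessed.

```python
from enum import Enum


class PhysicalState(Enum):
    RETAINED = "retained"          # 200: object still in physical storage
    COLLECTED = "collected"        # 404: object removed by GC
    URL_REJECTED = "url_rejected"  # 403: likely an expired presigned URL
    UNKNOWN = "unknown"            # anything else: inconclusive


def interpret_presigned_status(status_code: int) -> PhysicalState:
    """Map the HTTP status of a GET on a presigned URL to a verification result."""
    if status_code == 200:
        return PhysicalState.RETAINED
    if status_code == 404:
        return PhysicalState.COLLECTED
    if status_code == 403:
        return PhysicalState.URL_REJECTED
    return PhysicalState.UNKNOWN
```

A caller would typically re-list with presign=true and retry on URL_REJECTED, and fail the check loudly on UNKNOWN.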
Usage Examples
Verify Object Was Garbage Collected
# Step 1: List objects with presigned URLs
RESPONSE=$(curl -s \
"http://localhost:8000/api/v1/repositories/my-repo/refs/main/objects/ls?presign=true&prefix=data/expired/" \
-u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")
# Step 2: Extract a presigned URL for an object expected to be deleted
PRESIGNED_URL=$(echo "$RESPONSE" | jq -r '.results[0].physical_address')
# Step 3: Check if the object still exists in physical storage
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$PRESIGNED_URL")
if [ "$HTTP_STATUS" = "404" ]; then
echo "SUCCESS: Object was garbage collected"
elif [ "$HTTP_STATUS" = "200" ]; then
echo "FAILURE: Object still exists — GC may not have run correctly"
else
echo "UNEXPECTED: HTTP status $HTTP_STATUS"
fi
Batch Verification Script
#!/bin/bash
# Verify multiple objects after GC run
LAKEFS_URL="http://localhost:8000/api/v1"
REPO="my-repo"
BRANCH="main"
CREDS="AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
# Get objects with presigned URLs (first page only; follow the pagination
# cursor for listings larger than the requested amount)
OBJECTS=$(curl -s \
  "${LAKEFS_URL}/repositories/${REPO}/refs/${BRANCH}/objects/ls?presign=true&amount=1000" \
  -u "$CREDS")
# Check each presigned URL and print the HTTP status per logical path
echo "$OBJECTS" | jq -r '.results[] | "\(.path)\t\(.physical_address)"' | \
  while IFS=$'\t' read -r path url; do
    status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
    echo "$path -> HTTP $status"
  done
Python Verification Example
import requests
import lakefs_sdk

configuration = lakefs_sdk.Configuration(
    host="http://localhost:8000/api/v1",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)

with lakefs_sdk.ApiClient(configuration) as api_client:
    api = lakefs_sdk.ObjectsApi(api_client)
    # List objects with presigned URLs (first page only)
    result = api.list_objects(
        repository="my-repo",
        ref="main",
        presign=True,
        prefix="data/",
    )
    for obj in result.results:
        # Check physical existence via the presigned URL; stream=True avoids
        # downloading the object body just to read the status code
        with requests.get(obj.physical_address, stream=True, timeout=30) as resp:
            code = resp.status_code
        # Only 200 and 404 are conclusive; other codes (e.g. 403) are not a deletion
        status = {200: "EXISTS", 404: "DELETED"}.get(code, "UNKNOWN")
        print(f"{obj.path}: {status} (HTTP {code})")