Implementation:Treeverse LakeFS ListObjects For GC

From Leeroopedia


Knowledge Sources
Domains: Storage_Management, REST_API
Last Updated: 2026-02-08 00:00 GMT

Overview

The listObjects API endpoint, combined with presigned URL verification, is the primary mechanism for confirming garbage collection (GC) results: requesting each object's presigned URL shows whether the object was deleted from, or retained in, the underlying object storage.

Description

This page documents listObjects as a GC verification tool. Although listObjects is a general-purpose API for browsing repository contents, in the GC context it is used specifically to:

  1. List objects on a branch or commit reference with presign=true to obtain presigned physical storage URLs
  2. Issue HTTP GET requests against the presigned URLs to determine physical object existence
  3. Compare expected vs. actual state: objects that should be deleted should return HTTP 404; objects that should be retained should return HTTP 200

The verification helper function CheckFilesWereGarbageCollected in esti/esti_utils.go encapsulates this pattern for use in end-to-end tests.
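
The comparison in step 3 can be sketched in a few lines of Python. This is an illustrative sketch only, not the code of CheckFilesWereGarbageCollected; the function name find_gc_mismatches and its two dictionary arguments are hypothetical, and the HTTP statuses are assumed to have been collected already by issuing GET requests against the presigned URLs.

```python
from typing import Dict, List


def find_gc_mismatches(
    expected_deleted: Dict[str, bool],
    observed_status: Dict[str, int],
) -> List[str]:
    """Compare expected GC outcomes with observed HTTP statuses.

    expected_deleted: logical path -> True if GC should have removed the object.
    observed_status:  logical path -> HTTP status returned by the presigned URL.
    Returns one human-readable message per path whose status is not the
    expected one (404 for deleted objects, 200 for retained objects).
    """
    mismatches = []
    for path, should_be_deleted in sorted(expected_deleted.items()):
        want = 404 if should_be_deleted else 200
        got = observed_status.get(path)  # None if the path was never probed
        if got != want:
            mismatches.append(f"{path}: expected HTTP {want}, got {got}")
    return mismatches
```

An empty result means every probed object is in the expected state; any other status (including a missing probe) is reported rather than silently treated as deleted.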

Usage

Use this approach when:

  • Verifying that a GC run successfully deleted expired objects
  • Confirming that retained objects remain accessible after GC
  • Building automated GC verification into a pipeline
  • Debugging GC failures where objects were not deleted as expected

Code Reference

Source Location

  • API specification: api/swagger.yml lines 6059-6080
  • Operation ID: listObjects
  • HTTP method: GET
  • Path: /api/v1/repositories/{repository}/refs/{ref}/objects/ls
  • Verification helper: esti/esti_utils.go lines 672-694 (CheckFilesWereGarbageCollected)

Signature

# Response Schema: ObjectStatsList
ObjectStatsList:
  type: object
  required:
    - pagination
    - results
  properties:
    pagination:
      $ref: '#/definitions/Pagination'
    results:
      type: array
      items:
        $ref: '#/definitions/ObjectStats'

ObjectStats:
  type: object
  properties:
    path:
      type: string
      description: Logical path of the object in the repository
    physical_address:
      type: string
      description: >
        Physical storage URL. When presign=true, this is a presigned URL
        that can be used for direct HTTP access to verify object existence.
    size_bytes:
      type: integer
    mtime:
      type: integer
      description: Modification time (Unix epoch seconds)
    content_type:
      type: string

Import

# REST API — no import needed
# Use curl or any HTTP client for verification
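# Replace {repository} and {ref} with real values; the URL is quoted so the
# shell does not interpret "?" as a glob character
curl -s "http://localhost:8000/api/v1/repositories/{repository}/refs/{ref}/objects/ls?presign=true" \
  -u "access_key:secret_key"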
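
For clients that assemble the request themselves, the following sketch shows one way to build the listing URL with the path segments and query string properly encoded. The helper name build_list_objects_url is hypothetical; only the path template and the presign and prefix parameters come from this page.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_list_objects_url(
    base_url: str,
    repository: str,
    ref: str,
    presign: bool = True,
    prefix: Optional[str] = None,
) -> str:
    """Build the listObjects URL for a repository and ref.

    Path segments are percent-encoded so that refs containing "/" or other
    reserved characters do not change the URL structure.
    """
    path = (
        f"/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(ref, safe='')}/objects/ls"
    )
    params = {"presign": "true" if presign else "false"}
    if prefix is not None:
        params["prefix"] = prefix
    return f"{base_url.rstrip('/')}{path}?{urlencode(params)}"
```

The resulting URL can be passed to any HTTP client together with the access key and secret key as basic-auth credentials, as in the curl command above.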

I/O Contract

Inputs

Parameter | Location | Type | Required | Description
--- | --- | --- | --- | ---
repository | Path | string | Yes | The repository name
ref | Path | string | Yes | Branch name or commit ID to list objects from
presign | Query | boolean | No (default: false) | When true, physical_address fields contain presigned URLs for direct object storage access
prefix | Query | string | No | Filter results to objects whose path starts with this prefix
after | Query | string | No | Pagination cursor: return results after this path
amount | Query | integer | No (default: 100) | Maximum number of results to return per page
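
Because results are paged, a verification run over a large prefix has to follow the after cursor. The sketch below shows the loop in isolation. It assumes that the pagination object in the response carries has_more and next_offset fields (these fields are not shown on this page and should be checked against api/swagger.yml), and fetch_page is a hypothetical callable that performs one listObjects request and returns the decoded JSON body.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_all_objects(
    fetch_page: Callable[[Optional[str]], Dict],
) -> Iterator[Dict]:
    """Yield every object entry across all pages of a listing.

    fetch_page(after) must return a dict shaped like ObjectStatsList:
    {"pagination": {...}, "results": [...]}. The first call receives None.
    Assumption: pagination has "has_more" and "next_offset" fields.
    """
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page["results"]
        pagination = page["pagination"]
        if not pagination.get("has_more"):
            return
        # Pass the cursor back as the "after" query parameter.
        after = pagination["next_offset"]
```

Keeping the HTTP call behind fetch_page lets the same loop work with curl-style clients, the Python SDK, or a test double.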

Outputs

Status Code | Body | Description
--- | --- | ---
200 | ObjectStatsList | List of objects with their metadata and (optionally) presigned physical addresses
401 | Error | Unauthorized
404 | Error | Repository or ref not found

Presigned URL verification responses:

HTTP Status on Presigned URL | Meaning
--- | ---
200 OK | Object exists in physical storage (retained, not garbage collected)
404 Not Found | Object does not exist in physical storage (successfully garbage collected)
403 Forbidden | Presigned URL has expired; re-request with a fresh presigned URL
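
The three cases above, including the retry on 403, can be sketched as a small loop. The function probe_with_refresh and its two callables are hypothetical names for illustration: get_presigned_url is assumed to re-list the object with presign=true and return a fresh URL, and get_status is assumed to issue the GET request and return the HTTP status code.

```python
from typing import Callable


def probe_with_refresh(
    get_presigned_url: Callable[[], str],
    get_status: Callable[[str], int],
    max_attempts: int = 3,
) -> str:
    """Classify an object's physical state, refreshing expired URLs.

    Returns "retained" on 200, "deleted" on 404, "unexpected:<code>" for any
    other non-403 status, and "unknown" if every attempt returned 403.
    """
    for _ in range(max_attempts):
        status = get_status(get_presigned_url())
        if status == 200:
            return "retained"
        if status == 404:
            return "deleted"
        if status != 403:
            return f"unexpected:{status}"
        # 403: the presigned URL has expired, so request a fresh one and retry.
    return "unknown"
```

Treating a persistent 403 as "unknown" rather than "deleted" avoids reporting a GC success that was never actually observed.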

Usage Examples

Verify Object Was Garbage Collected

# Step 1: List objects with presigned URLs
RESPONSE=$(curl -s \
  "http://localhost:8000/api/v1/repositories/my-repo/refs/main/objects/ls?presign=true&prefix=data/expired/" \
  -u "AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY")

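# Step 2: Extract a presigned URL for an object expected to be deleted
# ("// empty" yields an empty string instead of "null" when nothing is listed)
PRESIGNED_URL=$(echo "$RESPONSE" | jq -r '.results[0].physical_address // empty')
if [ -z "$PRESIGNED_URL" ]; then
  echo "No objects listed under the prefix; nothing to verify" >&2
  exit 1
fi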

# Step 3: Check if the object still exists in physical storage
HTTP_STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$PRESIGNED_URL")

if [ "$HTTP_STATUS" = "404" ]; then
  echo "SUCCESS: Object was garbage collected"
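elif [ "$HTTP_STATUS" = "200" ]; then
  echo "FAILURE: Object still exists; GC may not have run correctly"
elif [ "$HTTP_STATUS" = "403" ]; then
  echo "RETRY: Presigned URL has expired; list again to get a fresh URL"
else
  echo "UNEXPECTED: HTTP status $HTTP_STATUS"
fi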

Batch Verification Script

#!/bin/bash
# Verify multiple objects after GC run
LAKEFS_URL="http://localhost:8000/api/v1"
REPO="my-repo"
BRANCH="main"
CREDS="AKIAIOSFODNN7EXAMPLE:wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

# Get the first page of objects (up to 1000) with presigned URLs.
# Larger listings need further requests that pass the "after" cursor.
OBJECTS=$(curl -s \
  "${LAKEFS_URL}/repositories/${REPO}/refs/${BRANCH}/objects/ls?presign=true&amount=1000" \
  -u "$CREDS")

# Check each presigned URL
echo "$OBJECTS" | jq -r '.results[] | "\(.path)\t\(.physical_address)"' | \
while IFS=$'\t' read -r path url; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  echo "$path -> HTTP $status"
done

Python Verification Example

import requests
import lakefs_sdk

configuration = lakefs_sdk.Configuration(
    host="http://localhost:8000/api/v1",
    username="AKIAIOSFODNN7EXAMPLE",
    password="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
)

with lakefs_sdk.ApiClient(configuration) as api_client:
    api = lakefs_sdk.ObjectsApi(api_client)

    # List objects with presigned URLs
    result = api.list_objects(
        repository="my-repo",
        ref="main",
        presign=True,
        prefix="data/",
    )

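    for obj in result.results:
        # Check physical existence via presigned URL; stream=True avoids
        # downloading the object body just to read the status code
        with requests.get(obj.physical_address, stream=True, timeout=30) as resp:
            code = resp.status_code
        if code == 200:
            status = "EXISTS"
        elif code == 404:
            status = "DELETED"
        else:
            # For example 403 when the presigned URL has expired
            status = "UNKNOWN"
        print(f"{obj.path}: {status} (HTTP {code})")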
