

Implementation:Treeverse LakeFS ListObjects For Import



Knowledge Sources
Domains Data_Import, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

The listObjects endpoint of the lakeFS REST API lists the objects under a given reference. This page covers its use after an import operation, to verify that the imported data is present at the expected destination paths.

Description

The listObjects endpoint returns a paginated list of objects (files) under a given branch or commit reference. When used for import verification, it is called with a prefix filter matching the import destination path. This allows the client to enumerate all imported objects and confirm their presence, count, metadata, and path structure.

Key capabilities for import verification:

  • Prefix filtering -- The prefix query parameter narrows results to only objects under the import destination, making verification efficient even when the branch contains millions of objects from prior operations
  • Pagination -- Results are paginated via after and amount parameters, allowing verification of arbitrarily large import sets
  • Delimiter support -- The delimiter parameter enables directory-style listing to verify the hierarchical structure of imported data
  • Object metadata -- Each result includes path, physical_address, checksum, mtime, size_bytes, content_type, and (when user_metadata=true) user-defined metadata, which together support metadata-level validation
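The capabilities above all map to query parameters on a single GET request. A minimal sketch of assembling that request URL with the standard library, assuming the `/api/v1` base path used in this page's examples; `build_list_objects_url` is a hypothetical helper, not part of any lakeFS SDK.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_list_objects_url(
    base_url: str,
    repository: str,
    ref: str,
    prefix: Optional[str] = None,
    after: Optional[str] = None,
    amount: Optional[int] = None,
    delimiter: Optional[str] = None,
) -> str:
    """Build a listObjects URL; only non-None query parameters are included."""
    # Path segments are percent-encoded in case they contain reserved characters.
    path = (
        f"{base_url.rstrip('/')}/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(ref, safe='')}/objects/ls"
    )
    query = {
        key: value
        for key, value in (
            ("prefix", prefix),
            ("after", after),
            ("amount", amount),
            ("delimiter", delimiter),
        )
        if value is not None
    }
    # urlencode percent-encodes values, so "/" in a prefix becomes %2F.
    return f"{path}?{urlencode(query)}" if query else path
```

Omitting unset parameters (rather than sending empty strings) leaves the server's defaults in effect for anything the caller does not specify.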

This implementation focuses specifically on the import verification angle of the general-purpose listObjects API.

Usage

Use this endpoint for import verification when:

  • Confirming that all expected objects are present after an import completes
  • Counting the total number of imported objects to match against an expected count
  • Validating that object paths correctly reflect the destination prefix mapping
  • Spot-checking object metadata (size, checksum) against known source values
  • Building automated verification gates in import pipelines
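Several of these checks reduce to comparing the listing against a manifest of what the source contained. A minimal sketch, assuming the caller has already built a hypothetical manifest mapping each expected lakeFS path to its size in bytes (for example from a listing of the source bucket); `ImportReport` and `compare_to_manifest` are illustrative names. Sizes are compared rather than checksums because checksum formats such as ETags depend on how the source store computed them (S3 multipart uploads, for instance, do not produce a plain MD5).

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Mapping


@dataclass
class ImportReport:
    missing: List[str] = field(default_factory=list)        # expected but not listed
    unexpected: List[str] = field(default_factory=list)     # listed but not expected
    size_mismatch: List[str] = field(default_factory=list)  # listed with a different size

    @property
    def ok(self) -> bool:
        return not (self.missing or self.unexpected or self.size_mismatch)


def compare_to_manifest(
    listed: Iterable[Mapping], expected_sizes: Mapping[str, int]
) -> ImportReport:
    """Compare listObjects entries (dicts with 'path' and 'size_bytes')
    against a manifest of expected path -> size in bytes."""
    actual: Dict[str, int] = {obj["path"]: obj["size_bytes"] for obj in listed}
    report = ImportReport()
    for path, size in expected_sizes.items():
        if path not in actual:
            report.missing.append(path)
        elif actual[path] != size:
            report.size_mismatch.append(path)
    report.unexpected = sorted(set(actual) - set(expected_sizes))
    report.missing.sort()
    report.size_mismatch.sort()
    return report
```

In a pipeline gate, a falsy `report.ok` would fail the step, with the three lists giving the reason.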

Code Reference

Source Location

  • Repository: lakeFS
  • File: api/swagger.yml (lines 6030-6080)

Signature

/repositories/{repository}/refs/{ref}/objects/ls:
  parameters:
    - in: path
      name: repository
      required: true
      schema:
        type: string
    - in: path
      name: ref
      required: true
      schema:
        type: string
      description: a reference (could be either a branch or a commit ID)
    - in: query
      name: user_metadata
      required: false
      schema:
        type: boolean
        default: true
    - in: query
      name: presign
      required: false
      schema:
        type: boolean
    - $ref: "#/components/parameters/PaginationAfter"
    - $ref: "#/components/parameters/PaginationAmount"
    - $ref: "#/components/parameters/PaginationDelimiter"
    - $ref: "#/components/parameters/PaginationPrefix"
  get:
    tags:
      - objects
    operationId: listObjects
    summary: list objects under a given prefix
    responses:
      200:
        description: object listing
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ObjectStatsList"
      401:
        $ref: "#/components/responses/Unauthorized"
      404:
        $ref: "#/components/responses/NotFound"

Import

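import lakefs
from lakefs.client import Client

# High-level lakeFS Python SDK setup. The verification examples below call the
# REST API directly with requests, so this client is optional for them.
client = Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")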

I/O Contract

Inputs

Name Type Required Description
repository string (path) Yes Repository name
ref string (path) Yes Branch name or commit ID to list objects from
user_metadata boolean (query) No Include user metadata in results (default: true)
presign boolean (query) No If true, return pre-signed URLs for object access
after string (query) No Pagination cursor: return objects after this path (lexicographic)
amount integer (query) No Maximum number of objects to return per page (default varies by server config)
prefix string (query) No Filter results to objects with paths starting with this prefix. Key parameter for import verification.
delimiter string (query) No Group objects by this delimiter (e.g., / for directory-style listing)
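To reason about how prefix, after, amount, and delimiter interact (or to unit-test verification code without a running server), a small in-memory model can help. This is a hypothetical approximation of the semantics described in the table, treating after as an exclusive lexicographic cursor; `simulate_list` is not lakeFS's implementation, and the real server may fill next_offset or represent grouped prefixes differently.

```python
from typing import Dict, List


def simulate_list(
    paths: List[str],
    prefix: str = "",
    after: str = "",
    amount: int = 100,
    delimiter: str = "",
) -> Dict:
    """Local model of prefix/after/amount/delimiter over a set of paths.

    Returns a dict shaped like an ObjectStatsList response (paths only)."""
    entries: List[str] = []
    seen = set()
    for path in sorted(p for p in paths if p.startswith(prefix)):
        if delimiter:
            # Collapse everything past the first delimiter after the prefix
            # into a single directory-like entry.
            cut = path.find(delimiter, len(prefix))
            if cut != -1:
                path = path[: cut + len(delimiter)]
        if path <= after or path in seen:
            continue  # 'after' is exclusive; grouped entries appear once
        seen.add(path)
        entries.append(path)
    page = entries[:amount]
    has_more = len(entries) > amount
    return {
        "pagination": {
            "has_more": has_more,
            "next_offset": page[-1] if has_more else "",
            "results": len(page),
            "max_per_page": amount,
        },
        "results": [{"path": p} for p in page],
    }
```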

Outputs

Name Type Description
pagination Pagination Pagination metadata: has_more (boolean), next_offset (string; pass it as after to fetch the next page), results (number of entries on this page), max_per_page
results []ObjectStats Array of object metadata entries
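Following has_more and next_offset until the last page can be factored into a generator. A minimal sketch that relies only on the pagination fields listed above; `iter_objects` and the injected `fetch_page` callable are hypothetical names, and the check that the cursor advances is a defensive choice, not documented server behavior.

```python
from typing import Callable, Dict, Iterator, Optional

# A page fetcher takes the cursor ("" for the first page) and returns the
# decoded JSON body of one listObjects response.
FetchPage = Callable[[str], Dict]


def iter_objects(fetch_page: FetchPage, max_pages: Optional[int] = None) -> Iterator[Dict]:
    """Yield every entry across pages, following pagination.has_more/next_offset."""
    after = ""
    pages = 0
    while True:
        data = fetch_page(after)
        pages += 1
        yield from data["results"]
        pagination = data["pagination"]
        if not pagination["has_more"]:
            return
        if max_pages is not None and pages >= max_pages:
            raise RuntimeError(f"stopped after {pages} pages; listing incomplete")
        next_after = pagination["next_offset"]
        if next_after == after:
            # Guard against a cursor that does not advance (would loop forever).
            raise RuntimeError(f"pagination cursor did not advance past {after!r}")
        after = next_after
```

Injecting the fetch function keeps the paging logic testable offline. In practice `fetch_page` would wrap a `requests.get` call like the one in the Paginated Verification in Python example, passing the cursor as the `after` query parameter and returning `resp.json()`.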

ObjectStats fields:

Name Type Description
path string Full object path within the repository
physical_address string Physical storage location (e.g., the original S3 URI for imported objects)
checksum string Object checksum (e.g., ETag)
mtime integer Last modification time (Unix timestamp)
size_bytes integer Object size in bytes
content_type string MIME type of the object
metadata object User-defined key-value metadata (if user_metadata=true)
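For metadata-level validation it can help to turn each raw entry into a typed record. A minimal sketch, assuming mtime is a Unix timestamp in seconds (as the table states) and treating every field other than path and mtime as possibly absent, for example metadata when user_metadata=false; `ObjectStat` and `parse_object_stats` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class ObjectStat:
    path: str
    physical_address: str
    checksum: str
    mtime: datetime
    size_bytes: Optional[int]
    content_type: Optional[str]
    metadata: Dict[str, str]


def parse_object_stats(entry: Mapping) -> ObjectStat:
    """Convert one entry of `results` into a typed record with a UTC mtime."""
    return ObjectStat(
        path=entry["path"],
        physical_address=entry.get("physical_address", ""),
        checksum=entry.get("checksum", ""),
        mtime=datetime.fromtimestamp(entry["mtime"], tz=timezone.utc),
        size_bytes=entry.get("size_bytes"),
        content_type=entry.get("content_type"),
        metadata=dict(entry.get("metadata") or {}),
    )
```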

HTTP Status Codes:

Code Description
200 Success -- returns ObjectStatsList with pagination and results
400 Bad request -- invalid parameters
401 Unauthorized -- missing or invalid credentials
404 Not found -- repository or ref does not exist
429 Too many requests -- rate limited
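Of the codes above, only 429 signals a transient condition worth retrying; 400, 401, and 404 will fail the same way on an identical request. A minimal retry sketch with exponential backoff and jitter, assuming the caller wraps the HTTP call in a zero-argument `send` function returning `(status_code, body)`; `call_with_retry` is a hypothetical helper and the delay values are illustrative, not lakeFS recommendations. It does not read a Retry-After header, since this page does not say whether one is sent.

```python
import random
import time
from typing import Callable, Tuple

RETRYABLE = {429}


def call_with_retry(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call `send`, retrying on 429 with exponential backoff plus jitter.

    Any other status (or the last attempt's result) is returned as-is."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 1
    while True:
        status, body = send()
        if status not in RETRYABLE or attempt >= max_attempts:
            return status, body
        # Delay doubles each attempt (base, 2*base, 4*base, ...) plus up to 10% jitter.
        delay = base_delay * 2 ** (attempt - 1)
        sleep(delay + random.uniform(0, delay * 0.1))
        attempt += 1
```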

Usage Examples

Verify Import with Prefix Filter (curl)

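# List the first page of objects under the import destination prefix.
# Results are paginated, so this counts at most one page (up to `amount`
# objects). If has_more is true, follow pagination.next_offset via the
# `after` parameter to count the rest.
REPO="my-repo"
BRANCH="main"
PREFIX="collections/"

curl -s \
  "http://localhost:8000/api/v1/repositories/${REPO}/refs/${BRANCH}/objects/ls?prefix=${PREFIX}&amount=1000" \
  -u "access_key:secret_key" | jq '{count: (.results | length), has_more: .pagination.has_more}'

# Prints the number of objects on this page and whether more pages remain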

Paginated Verification in Python

import requests

LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"
IMPORT_PREFIX = "imported/new-prefix/"

# Count all objects under the import prefix, one page at a time
total_count = 0
has_more = True
after = ""  # pagination cursor; an empty string starts from the beginning

while has_more:
    params = {
        "prefix": IMPORT_PREFIX,
        "after": after,
        "amount": 1000,
    }
    resp = requests.get(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects/ls",
        params=params,
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()

    results = data["results"]
    total_count += len(results)

    # Verify each object has the correct prefix. An explicit check is used
    # instead of assert, which is skipped when Python runs with -O.
    for obj in results:
        if not obj["path"].startswith(IMPORT_PREFIX):
            raise ValueError(f"Unexpected path: {obj['path']}")

    has_more = data["pagination"]["has_more"]
    if has_more:
        after = data["pagination"]["next_offset"]

print(f"Verified {total_count} objects under {IMPORT_PREFIX}")
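The loop above confirms that every path starts with the import prefix, but not how the tree under it is shaped. If the loop also collects each `obj["path"]` into a list, the immediate children of the prefix can be compared against the layout expected from the source. A minimal sketch, assuming `/` separates path levels; `first_level_children` is a hypothetical helper. The server-side delimiter parameter could do the same grouping, but this local version avoids depending on how grouped prefixes are represented in the response.

```python
from typing import Iterable, Set


def first_level_children(paths: Iterable[str], prefix: str, delimiter: str = "/") -> Set[str]:
    """Return the immediate children of `prefix`: sub-"directories" (with a
    trailing delimiter) and objects stored directly under the prefix."""
    children: Set[str] = set()
    for path in paths:
        if not path.startswith(prefix):
            continue
        rest = path[len(prefix):]
        if not rest:
            continue  # an object whose path equals the prefix itself
        head, sep, _ = rest.partition(delimiter)
        children.add(head + sep)
    return children
```

For the sample files in Verify Specific Files After Import, the children of `imported/new-prefix/` would include `nested/`, `prefix-1/`, and `prefix-5/`.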

Verify Specific Files After Import

import requests

LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"

# Known files that should exist after import
expected_files = [
    "imported/new-prefix/nested/prefix-1/file002005",
    "imported/new-prefix/nested/prefix-2/file001894",
    "imported/new-prefix/prefix-1/file002100",
    "imported/new-prefix/prefix-5/file000987",
]

# A HEAD request on the objects endpoint checks existence without
# downloading the object content
for file_path in expected_files:
    resp = requests.head(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects",
        params={"path": file_path},
        auth=AUTH,
        timeout=30,
    )
    if resp.status_code == 200:
        print(f"PASS: {file_path}")
    else:
        print(f"FAIL: {file_path} (status {resp.status_code})")
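One HEAD request per file costs one round trip per expected path. When the paginated listing has already been collected, membership can be checked locally instead; listing cost scales with the number of objects under the prefix, while HEAD cost scales with the number of expected files, so the better choice depends on which is smaller. A minimal sketch; `check_expected` is a hypothetical helper that takes the listed paths as input.

```python
from typing import Iterable, List, Sequence


def check_expected(expected: Sequence[str], listed_paths: Iterable[str]) -> List[str]:
    """Print PASS/FAIL for each expected path using one prior listing,
    and return the paths that were not found."""
    present = set(listed_paths)
    missing: List[str] = []
    for path in expected:
        if path in present:
            print(f"PASS: {path}")
        else:
            print(f"FAIL: {path} (not in listing)")
            missing.append(path)
    return missing
```

As a pipeline gate, `sys.exit(1 if check_expected(expected_files, listed) else 0)` would fail the step when anything is missing.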

Related Pages

Implements Principle
