Implementation:Treeverse LakeFS ListObjects For Import

Knowledge Sources	lakeFS lakeFS API Reference
Domains	Data_Import, REST_API
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete API endpoint, provided by the lakeFS REST API, for listing objects under a given reference. It is used here to verify that imported data is present at the expected destination paths after an import operation.

Description

The listObjects endpoint returns a paginated list of objects (files) under a given branch or commit reference. When used for import verification, it is called with a prefix filter matching the import destination path. This allows the client to enumerate all imported objects and confirm their presence, count, metadata, and path structure.

Key capabilities for import verification:

Prefix filtering -- The prefix query parameter narrows results to only objects under the import destination, making verification efficient even when the branch contains millions of objects from prior operations
Pagination -- Results are paginated via after and amount parameters, allowing verification of arbitrarily large import sets
Delimiter support -- The delimiter parameter enables directory-style listing to verify the hierarchical structure of imported data
Object metadata -- Each result includes path, physical_address, checksum, mtime, size_bytes, and content_type, providing material for metadata-level validation

This implementation focuses specifically on the import verification angle of the general-purpose listObjects API.
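To illustrate the delimiter capability, here is a minimal sketch that splits one page of results into directory-style prefixes and plain objects. It assumes each entry carries a `path_type` field with the value `common_prefix` or `object`, which is not listed in the field table on this page; the helper name `split_listing` and the sample entries are hypothetical.

```python
from typing import Dict, List, Tuple


def split_listing(results: List[Dict]) -> Tuple[List[str], List[str]]:
    """Separate one page of listing results into prefixes and objects.

    Assumes each entry has a `path_type` of "common_prefix" or "object";
    entries without the field are treated as objects.
    """
    prefixes = [r["path"] for r in results if r.get("path_type") == "common_prefix"]
    objects = [r["path"] for r in results if r.get("path_type", "object") == "object"]
    return prefixes, objects


# Hypothetical page, shaped like a response to a request with
# prefix=imported/new-prefix/ and delimiter=/
sample_page = [
    {"path": "imported/new-prefix/nested/", "path_type": "common_prefix"},
    {"path": "imported/new-prefix/prefix-1/", "path_type": "common_prefix"},
    {"path": "imported/new-prefix/_SUCCESS", "path_type": "object", "size_bytes": 0},
]

prefixes, objects = split_listing(sample_page)
print("directories:", prefixes)
print("objects:", objects)
```

Comparing the returned prefixes against the expected top-level layout is a cheap structural check before a full recursive listing.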

Usage

Use this endpoint for import verification when:

Confirming that all expected objects are present after an import completes
Counting the total number of imported objects to match against an expected count
Validating that object paths correctly reflect the destination prefix mapping
Spot-checking object metadata (size, checksum) against known source values
Building automated verification gates in import pipelines
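The checks listed above can be combined into a single gate function. The following is a sketch under stated assumptions: it takes an iterable of already-fetched result entries (dicts with a `path` key, as in the listing response) and an expected count supplied by the pipeline; the name `check_import` and the sample data are hypothetical.

```python
from typing import Dict, Iterable, List


def check_import(objects: Iterable[Dict], prefix: str, expected_count: int) -> List[str]:
    """Return a list of problems found; an empty list means the gate passes."""
    problems: List[str] = []
    count = 0
    for obj in objects:
        count += 1
        # Every listed object should live under the import destination.
        if not obj["path"].startswith(prefix):
            problems.append(f"path outside prefix: {obj['path']}")
    # The total should match what the import was expected to produce.
    if count != expected_count:
        problems.append(f"expected {expected_count} objects, found {count}")
    return problems


# Hypothetical listing results for illustration only.
listed = [
    {"path": "imported/new-prefix/prefix-1/file002100"},
    {"path": "imported/new-prefix/prefix-5/file000987"},
]

problems = check_import(listed, "imported/new-prefix/", expected_count=3)
for problem in problems:
    print("FAIL:", problem)
```

Returning a list of problems rather than raising on the first one lets a pipeline report every discrepancy in one run.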

Code Reference

Source Location

Repository: lakeFS
File: api/swagger.yml (lines 6030-6080)

Signature

/repositories/{repository}/refs/{ref}/objects/ls:
  parameters:
    - in: path
      name: repository
      required: true
      schema:
        type: string
    - in: path
      name: ref
      required: true
      schema:
        type: string
      description: a reference (could be either a branch or a commit ID)
    - in: query
      name: user_metadata
      required: false
      schema:
        type: boolean
        default: true
    - in: query
      name: presign
      required: false
      schema:
        type: boolean
    - $ref: "#/components/parameters/PaginationAfter"
    - $ref: "#/components/parameters/PaginationAmount"
    - $ref: "#/components/parameters/PaginationDelimiter"
    - $ref: "#/components/parameters/PaginationPrefix"
  get:
    tags:
      - objects
    operationId: listObjects
    summary: list objects under a given prefix
    responses:
      200:
        description: object listing
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ObjectStatsList"
      401:
        $ref: "#/components/responses/Unauthorized"
      404:
        $ref: "#/components/responses/NotFound"

Import

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")

I/O Contract

Inputs

Name	Type	Required	Description
repository	string (path)	Yes	Repository name
ref	string (path)	Yes	Branch name or commit ID to list objects from
user_metadata	boolean (query)	No	Include user metadata in results (default: true)
presign	boolean (query)	No	If true, return pre-signed URLs for object access
after	string (query)	No	Pagination cursor: return objects after this path (lexicographic)
amount	integer (query)	No	Maximum number of objects to return per page (the lakeFS API specification defines a default of 100 and a maximum of 1000)
prefix	string (query)	No	Filter results to objects with paths starting with this prefix. Key parameter for import verification.
delimiter	string (query)	No	Group objects by this delimiter (e.g., `/` for directory-style listing)
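As an illustration of how these inputs map onto a request, the sketch below assembles the listing URL with the standard library only. The helper name `build_ls_url` is hypothetical, and the base URL is the local example host used elsewhere on this page; optional parameters are omitted from the query string when not supplied.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_ls_url(
    base_url: str,
    repository: str,
    ref: str,
    prefix: Optional[str] = None,
    after: Optional[str] = None,
    amount: Optional[int] = None,
    delimiter: Optional[str] = None,
) -> str:
    """Build the objects/ls URL, percent-encoding path and query values."""
    candidates = {"prefix": prefix, "after": after, "amount": amount, "delimiter": delimiter}
    # Drop parameters the caller did not set so server defaults apply.
    query = {key: value for key, value in candidates.items() if value is not None}
    path = (
        f"{base_url}/repositories/{quote(repository, safe='')}"
        f"/refs/{quote(ref, safe='')}/objects/ls"
    )
    return f"{path}?{urlencode(query)}" if query else path


url = build_ls_url(
    "http://localhost:8000/api/v1",
    "my-repo",
    "main",
    prefix="imported/new-prefix/",
    amount=1000,
    delimiter="/",
)
print(url)
```

Encoding the values matters mostly for prefixes and cursors that contain spaces, `&`, or `=`, which would otherwise corrupt the query string.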

Outputs

Name	Type	Description
pagination	Pagination	Pagination metadata: `has_more` (boolean), `next_offset` (string), `results` count, `max_per_page`
results	[]ObjectStats	Array of object metadata entries

ObjectStats fields:

Name	Type	Description
path	string	Full object path within the repository
physical_address	string	Physical storage location (e.g., the original S3 URI for imported objects)
checksum	string	Object checksum (e.g., ETag)
mtime	integer	Last modification time (Unix timestamp)
size_bytes	integer	Object size in bytes
content_type	string	MIME type of the object
metadata	object	User-defined key-value metadata (if `user_metadata=true`)
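The fields above are enough for a metadata-level spot check. The sketch below compares one entry against known source values and converts `mtime` to a UTC datetime. The helper names, the sample entry, and the expected values are hypothetical; whether `checksum` matches a source ETag depends on the storage backend, so treat that comparison as an assumption to validate for your setup.

```python
from datetime import datetime, timezone
from typing import Dict, List


def spot_check(obj: Dict, expected_size: int, expected_checksum: str) -> List[str]:
    """Compare size and checksum with known source values; return mismatches."""
    mismatches: List[str] = []
    if obj["size_bytes"] != expected_size:
        mismatches.append(f"size_bytes: expected {expected_size}, got {obj['size_bytes']}")
    if obj["checksum"] != expected_checksum:
        mismatches.append(f"checksum: expected {expected_checksum}, got {obj['checksum']}")
    return mismatches


def mtime_utc(obj: Dict) -> datetime:
    """Interpret the integer mtime as seconds since the Unix epoch, in UTC."""
    return datetime.fromtimestamp(obj["mtime"], tz=timezone.utc)


# Hypothetical entry shaped like one element of `results`.
entry = {
    "path": "imported/new-prefix/prefix-1/file002100",
    "physical_address": "s3://source-bucket/prefix-1/file002100",
    "checksum": "abc123",
    "mtime": 1700000000,
    "size_bytes": 2048,
    "content_type": "application/octet-stream",
}

mismatches = spot_check(entry, expected_size=2048, expected_checksum="abc123")
print("mismatches:", mismatches)
print("modified:", mtime_utc(entry).isoformat())
```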

HTTP Status Codes:

Code	Description
200	Success -- returns ObjectStatsList with pagination and results
400	Bad request -- invalid parameters
401	Unauthorized -- missing or invalid credentials
404	Not found -- repository or ref does not exist
429	Too many requests -- rate limited
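A verification script usually needs to decide which of these codes are worth retrying. The following is one possible policy, not behavior prescribed by lakeFS: it retries only on 429 with capped exponential backoff and aborts on the other error codes in the table. The function names and the backoff constants are hypothetical.

```python
def classify_status(status_code: int) -> str:
    """Map an HTTP status from the listing call to a pipeline action."""
    if status_code == 200:
        return "ok"
    if status_code == 429:
        # Rate limited: wait, then repeat the same request.
        return "retry"
    # 400, 401, 404 and anything unexpected: repeating the request
    # unchanged is unlikely to help, so stop and report.
    return "abort"


def backoff_seconds(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff delay for a zero-based attempt number, capped."""
    return min(cap, base * (2 ** attempt))


for code in (200, 404, 429):
    print(code, classify_status(code))
print([backoff_seconds(n) for n in range(4)])
```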

Usage Examples

Verify Import with Prefix Filter (curl)

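# Count objects on the first page under the import destination prefix.
# A single request returns at most `amount` objects, so this is not a full count.
REPO="my-repo"
BRANCH="main"
PREFIX="collections/"

curl -s -G \
  "http://localhost:8000/api/v1/repositories/${REPO}/refs/${BRANCH}/objects/ls" \
  --data-urlencode "prefix=${PREFIX}" \
  --data-urlencode "amount=1000" \
  -u "access_key:secret_key" | jq '.results | length'

# Prints the number of objects on this page only.
# If .pagination.has_more is true, pass .pagination.next_offset as `after` to continue.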

Paginated Verification in Python

import requests

LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"
IMPORT_PREFIX = "imported/new-prefix/"

# Count all objects under the import prefix
total_count = 0
has_more = True
after = ""

while has_more:
    params = {
        "prefix": IMPORT_PREFIX,
        "after": after,
        "amount": 1000,
    }
    resp = requests.get(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects/ls",
        params=params,
        auth=AUTH,
    )
    resp.raise_for_status()
    data = resp.json()

    results = data["results"]
    total_count += len(results)

    # Verify each object has the correct prefix
    for obj in results:
        assert obj["path"].startswith(IMPORT_PREFIX), (
            f"Unexpected path: {obj['path']}"
        )

    has_more = data["pagination"]["has_more"]
    if has_more:
        after = data["pagination"]["next_offset"]

print(f"Verified {total_count} objects under {IMPORT_PREFIX}")

Verify Specific Files After Import

import requests

LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"

# Known files that should exist after import
expected_files = [
    "imported/new-prefix/nested/prefix-1/file002005",
    "imported/new-prefix/nested/prefix-2/file001894",
    "imported/new-prefix/prefix-1/file002100",
    "imported/new-prefix/prefix-5/file000987",
]

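# Check each path with a HEAD request on the objects endpoint (not objects/ls);
# a 200 status means the object exists on the branch.
for file_path in expected_files: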
    resp = requests.head(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects",
        params={"path": file_path},
        auth=AUTH,
    )
    if resp.status_code == 200:
        print(f"PASS: {file_path}")
    else:
        print(f"FAIL: {file_path} (status {resp.status_code})")

Related Pages

Implements Principle

Principle:Treeverse_LakeFS_Imported_Data_Verification
