Implementation:Treeverse LakeFS ListObjects For Import
| Knowledge Sources | |
|---|---|
| Domains | Data_Import, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
The lakeFS REST API endpoint for listing objects under a given reference, used here to verify that imported data is present at the expected destination paths after an import operation.
Description
The listObjects endpoint returns a paginated list of objects (files) under a given branch or commit reference. When used for import verification, it is called with a prefix filter matching the import destination path. This allows the client to enumerate all imported objects and confirm their presence, count, metadata, and path structure.
Key capabilities for import verification:
- Prefix filtering -- The prefix query parameter narrows results to only objects under the import destination, making verification efficient even when the branch contains millions of objects from prior operations
- Pagination -- Results are paginated via the after and amount parameters, allowing verification of arbitrarily large import sets
- Delimiter support -- The delimiter parameter enables directory-style listing to verify the hierarchical structure of imported data
- Object metadata -- Each result includes path, physical_address, checksum, mtime, size_bytes, and content_type, providing material for metadata-level validation
This implementation focuses specifically on the import verification angle of the general-purpose listObjects API.
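As a sketch of the delimiter-based structure check, the function below splits one page of listObjects results into directory-like entries and plain objects. It assumes each entry carries a path_type field whose value is common_prefix for grouped prefixes and object otherwise; that field is not described on this page, so treat it as an assumption. The function name split_listing and the sample page are hypothetical.

```python
from typing import Dict, List, Tuple


def split_listing(results: List[Dict]) -> Tuple[List[str], List[str]]:
    """Split one page of listing results into (directories, objects).

    Assumption: entries grouped by the delimiter are marked with
    path_type == "common_prefix"; everything else is treated as an object.
    """
    directories = []
    objects = []
    for entry in results:
        if entry.get("path_type") == "common_prefix":
            directories.append(entry["path"])
        else:
            objects.append(entry["path"])
    return directories, objects


# Hypothetical page for prefix="imported/new-prefix/" and delimiter="/"
sample_page = [
    {"path": "imported/new-prefix/nested/", "path_type": "common_prefix"},
    {"path": "imported/new-prefix/prefix-1/", "path_type": "common_prefix"},
    {"path": "imported/new-prefix/README", "path_type": "object"},
]

dirs, files = split_listing(sample_page)
print("directories:", dirs)
print("objects:", files)
```

Comparing the returned directory names against the expected top-level layout is a cheap structural check before enumerating every object.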
Usage
Use this endpoint for import verification when:
- Confirming that all expected objects are present after an import completes
- Counting the total number of imported objects to match against an expected count
- Validating that object paths correctly reflect the destination prefix mapping
- Spot-checking object metadata (size, checksum) against known source values
- Building automated verification gates in import pipelines
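The checks listed above can be combined into a single pass/fail gate. The following is a minimal sketch that works on paths already collected from listObjects; the names ImportReport and check_import, and the sample paths, are hypothetical and not part of lakeFS.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional


@dataclass
class ImportReport:
    listed_count: int
    missing: List[str] = field(default_factory=list)
    outside_prefix: List[str] = field(default_factory=list)
    count_matches: bool = True

    @property
    def passed(self) -> bool:
        # The gate passes only if every individual check passed.
        return not self.missing and not self.outside_prefix and self.count_matches


def check_import(
    listed_paths: Iterable[str],
    expected_paths: Iterable[str],
    prefix: str,
    expected_count: Optional[int] = None,
) -> ImportReport:
    """Compare listed paths against expectations for one import."""
    listed = set(listed_paths)
    return ImportReport(
        listed_count=len(listed),
        # Expected paths that the listing did not return
        missing=sorted(set(expected_paths) - listed),
        # Listed paths that do not sit under the destination prefix
        outside_prefix=sorted(p for p in listed if not p.startswith(prefix)),
        count_matches=expected_count is None or len(listed) == expected_count,
    )


report = check_import(
    listed_paths=["imported/new-prefix/a", "imported/new-prefix/b"],
    expected_paths=["imported/new-prefix/a", "imported/new-prefix/c"],
    prefix="imported/new-prefix/",
    expected_count=2,
)
print(report.passed, report.missing)
```

A pipeline step could exit non-zero when report.passed is false, which blocks later stages until the import is fixed.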
Code Reference
Source Location
- Repository: lakeFS
- File: api/swagger.yml (lines 6030-6080)
Signature
/repositories/{repository}/refs/{ref}/objects/ls:
  parameters:
    - in: path
      name: repository
      required: true
      schema:
        type: string
    - in: path
      name: ref
      required: true
      schema:
        type: string
      description: a reference (could be either a branch or a commit ID)
    - in: query
      name: user_metadata
      required: false
      schema:
        type: boolean
        default: true
    - in: query
      name: presign
      required: false
      schema:
        type: boolean
    - $ref: "#/components/parameters/PaginationAfter"
    - $ref: "#/components/parameters/PaginationAmount"
    - $ref: "#/components/parameters/PaginationDelimiter"
    - $ref: "#/components/parameters/PaginationPrefix"
  get:
    tags:
      - objects
    operationId: listObjects
    summary: list objects under a given prefix
    responses:
      200:
        description: object listing
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ObjectStatsList"
      401:
        $ref: "#/components/responses/Unauthorized"
      404:
        $ref: "#/components/responses/NotFound"
Import
import lakefs
client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository | string (path) | Yes | Repository name |
| ref | string (path) | Yes | Branch name or commit ID to list objects from |
| user_metadata | boolean (query) | No | Include user metadata in results (default: true) |
| presign | boolean (query) | No | If true, return pre-signed URLs for object access |
| after | string (query) | No | Pagination cursor: return objects after this path (lexicographic) |
| amount | integer (query) | No | Maximum number of objects to return per page (the API specification defines a default of 100 and a maximum of 1000) |
| prefix | string (query) | No | Filter results to objects with paths starting with this prefix. Key parameter for import verification. |
| delimiter | string (query) | No | Group objects by this delimiter (e.g., / for directory-style listing) |
Outputs
| Name | Type | Description |
|---|---|---|
| pagination | Pagination | Pagination metadata: has_more (boolean), next_offset (string), results (count of entries on this page), max_per_page |
| results | []ObjectStats | Array of object metadata entries |
ObjectStats fields:
| Name | Type | Description |
|---|---|---|
| path | string | Full object path within the repository |
| physical_address | string | Physical storage location (e.g., the original S3 URI for imported objects) |
| checksum | string | Object checksum (e.g., ETag) |
| mtime | integer | Last modification time (Unix timestamp) |
| size_bytes | integer | Object size in bytes |
| content_type | string | MIME type of the object |
| metadata | object | User-defined key-value metadata (if user_metadata=true) |
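To illustrate how the size_bytes and checksum fields can support a spot check, here is a minimal sketch that compares listed entries against a manifest of known source values. The manifest shape, the function name find_metadata_mismatches, and the sample values are hypothetical. Whether the lakeFS checksum equals a source-side value such as an ETag depends on how the data was imported, so verify that assumption before relying on checksum comparisons.

```python
from typing import Dict, List


def find_metadata_mismatches(results: List[Dict], manifest: Dict[str, Dict]) -> List[str]:
    """Return readable mismatches between listed objects and a manifest.

    manifest maps path -> {"size_bytes": int, "checksum": str}; either key
    may be omitted. Paths absent from the manifest are skipped, because
    this is a spot check rather than a full comparison.
    """
    problems = []
    for obj in results:
        expected = manifest.get(obj["path"])
        if expected is None:
            continue
        for field_name in ("size_bytes", "checksum"):
            if field_name in expected and obj.get(field_name) != expected[field_name]:
                problems.append(
                    f"{obj['path']}: {field_name} is {obj.get(field_name)!r}, "
                    f"expected {expected[field_name]!r}"
                )
    return problems


# Hypothetical listing entries and source manifest
listed = [
    {"path": "imported/new-prefix/a", "size_bytes": 10, "checksum": "abc"},
    {"path": "imported/new-prefix/b", "size_bytes": 5, "checksum": "def"},
]
source_manifest = {
    "imported/new-prefix/a": {"size_bytes": 10, "checksum": "abc"},
    "imported/new-prefix/b": {"size_bytes": 6},
}

for problem in find_metadata_mismatches(listed, source_manifest):
    print(problem)
```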
HTTP Status Codes:
| Code | Description |
|---|---|
| 200 | Success -- returns ObjectStatsList with pagination and results |
| 400 | Bad request -- invalid parameters |
| 401 | Unauthorized -- missing or invalid credentials |
| 404 | Not found -- repository or ref does not exist |
| 429 | Too many requests -- rate limited |
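One way a verification pipeline might act on these codes is sketched below: proceed on 200, back off and retry on 429, and fail immediately on anything else. The function names, the retry count, and the exponential delay are illustrative assumptions, not lakeFS recommendations. The send callable stands in for whatever performs the HTTP request and returns a (status, body) pair.

```python
import time
from typing import Callable, Dict, Tuple


def action_for_status(status_code: int) -> str:
    """Map a listObjects HTTP status to 'proceed', 'retry', or 'fail'."""
    if status_code == 200:
        return "proceed"
    if status_code == 429:
        return "retry"
    # 400, 401, 404 and anything unexpected will not succeed on retry
    return "fail"


def list_with_retry(
    send: Callable[[], Tuple[int, Dict]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Call send() until it succeeds, retrying only on rate limiting."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        action = action_for_status(status)
        if action == "proceed":
            return body
        if action == "fail" or attempt == max_attempts:
            raise RuntimeError(f"listObjects failed with status {status}")
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")


# Usage with a fake sender that is rate limited once, then succeeds
responses = iter([(429, {}), (200, {"results": [], "pagination": {"has_more": False}})])
body = list_with_retry(lambda: next(responses), sleep=lambda seconds: None)
print(body["pagination"]["has_more"])
```

Injecting send and sleep keeps the retry logic testable without a running server.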
Usage Examples
Verify Import with Prefix Filter (curl)
# List objects under the import destination prefix to verify they exist
REPO="my-repo"
BRANCH="main"
PREFIX="collections/"
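curl -s \
  "http://localhost:8000/api/v1/repositories/${REPO}/refs/${BRANCH}/objects/ls?prefix=${PREFIX}" \
  -u "access_key:secret_key" | jq '.results | length'
# Prints the number of objects on the first page only. A single response is
# capped at the page size, so a large import (e.g., 128000 objects) must be
# counted by following pagination, as in the Python example below.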
Paginated Verification in Python
import requests
LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"
IMPORT_PREFIX = "imported/new-prefix/"
# Count all objects under the import prefix
total_count = 0
has_more = True
after = ""
while has_more:
    params = {
        "prefix": IMPORT_PREFIX,
        "after": after,
        "amount": 1000,
    }
    resp = requests.get(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects/ls",
        params=params,
        auth=AUTH,
    )
    resp.raise_for_status()
    data = resp.json()
    results = data["results"]
    total_count += len(results)
    # Verify each object has the correct prefix
    for obj in results:
        assert obj["path"].startswith(IMPORT_PREFIX), (
            f"Unexpected path: {obj['path']}"
        )
    # Advance the cursor only when the server reports another page
    has_more = data["pagination"]["has_more"]
    if has_more:
        after = data["pagination"]["next_offset"]
print(f"Verified {total_count} objects under {IMPORT_PREFIX}")
Verify Specific Files After Import
import requests
LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"
# Known files that should exist after import
expected_files = [
    "imported/new-prefix/nested/prefix-1/file002005",
    "imported/new-prefix/nested/prefix-2/file001894",
    "imported/new-prefix/prefix-1/file002100",
    "imported/new-prefix/prefix-5/file000987",
]
# A HEAD request checks existence without downloading the object body
for file_path in expected_files:
    resp = requests.head(
        f"{LAKEFS_URL}/repositories/{REPO}/refs/{BRANCH}/objects",
        params={"path": file_path},
        auth=AUTH,
    )
    if resp.status_code == 200:
        print(f"PASS: {file_path}")
    else:
        print(f"FAIL: {file_path} (status {resp.status_code})")