Implementation:Treeverse LakeFS ImportStatus
| Knowledge Sources | |
|---|---|
| Domains | Data_Import, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete API endpoint for querying the current status and progress of an asynchronous import operation, provided by the lakeFS REST API.
Description
The importStatus endpoint allows clients to poll the progress of a previously initiated import job. Given the import job ID (obtained from importStart), it returns a status object that includes:
- Whether the import has completed
- The number of objects ingested so far
- The last update timestamp for detecting stalls
- On successful completion: the resulting commit object and metarange ID
- On failure: an error object with details
This endpoint is designed to be called repeatedly in a polling loop until the completed field becomes true.
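The outcomes listed above can be folded into a single state value before deciding whether to keep polling. This minimal sketch assumes the response body has already been decoded into a Python dict. The function name `classify_import_status` and the state strings are hypothetical and chosen for illustration. They are not part of the lakeFS API.

```python
from typing import Any, Mapping


def classify_import_status(status: Mapping[str, Any]) -> str:
    """Map a decoded ImportStatus body to 'running', 'succeeded', or 'failed'.

    Hypothetical helper: the state names are illustrative, not lakeFS API values.
    """
    # An error object means the import failed. Check it first so a failed
    # import (which also reports completed=true) is never treated as success.
    if status.get("error"):
        return "failed"
    # completed=true with no error object is a successful import.
    if status.get("completed"):
        return "succeeded"
    # Otherwise the import is still in progress and should be polled again.
    return "running"


print(classify_import_status({"completed": False, "ingested_objects": 42850}))  # running
print(classify_import_status({"completed": True, "commit": {"id": "c1"}}))      # succeeded
print(classify_import_status({"completed": True, "error": {"message": "x"}}))   # failed
```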
Usage
Use this endpoint when:
- Polling for completion after calling importStart
- Building progress indicators for import operations in CLIs, dashboards, or web UIs
- Implementing timeout and cancellation logic in automated import pipelines
- Retrieving the commit reference created by a successful import for subsequent operations (tagging, branching, verification)
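For the timeout case in automated pipelines, one option is a deadline-based polling loop. The sketch below is an assumption about how a pipeline might structure this and is not lakeFS code. `fetch_status` is a hypothetical callable that performs one importStatus request and returns the decoded JSON. The clock and sleep functions are injectable so the loop can be exercised without a server. On timeout it raises, leaving the caller to decide whether to cancel the import.

```python
import time
from typing import Any, Callable, Dict

Status = Dict[str, Any]


def wait_for_import(
    fetch_status: Callable[[], Status],
    timeout_s: float = 3600.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll fetch_status() until the import completes or the deadline passes.

    fetch_status is a hypothetical callable wrapping one importStatus request.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status.get("completed"):
            # Covers both success and failure; the caller inspects "error".
            return status
        if clock() >= deadline:
            # Give up; the caller may choose to cancel the import at this point.
            raise TimeoutError(
                f"import not completed after {timeout_s}s; "
                f"{status.get('ingested_objects', 0)} objects ingested"
            )
        sleep(interval_s)


# Usage with canned responses standing in for real HTTP calls.
responses = iter([
    {"completed": False, "ingested_objects": 10},
    {"completed": False, "ingested_objects": 20},
    {"completed": True, "ingested_objects": 30, "commit": {"id": "c1"}},
])
final = wait_for_import(lambda: next(responses), sleep=lambda s: None)
print(final["commit"]["id"])  # c1
```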
Code Reference
Source Location
- Repository: lakeFS
- File: api/swagger.yml (lines 5523-5551)
Signature
```yaml
/repositories/{repository}/branches/{branch}/import:
  get:
    tags:
      - import
    operationId: importStatus
    summary: get import status
    parameters:
      - in: query
        name: id
        description: Unique identifier of the import process
        schema:
          type: string
        required: true
    responses:
      200:
        description: import status
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ImportStatus"
      400:
        $ref: "#/components/responses/BadRequest"
      401:
        $ref: "#/components/responses/Unauthorized"
      404:
        $ref: "#/components/responses/NotFound"
      429:
        description: too many requests
```
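A client has to fill in the two path parameters and the required `id` query parameter. The following standard-library sketch builds the request but does not send it. The `/api/v1` prefix and the basic-auth credentials follow the curl examples on this page. The function name `build_import_status_request` is hypothetical.

```python
import base64
import urllib.parse
import urllib.request


def build_import_status_request(
    base_url: str,
    repository: str,
    branch: str,
    import_id: str,
    access_key: str,
    secret_key: str,
) -> urllib.request.Request:
    """Build (but do not send) a GET request for the importStatus endpoint."""
    # Percent-encode path parameters so reserved characters cannot alter the path.
    path = "/repositories/{}/branches/{}/import".format(
        urllib.parse.quote(repository, safe=""),
        urllib.parse.quote(branch, safe=""),
    )
    # The import ID goes in the required "id" query parameter.
    query = urllib.parse.urlencode({"id": import_id})
    url = f"{base_url.rstrip('/')}{path}?{query}"
    # HTTP basic auth with the access key and secret, as in the curl examples.
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url, headers={"Authorization": f"Basic {token}"}, method="GET"
    )


req = build_import_status_request(
    "http://localhost:8000/api/v1", "my-repo", "main",
    "c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1", "access_key", "secret_key",
)
print(req.full_url)
# http://localhost:8000/api/v1/repositories/my-repo/branches/main/import?id=c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1
```

Passing the result to `urllib.request.urlopen` would send it. The JSON body of a 200 response is the ImportStatus object described under Outputs.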
Import
```python
import lakefs
import time

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository | string (path) | Yes | Repository name |
| branch | string (path) | Yes | Branch name where the import is running |
| id | string (query) | Yes | Unique identifier of the import process, returned by importStart |
Outputs
| Name | Type | Description |
|---|---|---|
| completed | boolean | Whether the import has finished (true on both success and failure) |
| update_time | date-time | Timestamp of the last status update; compare across polls to detect stalls |
| ingested_objects | int64 | Number of objects processed so far; monotonically increasing during the import |
| metarange_id | string | ID of the constructed metarange; set upon successful completion |
| commit | Commit | The commit object created by the import; set upon successful completion. Contains id, message, creation_date, metadata, etc. |
| error | Error | Error details if the import failed; null on success |
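In typed client code, the optional output fields map naturally onto `Optional` values. This minimal sketch assumes the JSON has already been decoded. The `ImportStatus` dataclass here is a hypothetical local type that keeps only the commit ID. It is not the class generated by any lakeFS SDK.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Mapping, Optional


@dataclass
class ImportStatus:
    """Hypothetical local record mirroring the output fields in the table."""
    completed: bool
    update_time: datetime
    ingested_objects: int = 0
    metarange_id: Optional[str] = None
    commit_id: Optional[str] = None
    error: Optional[Mapping[str, Any]] = None


def parse_import_status(body: Mapping[str, Any]) -> ImportStatus:
    """Convert a decoded ImportStatus JSON object into a typed record."""
    # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalize it.
    raw_time = body["update_time"].replace("Z", "+00:00")
    commit = body.get("commit") or {}
    return ImportStatus(
        completed=bool(body["completed"]),
        update_time=datetime.fromisoformat(raw_time),
        ingested_objects=int(body.get("ingested_objects", 0)),
        metarange_id=body.get("metarange_id"),
        commit_id=commit.get("id"),
        error=body.get("error"),
    )


done = parse_import_status({
    "completed": True,
    "update_time": "2024-01-15T10:35:12Z",
    "ingested_objects": 128000,
    "commit": {"id": "example-commit-id"},  # hypothetical ID
})
print(done.completed, done.ingested_objects, done.commit_id)
# True 128000 example-commit-id
```

On some Python versions, `fromisoformat` rejects fractional seconds with more than six digits. Such timestamps may need trimming before parsing.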
HTTP Status Codes:
| Code | Description |
|---|---|
| 200 | Success -- returns the current ImportStatus object |
| 400 | Bad request -- invalid or missing import ID |
| 401 | Unauthorized -- missing or invalid credentials |
| 404 | Not found -- repository, branch, or import ID does not exist |
| 429 | Too many requests -- rate limited |
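The status codes above fall into one retryable case (429) and several that will not change on retry. The sketch below captures that decision under an assumed policy of capped exponential backoff for 429. This policy is a client-side choice and is not defined by the API. `handle_status_code` is a hypothetical name.

```python
from typing import Tuple


def handle_status_code(
    status_code: int, attempt: int, base_s: float = 1.0, cap_s: float = 60.0
) -> Tuple[str, float]:
    """Return (action, delay_seconds) for one importStatus response code.

    action is "ok", "retry", or "fail". The backoff policy is an assumption.
    """
    if status_code == 200:
        return ("ok", 0.0)
    if status_code == 429:
        # Rate limited: back off exponentially per attempt, capped at cap_s.
        return ("retry", min(cap_s, base_s * 2 ** attempt))
    # 400, 401, 404 (and anything unexpected) will not succeed on retry.
    return ("fail", 0.0)


for code, attempt in [(200, 0), (429, 0), (429, 3), (404, 0)]:
    print(code, handle_status_code(code, attempt))
```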
Usage Examples
Poll Import Status with curl
```bash
# Poll for import status using the job ID from importStart
IMPORT_ID="c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1"
REPO="my-repo"
BRANCH="main"

curl -s \
  "http://localhost:8000/api/v1/repositories/${REPO}/branches/${BRANCH}/import?id=${IMPORT_ID}" \
  -u "access_key:secret_key"

# Response (in progress):
# {
#   "completed": false,
#   "update_time": "2024-01-15T10:30:45Z",
#   "ingested_objects": 42850
# }

# Response (completed):
# {
#   "completed": true,
#   "update_time": "2024-01-15T10:35:12Z",
#   "ingested_objects": 128000,
#   "metarange_id": "480e19972a6fbe98ab8e81ae5efdfd1a29037587e91244e87abd4adefffdb01c",
#   "commit": {
#     "id": "a1b2c3d4e5f6...",
#     "message": "Import production collections from S3",
#     "creation_date": 1705312512
#   }
# }
```
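The two sample responses are enough to estimate throughput. Between them, 128000 - 42850 = 85150 objects were ingested. The elapsed time from 10:30:45 to 10:35:12 is 267 seconds, so the average rate is about 319 objects per second. Real rates depend on the source and the environment. The sketch below performs the same calculation from any two polls. `ingest_rate` is a hypothetical helper.

```python
from datetime import datetime


def ingest_rate(prev: dict, curr: dict) -> float:
    """Average objects ingested per second between two importStatus responses."""
    def ts(s: str) -> datetime:
        # Normalize the trailing "Z" so fromisoformat() works before Python 3.11.
        return datetime.fromisoformat(s.replace("Z", "+00:00"))

    elapsed = (ts(curr["update_time"]) - ts(prev["update_time"])).total_seconds()
    if elapsed <= 0:
        raise ValueError("update_time did not advance between polls")
    delta = curr["ingested_objects"] - prev["ingested_objects"]
    return delta / elapsed


in_progress = {"update_time": "2024-01-15T10:30:45Z", "ingested_objects": 42850}
completed = {"update_time": "2024-01-15T10:35:12Z", "ingested_objects": 128000}
print(f"{ingest_rate(in_progress, completed):.1f} objects/s")  # 318.9 objects/s
```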
Polling Loop in Python
```python
import time

import requests

LAKEFS_URL = "http://localhost:8000/api/v1"
AUTH = ("access_key", "secret_key")
REPO = "my-repo"
BRANCH = "main"
IMPORT_ID = "c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1"

polling_interval = 2  # seconds
previous_update_time = None

while True:
    time.sleep(polling_interval)
    resp = requests.get(
        f"{LAKEFS_URL}/repositories/{REPO}/branches/{BRANCH}/import",
        params={"id": IMPORT_ID},
        auth=AUTH,
        timeout=30,  # do not hang forever on an unresponsive server
    )
    resp.raise_for_status()
    status = resp.json()

    # Check for errors (a failed import also reports completed=true)
    if status.get("error"):
        raise RuntimeError(f"Import failed: {status['error']}")

    # Detect stalls: update_time unchanged since the previous poll
    current_update_time = status["update_time"]
    if current_update_time == previous_update_time:
        print("WARNING: Import may be stalled")
    previous_update_time = current_update_time

    # Log progress
    ingested = status.get("ingested_objects", 0)
    print(f"Import progress: {ingested} objects ingested")

    # Check completion
    if status["completed"]:
        commit = status["commit"]
        print(f"Import completed. Commit ID: {commit['id']}")
        break
```
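The loop above warns as soon as a single poll sees an unchanged `update_time`. That check can fire on a healthy import if the server refreshes the timestamp less often than the client polls. The more tolerant variant below assumes a stall means no change in `update_time` for a fixed wall-clock duration. It only flags a stall once `update_time` has stayed the same for `stall_after_s` seconds. `StallDetector` and its threshold are hypothetical.

```python
import time
from typing import Callable, Optional


class StallDetector:
    """Flag a stall only after update_time stays unchanged for stall_after_s seconds."""

    def __init__(self, stall_after_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.stall_after_s = stall_after_s
        self.clock = clock
        self._last_update_time: Optional[str] = None
        self._last_change_at: float = 0.0

    def observe(self, update_time: str) -> bool:
        """Record one poll's update_time; return True if the import looks stalled."""
        now = self.clock()
        if update_time != self._last_update_time:
            # The server reported new progress: reset the stall timer.
            self._last_update_time = update_time
            self._last_change_at = now
            return False
        # Same timestamp as before: stalled only if it has been unchanged long enough.
        return now - self._last_change_at >= self.stall_after_s
```

In the polling loop, `if detector.observe(status["update_time"]):` would take the place of the direct comparison with `previous_update_time`.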
Polling Loop in Bash
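```bash
#!/bin/bash
set -euo pipefail

IMPORT_ID="c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1"
REPO="my-repo"
BRANCH="main"

while true; do
  sleep 2
  # -f makes curl exit non-zero on HTTP errors, which stops the script via set -e
  STATUS=$(curl -sf \
    "http://localhost:8000/api/v1/repositories/${REPO}/branches/${BRANCH}/import?id=${IMPORT_ID}" \
    -u "access_key:secret_key")

  # A failed import reports completed=true together with an error object
  ERROR=$(echo "$STATUS" | jq -c '.error // empty')
  if [ -n "$ERROR" ]; then
    echo "Import failed: ${ERROR}" >&2
    exit 1
  fi

  COMPLETED=$(echo "$STATUS" | jq -r '.completed')
  INGESTED=$(echo "$STATUS" | jq -r '.ingested_objects // 0')
  echo "Progress: ${INGESTED} objects ingested"

  if [ "$COMPLETED" = "true" ]; then
    COMMIT_ID=$(echo "$STATUS" | jq -r '.commit.id')
    echo "Import completed. Commit: ${COMMIT_ID}"
    break
  fi
done
```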
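Both loops poll at a fixed 2-second interval. For long-running imports, a growing interval reduces request volume and the chance of a 429 response. The schedule below starts at 2 s, multiplies by 1.5 each time, and caps at 30 s. These values are assumptions chosen for illustration. `poll_intervals` is a hypothetical helper.

```python
from itertools import islice
from typing import Iterator


def poll_intervals(initial_s: float = 2.0, factor: float = 1.5,
                   max_s: float = 30.0) -> Iterator[float]:
    """Yield sleep durations that grow geometrically up to max_s, then stay there."""
    interval = initial_s
    while True:
        yield interval
        interval = min(max_s, interval * factor)


print(list(islice(poll_intervals(), 8)))
# [2.0, 3.0, 4.5, 6.75, 10.125, 15.1875, 22.78125, 30.0]
```

In the Python loop, drawing `polling_interval` from `next()` on this generator at each iteration would replace the fixed value.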