Implementation:Treeverse LakeFS ImportStart
| Knowledge Sources | |
|---|---|
| Domains | Data_Import, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete API endpoint for initiating an asynchronous zero-copy data import from external object storage into a lakeFS branch, provided by the lakeFS REST API.
Description
The importStart endpoint accepts a JSON request body describing one or more external storage locations to import, along with commit metadata. It validates the request, enqueues the import job on the server, and immediately returns an import job identifier (HTTP 202 Accepted). The client uses this identifier to poll the importStatus endpoint for progress updates.
Key behaviors:
- The import runs asynchronously on the server; the POST returns immediately
- A new commit is created on the target branch upon successful completion
- If the branch has uncommitted changes, the request is rejected unless `force: true` is set
- The import processes all specified paths atomically -- either all succeed or none are committed
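The start-then-poll lifecycle described above can be sketched as a small client-side helper. This is a hedged sketch: the injected `post` and `get` callables, the `completed` field check, and the function name are illustrative assumptions, not part of the lakeFS SDK.

```python
import time

def start_and_wait(post, get, repo, branch, body, interval=2.0, timeout=600.0):
    """Start an import and poll until it completes.

    `post` and `get` are hypothetical HTTP wrappers returning
    (status_code, parsed_json); they stand in for any HTTP client.
    """
    path = f"/repositories/{repo}/branches/{branch}/import"
    code, resp = post(path, body)
    if code != 202:  # importStart returns 202 Accepted on success
        raise RuntimeError(f"importStart failed: HTTP {code}")
    import_id = resp["id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        _, status = get(path, {"id": import_id})  # poll importStatus
        if status.get("completed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"import {import_id} did not complete within {timeout}s")
```

Injecting the HTTP callables keeps the polling logic testable without a live server; in practice they would wrap whatever client the application already uses.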
Usage
Use this endpoint when:
- Starting a new import job from a client application, script, or pipeline
- Importing data from S3, GCS, or Azure Blob Storage into a specific lakeFS branch
- Kicking off large-scale imports that will be monitored via the importStatus endpoint
Code Reference
Source Location
- Repository: lakeFS
- File: `api/swagger.yml` (lines 5552-5610)
Signature
```yaml
/repositories/{repository}/branches/{branch}/import:
  post:
    tags:
      - import
    operationId: importStart
    summary: import data from object store
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/ImportCreation"
    responses:
      202:
        description: Import started
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ImportCreationResponse"
      400:
        $ref: "#/components/responses/ValidationError"
      401:
        $ref: "#/components/responses/Unauthorized"
      403:
        $ref: "#/components/responses/Forbidden"
      404:
        $ref: "#/components/responses/NotFound"
      429:
        description: too many requests
```
Import

```python
import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| repository | string (path) | Yes | Repository name |
| branch | string (path) | Yes | Branch name to import into |
| paths | []ImportLocation (body) | Yes | Array of source-to-destination mappings. Each entry has type (common_prefix or object), path (external URI), and destination (relative lakeFS path). |
| commit | CommitCreation (body) | Yes | Commit metadata: message (required), metadata (optional key-value map), date (optional), allow_empty (optional) |
| force | boolean (body) | No | If true, allows importing even when the branch has uncommitted changes (default: false) |
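As a client-side sanity check, the request body can be assembled and validated against the contract in the table above before the POST is issued. The helper below and its validation rules are an illustrative sketch derived from the table, not an official schema validator.

```python
def build_import_body(paths, message, metadata=None, force=False):
    """Assemble an ImportCreation-shaped request body (illustrative helper).

    `paths` is a list of dicts with keys: type, path, destination.
    """
    for p in paths:
        if p.get("type") not in ("common_prefix", "object"):
            raise ValueError(f"invalid import location type: {p.get('type')!r}")
        if not p.get("path") or not p.get("destination"):
            raise ValueError("each path entry needs 'path' and 'destination'")
    if not message:
        raise ValueError("commit message is required")

    body = {"paths": paths, "commit": {"message": message}}
    if metadata:
        body["commit"]["metadata"] = metadata
    if force:
        body["force"] = True  # override the dirty-branch rejection
    return body
```

Failing fast on a malformed body avoids burning a round trip on a request the server would reject with 400 anyway.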
Outputs
| Name | Type | Description |
|---|---|---|
| id | string | Unique identifier for the import job. Use this to poll importStatus for progress and results. |
HTTP Status Codes:
| Code | Description |
|---|---|
| 202 | Import accepted and started -- returns ImportCreationResponse with job ID |
| 400 | Validation error -- malformed request body or invalid source paths |
| 401 | Unauthorized -- missing or invalid credentials |
| 403 | Forbidden -- insufficient permissions on the repository or branch |
| 404 | Not found -- repository or branch does not exist |
| 429 | Too many requests -- rate limited |
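Of these codes, only 429 is transient: a caller might retry it with backoff while treating the 4xx codes as fatal. A hedged sketch of that policy follows; the injected `post` callable, the retry parameters, and the function name are assumptions for illustration, not SDK behavior.

```python
import time

FATAL = {400, 401, 403, 404}  # validation/auth/permission/not-found: don't retry

def start_with_retry(post, body, max_attempts=5, base_delay=1.0):
    """Call importStart via the injected `post` callable, retrying on 429."""
    for attempt in range(max_attempts):
        code, resp = post(body)
        if code == 202:
            return resp["id"]            # job accepted; id used for importStatus
        if code in FATAL:
            raise RuntimeError(f"importStart rejected: HTTP {code}")
        if code == 429:                  # rate limited: exponential backoff
            time.sleep(base_delay * (2 ** attempt))
            continue
        raise RuntimeError(f"unexpected response: HTTP {code}")
    raise RuntimeError("still rate limited after retries")
```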
Usage Examples
Start Import with Python lakefs SDK

```python
import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key",
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")

# branch.import_data returns an import manager that wraps the
# importStart/importStatus API calls
importer = branch.import_data(
    commit_message="Import production collections from S3"
)
importer.prefix(
    "s3://my-bucket/production/collections/",
    destination="collections/",
)
importer.start()  # issues importStart; returns without waiting
```
Start Import with curl
```shell
curl -X POST \
  "http://localhost:8000/api/v1/repositories/my-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/production/collections/",
        "destination": "collections/"
      }
    ],
    "commit": {
      "message": "Import production collections from S3"
    }
  }'

# Response (HTTP 202):
# {
#   "id": "c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1"
# }
```
Start Import with Force Flag
```shell
# Force import even if the branch has uncommitted changes
curl -X POST \
  "http://localhost:8000/api/v1/repositories/my-repo/branches/ingestion/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/raw/2024-01-15/",
        "destination": "imported/new-prefix/"
      },
      {
        "type": "object",
        "path": "s3://my-bucket/raw/manifest.json",
        "destination": "imported/manifest.json"
      }
    ],
    "commit": {
      "message": "Import daily data drop 2024-01-15",
      "metadata": {
        "created_by": "import",
        "source_date": "2024-01-15"
      }
    },
    "force": true
  }'
```