
Implementation:Treeverse LakeFS ImportStart

From Leeroopedia


Knowledge Sources
Domains Data_Import, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

The importStart endpoint of the lakeFS REST API initiates an asynchronous, zero-copy import of data from external object storage into a lakeFS branch.

Description

The importStart endpoint accepts a JSON request body describing one or more external storage locations to import, along with commit metadata. It validates the request, enqueues the import job on the server, and immediately returns an import job identifier (HTTP 202 Accepted). The client uses this identifier to poll the importStatus endpoint for progress updates.

Key behaviors:

  • The import runs asynchronously on the server; the POST returns immediately
  • A new commit is created on the target branch upon successful completion
  • If the branch has uncommitted changes, the request is rejected unless force: true is set
  • The import processes all specified paths atomically -- either all succeed or none are committed
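The accept-then-poll lifecycle above can be sketched as a small client-side helper. This is a sketch, not lakeFS code: the `completed` and `error` fields are assumptions modeled on the importStatus response, and the status fetcher is injected as a callable rather than tied to a particular HTTP client:

```python
import time
from typing import Callable, Dict

def wait_for_import(get_status: Callable[[str], Dict], import_id: str,
                    poll_interval: float = 2.0, timeout: float = 600.0) -> Dict:
    """Poll an importStatus-style endpoint until the import job finishes.

    get_status performs the actual GET (hypothetical here) and returns a
    dict assumed to carry a boolean "completed" and an optional "error".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(import_id)
        if status.get("error"):
            # Nothing was committed: the import is all-or-nothing
            raise RuntimeError(f"import {import_id} failed: {status['error']}")
        if status.get("completed"):
            return status
        time.sleep(poll_interval)
    raise TimeoutError(f"import {import_id} still running after {timeout}s")
```

Because the POST returns as soon as the job is enqueued, a client that needs the resulting commit must run a loop like this before reading from the branch.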

Usage

Use this endpoint when:

  • Starting a new import job from a client application, script, or pipeline
  • Importing data from S3, GCS, or Azure Blob Storage into a specific lakeFS branch
  • Kicking off large-scale imports that will be monitored via the importStatus endpoint

Code Reference

Source Location

  • Repository: lakeFS
  • File: api/swagger.yml (lines 5552-5610)

Signature

/repositories/{repository}/branches/{branch}/import:
  post:
    tags:
      - import
    operationId: importStart
    summary: import data from object store
    requestBody:
      required: true
      content:
        application/json:
          schema:
            $ref: "#/components/schemas/ImportCreation"
    responses:
      202:
        description: Import started
        content:
          application/json:
            schema:
              $ref: "#/components/schemas/ImportCreationResponse"
      400:
        $ref: "#/components/responses/ValidationError"
      401:
        $ref: "#/components/responses/Unauthorized"
      403:
        $ref: "#/components/responses/Forbidden"
      404:
        $ref: "#/components/responses/NotFound"
      429:
        description: too many requests
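Combining the path template above with the `/api/v1` prefix used by the curl examples on this page, the request URL can be built as follows; `import_start_url` is an illustrative helper, not part of any lakeFS SDK:

```python
from urllib.parse import quote

def import_start_url(base: str, repository: str, branch: str) -> str:
    # Mirrors the OpenAPI path template:
    #   /repositories/{repository}/branches/{branch}/import
    # Path parameters are percent-encoded so branch names like
    # "feature/x" stay within a single path segment.
    return (f"{base}/api/v1/repositories/{quote(repository, safe='')}"
            f"/branches/{quote(branch, safe='')}/import")
```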

Import

import lakefs
from lakefs.client import Client

I/O Contract

Inputs

  • repository (string, path, required) -- Repository name
  • branch (string, path, required) -- Branch name to import into
  • paths ([]ImportLocation, body, required) -- Array of source-to-destination mappings. Each entry has type (common_prefix or object), path (external URI), and destination (relative lakeFS path).
  • commit (CommitCreation, body, required) -- Commit metadata: message (required), metadata (optional key-value map), date (optional), allow_empty (optional)
  • force (boolean, body, optional) -- If true, allows importing even when the branch has uncommitted changes (default: false)
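A minimal sketch of assembling a request body that satisfies this contract; the helper name `build_import_request` and its validation checks are illustrative, with field names taken from the table above:

```python
from typing import Dict, List, Optional

# Allowed values for each path entry's "type" field, per the contract
VALID_PATH_TYPES = {"common_prefix", "object"}

def build_import_request(paths: List[Dict], message: str,
                         metadata: Optional[Dict[str, str]] = None,
                         force: bool = False) -> Dict:
    """Assemble an ImportCreation-style JSON body (illustrative sketch)."""
    if not paths:
        raise ValueError("at least one import path is required")
    for p in paths:
        missing = {"type", "path", "destination"} - p.keys()
        if missing:
            raise ValueError(f"path entry missing fields: {sorted(missing)}")
        if p["type"] not in VALID_PATH_TYPES:
            raise ValueError(f"invalid path type: {p['type']}")
    commit: Dict = {"message": message}
    if metadata:
        commit["metadata"] = metadata
    body: Dict = {"paths": paths, "commit": commit}
    if force:
        body["force"] = True
    return body
```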

Outputs

  • id (string) -- Unique identifier for the import job. Use this to poll importStatus for progress and results.

HTTP Status Codes:

  • 202 -- Import accepted and started; returns ImportCreationResponse with the job ID
  • 400 -- Validation error: malformed request body or invalid source paths
  • 401 -- Unauthorized: missing or invalid credentials
  • 403 -- Forbidden: insufficient permissions on the repository or branch
  • 404 -- Not found: repository or branch does not exist
  • 429 -- Too many requests: rate limited
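Of these codes, only 429 is worth retrying client-side. A hedged sketch of that policy, with the actual POST injected as a callable (`post` is a hypothetical stand-in returning the status code and parsed JSON body, not a lakeFS API):

```python
import time
from typing import Callable, Dict, Tuple

def start_import_with_retry(post: Callable[[], Tuple[int, Dict]],
                            max_attempts: int = 5,
                            base_delay: float = 1.0) -> Dict:
    """Call an importStart-style POST, retrying only on HTTP 429."""
    for attempt in range(max_attempts):
        code, body = post()
        if code == 202:
            return body  # carries the import job "id"
        if code == 429:
            # Exponential backoff before retrying the rate-limited call
            time.sleep(base_delay * (2 ** attempt))
            continue
        # 400/401/403/404 indicate caller errors and are not retried
        raise RuntimeError(f"importStart failed with HTTP {code}: {body}")
    raise RuntimeError("importStart still rate-limited after all retries")
```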

Usage Examples

Start Import with Python lakefs SDK

import lakefs
from lakefs.client import Client

client = Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")

# Start a zero-copy import via the branch import API, which wraps importStart
importer = branch.import_data(
    commit_message="Import production collections from S3"
).prefix(
    "s3://my-bucket/production/collections/",
    destination="collections/"
)

importer.start()  # non-blocking; poll importer.status() or block with importer.wait()

Start Import with curl

curl -X POST \
  "http://localhost:8000/api/v1/repositories/my-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/production/collections/",
        "destination": "collections/"
      }
    ],
    "commit": {
      "message": "Import production collections from S3"
    }
  }'

# Response (HTTP 202):
# {
#   "id": "c7a300b8-4a20-4e3b-a3b5-2ef4f2e7d0a1"
# }

Start Import with Force Flag

# Force import even if branch has uncommitted changes
curl -X POST \
  "http://localhost:8000/api/v1/repositories/my-repo/branches/ingestion/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/raw/2024-01-15/",
        "destination": "imported/new-prefix/"
      },
      {
        "type": "object",
        "path": "s3://my-bucket/raw/manifest.json",
        "destination": "imported/manifest.json"
      }
    ],
    "commit": {
      "message": "Import daily data drop 2024-01-15",
      "metadata": {
        "created_by": "import",
        "source_date": "2024-01-15"
      }
    },
    "force": true
  }'
