

Implementation:Treeverse LakeFS ImportLocation Configuration

From Leeroopedia
Revision as of 16:57, 16 February 2026 by Admin (auto-imported from implementations/Treeverse_LakeFS_ImportLocation_Configuration.md)


Knowledge Sources
Domains Data_Import, REST_API
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete configuration pattern for defining external object storage locations and their destination mappings when preparing a zero-copy import into lakeFS.

Description

This is a pattern document that describes the user-defined configuration structures used to prepare an import operation in lakeFS. These are not standalone API endpoints but rather the request body schemas consumed by the importStart API. The two key schemas are:

  • ImportLocation -- Defines a single source-to-destination mapping for one external storage path
  • ImportCreation -- Wraps one or more ImportLocation entries together with commit metadata and optional flags to form a complete import request
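As a rough illustration of how the two schemas nest, they can be mirrored with Python dataclasses that serialize to the JSON request body. The class and method names (ImportLocation, ImportCreation, to_body) and the flattened commit_message / commit_metadata fields are hypothetical helpers for this sketch, not part of the lakeFS SDK; only the serialized keys follow the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Literal


# Hypothetical mirror of the ImportLocation schema (keys follow api/swagger.yml).
@dataclass
class ImportLocation:
    type: Literal["common_prefix", "object"]
    path: str          # external storage URI, e.g. s3://bucket/prefix/
    destination: str   # path relative to the branch root

    def to_body(self) -> Dict[str, str]:
        return {"type": self.type, "path": self.path, "destination": self.destination}


# Hypothetical mirror of ImportCreation: one or more locations plus commit metadata.
@dataclass
class ImportCreation:
    paths: List[ImportLocation]
    commit_message: str
    commit_metadata: Dict[str, str] = field(default_factory=dict)
    force: bool = False

    def to_body(self) -> dict:
        commit: dict = {"message": self.commit_message}
        if self.commit_metadata:
            # Only emit metadata when the caller supplied some.
            commit["metadata"] = dict(self.commit_metadata)
        return {
            "paths": [p.to_body() for p in self.paths],
            "commit": commit,
            "force": self.force,
        }


body = ImportCreation(
    paths=[ImportLocation("common_prefix", "s3://my-bucket/production/collections/", "collections/")],
    commit_message="Import production collections from S3",
).to_body()
```

The resulting dict has the same shape as the import_config examples further down and can be sent as the importStart request body.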

The ImportLocation schema supports two source types:

  • common_prefix -- Imports all objects sharing a common storage path prefix (equivalent to importing a "directory")
  • object -- Imports a single specific object from external storage

The destination field determines where imported objects appear within the lakeFS branch namespace. For common_prefix imports, it acts as a prefix replacement; for object imports, it specifies the exact target path.
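The destination rule can be sketched as a small function that computes where a given external object would land on the branch. This is an illustration of the behaviour described above, not lakeFS's internal code: map_destination is a hypothetical name, and the sketch assumes common_prefix mapping is plain string substitution of the source prefix by the destination (which is why the examples end both path and destination with "/").

```python
def map_destination(location_type: str, source_path: str, destination: str, object_uri: str) -> str:
    """Return the branch-relative path an external object would be imported to.

    Assumes simple prefix substitution for common_prefix and exact placement
    for object locations.
    """
    if location_type == "object":
        # An object location imports exactly one URI to exactly one path.
        if object_uri != source_path:
            raise ValueError("an 'object' location imports exactly one URI")
        return destination
    if location_type == "common_prefix":
        if not object_uri.startswith(source_path):
            raise ValueError(f"{object_uri} is not under {source_path}")
        # Replace the source prefix with the destination prefix.
        return destination + object_uri[len(source_path):]
    raise ValueError(f"unknown location type: {location_type}")


print(map_destination(
    "common_prefix",
    "s3://my-bucket/production/collections/",
    "collections/",
    "s3://my-bucket/production/collections/2024/a.parquet",
))  # prints collections/2024/a.parquet
```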

Usage

Use this configuration pattern when:

  • Constructing the request body for the importStart API call
  • Defining single-prefix imports (one common_prefix entry) or multi-path imports (multiple entries mixing common_prefix and object types)
  • Specifying commit metadata (message, author, custom metadata) that will be recorded with the import commit

Code Reference

Source Location

  • Repository: lakeFS
  • Schema definition: api/swagger.yml (lines 1687-1734)
  • Test usage: esti/import_test.go (lines 51-139)

Signature

ImportLocation:
  type: object
  required:
    - type
    - path
    - destination
  properties:
    type:
      type: string
      enum: [common_prefix, object]
      description: Path type, can either be 'common_prefix' or 'object'
    path:
      type: string
      description: >
        A source location to a 'common_prefix' or to a single object.
        Must match the lakeFS installation blockstore type.
      example: s3://my-bucket/production/collections/
    destination:
      type: string
      description: >
        Destination for the imported objects on the branch.
        Must be a relative path to the branch.
      example: collections/

ImportCreation:
  type: object
  required:
    - paths
    - commit
  properties:
    paths:
      type: array
      items:
        $ref: "#/components/schemas/ImportLocation"
    commit:
      $ref: "#/components/schemas/CommitCreation"
    force:
      type: boolean
      default: false
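Before calling importStart, a client-side check against the schema above can catch malformed bodies early. The sketch below is a hypothetical helper (validate_import_creation is not a lakeFS API) that checks only what the schema and I/O contract state: required keys, the type enum, commit.message, and the boolean force flag. The leading-"/" test on destination is an assumed approximation of "must be a relative path"; the server remains the authority.

```python
from typing import Any, Dict, List

# Allowed values of ImportLocation.type, per the enum in the schema.
ALLOWED_TYPES = {"common_prefix", "object"}


def validate_import_creation(body: Dict[str, Any]) -> List[str]:
    """Return the problems found in an ImportCreation-shaped dict (empty list if none)."""
    errors: List[str] = []
    for key in ("paths", "commit"):
        if key not in body:
            errors.append(f"missing required field: {key}")

    paths = body.get("paths", [])
    if not isinstance(paths, list):
        errors.append("paths must be an array")
        paths = []
    for i, loc in enumerate(paths):
        if not isinstance(loc, dict):
            errors.append(f"paths[{i}]: must be an object")
            continue
        for key in ("type", "path", "destination"):
            if key not in loc:
                errors.append(f"paths[{i}]: missing required field: {key}")
        if "type" in loc and loc["type"] not in ALLOWED_TYPES:
            errors.append(f"paths[{i}]: type must be one of {sorted(ALLOWED_TYPES)}")
        # Assumed check: a leading '/' is treated as not branch-relative.
        if str(loc.get("destination", "")).startswith("/"):
            errors.append(f"paths[{i}]: destination must be relative to the branch")

    commit = body.get("commit")
    if isinstance(commit, dict) and "message" not in commit:
        errors.append("commit.message is required")
    if "force" in body and not isinstance(body["force"], bool):
        errors.append("force must be a boolean")
    return errors
```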

Import

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)

I/O Contract

Inputs (ImportLocation)

  • type (string, enum; required): Source type: common_prefix to import a directory tree, or object to import a single file.
  • path (string; required): External storage URI (e.g., s3://bucket/prefix/, gs://bucket/path/, https://account.blob.core.windows.net/container/path/). Must match the lakeFS blockstore type.
  • destination (string; required): Target path within the lakeFS branch. A relative path that acts as a prefix for common_prefix or as the exact path for object.
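The requirement that path match the blockstore type can be approximated by comparing URI schemes. In this sketch, path_matches_blockstore is a hypothetical helper, and the scheme table is an assumption built only from the example URIs listed for path (s3://, gs://, https:// for Azure Blob); a real installation may accept other forms, so treat this as a pre-flight hint rather than the server's rule.

```python
from urllib.parse import urlparse

# Illustrative mapping only: blockstore type -> URI schemes taken from the
# example paths above. Adjust for your installation.
BLOCKSTORE_SCHEMES = {
    "s3": {"s3"},
    "gs": {"gs"},
    "azure": {"https"},
}


def path_matches_blockstore(path: str, blockstore_type: str) -> bool:
    """Return True if the import path's URI scheme fits the given blockstore type."""
    scheme = urlparse(path).scheme.lower()
    return scheme in BLOCKSTORE_SCHEMES.get(blockstore_type, set())
```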

Inputs (ImportCreation)

  • paths ([]ImportLocation; required): Array of source-to-destination mappings.
  • commit (CommitCreation; required): Commit metadata: message (required), metadata (optional key-value pairs), author (optional).
  • force (boolean; optional, default false): If true, allows importing to a branch that has uncommitted changes.

Outputs

This is a configuration pattern (request body), not a standalone endpoint. It produces no direct output; it is consumed by the importStart API which returns an ImportCreationResponse.

  • ImportCreationResponse (returned by importStart): Contains the id of the started import job.
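Since the only output described is the job id, a caller typically just pulls that field out of the importStart response body. A minimal sketch, assuming the response is JSON with a string id field as described above; import_id_from_response is a hypothetical helper name and the example id is made up.

```python
import json


def import_id_from_response(raw: str) -> str:
    """Extract the import job id from an ImportCreationResponse JSON body."""
    payload = json.loads(raw)
    import_id = payload.get("id")
    if not isinstance(import_id, str) or not import_id:
        raise ValueError("response has no import id")
    return import_id


print(import_id_from_response('{"id": "example-import-id"}'))  # prints example-import-id
```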

Usage Examples

Single Prefix Import

import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")

# Define a single common_prefix import location
import_config = {
    "paths": [
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/production/collections/",
            "destination": "collections/"
        }
    ],
    "commit": {
        "message": "Import production collections from S3"
    }
}

Multi-Path Import with Mixed Types

# Define multiple import locations mixing prefixes and individual objects
import_config = {
    "paths": [
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/raw/2024/prefix-1/",
            "destination": "imported/prefix-1/"
        },
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/raw/2024/prefix-2/",
            "destination": "imported/prefix-2/"
        },
        {
            "type": "object",
            "path": "s3://my-bucket/raw/2024/manifest.json",
            "destination": "imported/manifest.json"
        }
    ],
    "commit": {
        "message": "Import 2024 data partitions and manifest",
        "metadata": {
            "created_by": "import",
            "source": "s3://my-bucket/raw/2024/"
        }
    },
    "force": False
}

Equivalent curl Request

curl -X POST "http://localhost:8000/api/v1/repositories/my-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/production/collections/",
        "destination": "collections/"
      }
    ],
    "commit": {
      "message": "Import production collections from S3"
    }
  }'
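The same request can be built with the Python standard library when the lakeFS SDK is not available. build_import_request is a hypothetical helper; it reuses the endpoint from the curl example and sends HTTP basic auth with the access key pair, which is what curl -u does. The request is only constructed here; sending it requires a running lakeFS server.

```python
import base64
import json
import urllib.request
from urllib.parse import quote


def build_import_request(host: str, repo: str, branch: str, body: dict,
                         access_key: str, secret_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the import endpoint used in the curl example."""
    url = (f"{host}/api/v1/repositories/{quote(repo, safe='')}"
           f"/branches/{quote(branch, safe='')}/import")
    # Equivalent of curl -u access_key:secret_key
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


req = build_import_request(
    "http://localhost:8000", "my-repo", "main",
    {"paths": [{"type": "common_prefix",
                "path": "s3://my-bucket/production/collections/",
                "destination": "collections/"}],
     "commit": {"message": "Import production collections from S3"}},
    "access_key", "secret_key",
)
# To send it against a live server:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read())["id"])
```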

Related Pages

Implements Principle
