Implementation:Treeverse LakeFS ImportLocation Configuration
| Knowledge Sources | |
|---|---|
| Domains | Data_Import, REST_API |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete configuration pattern for defining external object storage locations and their destination mappings when preparing a zero-copy import into lakeFS.
Description
This is a pattern document that describes the user-defined configuration structures used to prepare an import operation in lakeFS. These are not standalone API endpoints but rather the request body schemas consumed by the importStart API. The two key schemas are:
- ImportLocation -- Defines a single source-to-destination mapping for one external storage path
- ImportCreation -- Wraps one or more ImportLocation entries together with commit metadata and optional flags to form a complete import request
The ImportLocation schema supports two source types:
- common_prefix -- Imports all objects sharing a common storage path prefix (equivalent to importing a "directory")
- object -- Imports a single specific object from external storage
The destination field determines where imported objects appear within the lakeFS branch namespace. For common_prefix imports, it acts as a prefix replacement; for object imports, it specifies the exact target path.
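The prefix-replacement rule can be illustrated with a small client-side sketch. The function name `resolve_destination` is hypothetical and is not part of lakeFS; it only mirrors the mapping described above, assuming the source prefix is swapped for the destination and the rest of the key is kept unchanged.

```python
def resolve_destination(location: dict, source_uri: str) -> str:
    """Return the branch path a source object would map to (illustrative only)."""
    if location["type"] == "object":
        # An object import maps exactly one source object to the exact destination path.
        if source_uri != location["path"]:
            raise ValueError("source_uri does not match the object location")
        return location["destination"]
    if location["type"] == "common_prefix":
        if not source_uri.startswith(location["path"]):
            raise ValueError("source_uri is not under the common prefix")
        # Swap the source prefix for the destination prefix; keep the remainder.
        return location["destination"] + source_uri[len(location["path"]):]
    raise ValueError(f"unknown type: {location['type']!r}")


prefix_location = {
    "type": "common_prefix",
    "path": "s3://my-bucket/production/collections/",
    "destination": "collections/",
}
mapped = resolve_destination(
    prefix_location, "s3://my-bucket/production/collections/a/b.parquet"
)
```

Under these assumptions, `mapped` is `collections/a/b.parquet`, while an object location always resolves to its own destination value.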
Usage
Use this configuration pattern when:
- Constructing the request body for the importStart API call
- Defining single-prefix imports (one common_prefix entry) or multi-path imports (multiple entries mixing common_prefix and object types)
- Specifying commit metadata (message, author, custom metadata) that will be recorded with the import commit
Code Reference
Source Location
- Repository: lakeFS
- Schema definition: api/swagger.yml (lines 1687-1734)
- Test usage: esti/import_test.go (lines 51-139)
Signature
ImportLocation:
  type: object
  required:
    - type
    - path
    - destination
  properties:
    type:
      type: string
      enum: [common_prefix, object]
      description: Path type, can either be 'common_prefix' or 'object'
    path:
      type: string
      description: >
        A source location to a 'common_prefix' or to a single object.
        Must match the lakeFS installation blockstore type.
      example: s3://my-bucket/production/collections/
    destination:
      type: string
      description: >
        Destination for the imported objects on the branch.
        Must be a relative path to the branch.
      example: collections/

ImportCreation:
  type: object
  required:
    - paths
    - commit
  properties:
    paths:
      type: array
      items:
        $ref: "#/components/schemas/ImportLocation"
    commit:
      $ref: "#/components/schemas/CommitCreation"
    force:
      type: boolean
      default: false
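The required fields and the type enum in this schema can be checked client-side before a request is sent. The following is a minimal sketch under that assumption; `validate_import_creation` is a hypothetical helper, not part of any lakeFS SDK, and it covers only the constraints visible in the schema plus the required commit message.

```python
from typing import List

VALID_TYPES = {"common_prefix", "object"}


def validate_import_creation(body: dict) -> List[str]:
    """Collect schema violations for an ImportCreation-shaped dict."""
    errors = []
    # Top-level required fields.
    for name in ("paths", "commit"):
        if name not in body:
            errors.append(f"missing required field: {name}")
    # The commit object must carry a message.
    commit = body.get("commit")
    if isinstance(commit, dict) and "message" not in commit:
        errors.append("commit: missing required field: message")
    # Each ImportLocation needs type, path, destination and a valid enum value.
    for i, loc in enumerate(body.get("paths", [])):
        for name in ("type", "path", "destination"):
            if name not in loc:
                errors.append(f"paths[{i}]: missing required field: {name}")
        if "type" in loc and loc["type"] not in VALID_TYPES:
            errors.append(f"paths[{i}]: invalid type: {loc['type']!r}")
    # force is optional but must be a boolean when present.
    if not isinstance(body.get("force", False), bool):
        errors.append("force must be a boolean")
    return errors
```

An empty list means the body satisfies these checks; the server may still reject it for reasons this sketch does not model.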
Import
import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)
I/O Contract
Inputs (ImportLocation)
| Name | Type | Required | Description |
|---|---|---|---|
| type | string (enum) | Yes | Source type: common_prefix to import a directory tree, or object to import a single file |
| path | string | Yes | External storage URI (e.g., s3://bucket/prefix/, gs://bucket/path/, https://account.blob.core.windows.net/container/path/). Must match the lakeFS blockstore type. |
| destination | string | Yes | Target path within the lakeFS branch. Relative path; acts as prefix for common_prefix or exact path for object. |
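The requirement that path match the blockstore type can be approximated by comparing URI schemes. This is a rough sketch: the `EXPECTED_SCHEME` mapping and the blockstore type names used as keys are assumptions based on the example URIs in the table, and should be adjusted to the actual installation.

```python
from urllib.parse import urlparse

# Assumed mapping from blockstore type to the URI scheme of its source paths.
EXPECTED_SCHEME = {"s3": "s3", "gs": "gs", "azure": "https"}


def path_matches_blockstore(path: str, blockstore_type: str) -> bool:
    """Return True when the path's URI scheme fits the given blockstore type."""
    expected = EXPECTED_SCHEME.get(blockstore_type)
    return expected is not None and urlparse(path).scheme == expected


def is_relative_destination(destination: str) -> bool:
    """A destination must be relative to the branch: no scheme, no leading slash."""
    return urlparse(destination).scheme == "" and not destination.startswith("/")
```

A check like this only catches obvious mismatches early; the server remains the authority on which paths are accepted.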
Inputs (ImportCreation)
| Name | Type | Required | Description |
|---|---|---|---|
| paths | []ImportLocation | Yes | Array of source-to-destination mappings |
| commit | CommitCreation | Yes | Commit metadata: message (required), metadata (optional key-value pairs), author (optional) |
| force | boolean | No | If true, allows importing to a branch that has uncommitted changes (default: false) |
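The two input tables can be mirrored as typed structures so that defaults such as force=false are applied consistently. The sketch below uses plain dataclasses; these classes are local stand-ins that reuse the schema names for readability and are not the generated SDK models.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class ImportLocation:
    type: str          # "common_prefix" or "object"
    path: str          # external storage URI
    destination: str   # relative path on the branch


@dataclass
class ImportCreation:
    paths: List[ImportLocation]
    commit: Dict[str, object]   # at minimum {"message": "..."}
    force: bool = False         # optional, defaults to false


creation = ImportCreation(
    paths=[
        ImportLocation(
            type="object",
            path="s3://my-bucket/raw/2024/manifest.json",
            destination="imported/manifest.json",
        )
    ],
    commit={"message": "Import manifest"},
)

# asdict converts nested dataclasses, yielding a JSON-serializable request body.
body = asdict(creation)
```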
Outputs
This is a configuration pattern (request body), not a standalone endpoint. It produces no direct output; it is consumed by the importStart API, which returns an ImportCreationResponse.
| Name | Type | Description |
|---|---|---|
| (consumed by importStart) | ImportCreationResponse | Contains the id of the started import job |
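Since the response carries the id of the started import job, a caller will typically extract it for later status checks. A minimal sketch, assuming the response body is JSON with a top-level id string; `extract_import_id` and the sample payload are hypothetical.

```python
import json


def extract_import_id(response_body: str) -> str:
    """Pull the import job id out of an ImportCreationResponse-shaped JSON body."""
    payload = json.loads(response_body)
    import_id = payload.get("id")
    if not isinstance(import_id, str) or not import_id:
        raise ValueError("response does not contain an import id")
    return import_id


# Made-up sample payload, for illustration only.
sample_id = extract_import_id('{"id": "example-import-id"}')
```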
Usage Examples
Single Prefix Import
import lakefs

client = lakefs.Client(
    host="http://localhost:8000",
    username="access_key",
    password="secret_key"
)
repo = lakefs.Repository("my-repo", client=client)
branch = repo.branch("main")

# Define a single common_prefix import location
import_config = {
    "paths": [
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/production/collections/",
            "destination": "collections/"
        }
    ],
    "commit": {
        "message": "Import production collections from S3"
    }
}
Multi-Path Import with Mixed Types
# Define multiple import locations mixing prefixes and individual objects
import_config = {
    "paths": [
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/raw/2024/prefix-1/",
            "destination": "imported/prefix-1/"
        },
        {
            "type": "common_prefix",
            "path": "s3://my-bucket/raw/2024/prefix-2/",
            "destination": "imported/prefix-2/"
        },
        {
            "type": "object",
            "path": "s3://my-bucket/raw/2024/manifest.json",
            "destination": "imported/manifest.json"
        }
    ],
    "commit": {
        "message": "Import 2024 data partitions and manifest",
        "metadata": {
            "created_by": "import",
            "source": "s3://my-bucket/raw/2024/"
        }
    },
    "force": False
}
Equivalent curl Request
curl -X POST "http://localhost:8000/api/v1/repositories/my-repo/branches/main/import" \
  -H "Content-Type: application/json" \
  -u "access_key:secret_key" \
  -d '{
    "paths": [
      {
        "type": "common_prefix",
        "path": "s3://my-bucket/production/collections/",
        "destination": "collections/"
      }
    ],
    "commit": {
      "message": "Import production collections from S3"
    }
  }'
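The same request can be prepared from Python using only the standard library. This sketch builds the request object without sending it; `build_import_request` is a hypothetical helper, and the URL layout and basic-auth credentials are taken directly from the curl example.

```python
import base64
import json
import urllib.request


def build_import_request(host, repo, branch, access_key, secret_key, body):
    """Build (but do not send) a POST request equivalent to the curl call."""
    url = f"{host}/api/v1/repositories/{repo}/branches/{branch}/import"
    # curl -u sends HTTP basic auth; reproduce it as an Authorization header.
    token = base64.b64encode(f"{access_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


request = build_import_request(
    "http://localhost:8000", "my-repo", "main", "access_key", "secret_key",
    {
        "paths": [
            {
                "type": "common_prefix",
                "path": "s3://my-bucket/production/collections/",
                "destination": "collections/",
            }
        ],
        "commit": {"message": "Import production collections from S3"},
    },
)
```

Passing `request` to `urllib.request.urlopen` would submit it against a running server; that step is left out so the sketch has no network dependency.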