Implementation:Lance format Lance Java WriteDatasetBuilder

Knowledge Sources
Domains: Java_SDK, Dataset_Management
Last Updated: 2026-02-08 19:33 GMT

Overview

The WriteDatasetBuilder class provides a fluent builder API for creating or writing Lance datasets, supporting both direct URI-based writes and namespace-managed writes with automatic credential vending.

Description

WriteDatasetBuilder is a builder class obtained via Dataset.write(). It supports two mutually exclusive destination modes:

  • URI mode: Write directly to a URI (local path, S3, Azure, GCS)
  • Namespace mode: Write through a LanceNamespace with automatic table location resolution and credential vending

The builder accepts data in one of three forms (mutually exclusive):

  • ArrowReader: An Apache Arrow reader providing data batches
  • ArrowArrayStream: A raw Arrow C Data Interface stream
  • Schema only: For creating empty tables with a defined schema
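Both exclusivity rules above amount to an "exactly one of" check: one destination (URI or namespace) and one data source (reader, stream, or schema). The following is a minimal, self-contained sketch of such a check. The class `ExclusiveInputs` and its methods are hypothetical and not part of the Lance API; the real builder's validation logic and error messages may differ.

```java
import java.util.Arrays;
import java.util.Objects;

/** Hypothetical helper, not part of Lance: illustrates an "exactly one of" check. */
public final class ExclusiveInputs {
    private ExclusiveInputs() {}

    /** Counts how many of the given candidates are non-null. */
    public static long countSet(Object... candidates) {
        return Arrays.stream(candidates).filter(Objects::nonNull).count();
    }

    /** Throws if the number of non-null candidates is not exactly one. */
    public static void requireExactlyOne(String what, Object... candidates) {
        long set = countSet(candidates);
        if (set != 1) {
            throw new IllegalArgumentException(
                "Exactly one " + what + " must be set, but found " + set);
        }
    }

    public static void main(String[] args) {
        String uri = "s3://my-bucket/my-table.lance";
        Object namespace = null; // stand-in for a namespace that was not supplied

        // Passes: only the URI destination is set.
        requireExactlyOne("destination", uri, namespace);

        // Fails: two data sources are set at once (stand-in objects).
        try {
            requireExactlyOne("data source", new Object(), new Object(), null);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check of this kind is usually run once, in the terminal method, so that setters can be called in any order.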

Key configuration options include:

  • Write mode: CREATE, APPEND, or OVERWRITE
  • Row/file limits: maxRowsPerFile, maxRowsPerGroup, maxBytesPerFile
  • Stable row IDs: Enable persistent row identifiers across versions
  • Data storage version: Select the Lance file version format
  • Storage options: Provider-specific configuration for cloud storage
  • Buffer allocator: Custom Arrow allocator (auto-created if not provided)

When using namespace mode, the builder automatically calls declareTable() for CREATE mode or describeTable() for APPEND/OVERWRITE mode, and sets up a StorageOptionsProvider for credential refresh during long-running writes.
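The mode-to-call mapping described above can be summarized in a few lines. This is only an illustrative sketch: the class `NamespaceCallSketch` and its local `Mode` enum are hypothetical stand-ins, and the method returns the name of the namespace call as a string rather than invoking any Lance code.

```java
/** Hypothetical sketch, not Lance code: which namespace call each write mode triggers. */
public final class NamespaceCallSketch {
    private NamespaceCallSketch() {}

    /** Local stand-in for the three write modes named in the text. */
    public enum Mode { CREATE, APPEND, OVERWRITE }

    /** Returns the name of the namespace call the text associates with the mode. */
    public static String namespaceCallFor(Mode mode) {
        switch (mode) {
            case CREATE:
                // A new table must first be declared in the namespace.
                return "declareTable";
            case APPEND:
            case OVERWRITE:
                // An existing table is looked up to resolve its location.
                return "describeTable";
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " -> " + namespaceCallFor(mode) + "()");
        }
    }
}
```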

Usage

Use this builder for all dataset write operations. It is the recommended replacement for the deprecated static Dataset.create() methods.

Code Reference

Source Location

Property | Value
File | java/src/main/java/org/lance/WriteDatasetBuilder.java
Package | org.lance
Lines | 466

Signature

public class WriteDatasetBuilder

Import

import org.lance.WriteDatasetBuilder;
// Typically accessed via Dataset.write()

I/O Contract

Builder Methods (Input)

Method | Parameter Type | Description
allocator(BufferAllocator) | BufferAllocator | Arrow buffer allocator (optional, auto-created if omitted)
reader(ArrowReader) | ArrowReader | Data source via Arrow reader
stream(ArrowArrayStream) | ArrowArrayStream | Data source via Arrow C stream
schema(Schema) | Schema | Schema for empty table creation
uri(String) | String | Direct dataset URI
namespace(LanceNamespace) | LanceNamespace | Namespace for managed writes
tableId(List<String>) | List<String> | Table identifier within namespace
mode(WriteMode) | WriteParams.WriteMode | Write mode: CREATE, APPEND, or OVERWRITE
storageOptions(Map) | Map<String, String> | Cloud storage configuration
maxRowsPerFile(int) | int | Maximum rows per output file
maxRowsPerGroup(int) | int | Maximum rows per row group
maxBytesPerFile(long) | long | Maximum bytes per output file
enableStableRowIds(boolean) | boolean | Enable stable row IDs
dataStorageVersion(LanceFileVersion) | WriteParams.LanceFileVersion | Lance file format version

Terminal Method (Output)

Method | Output | Description
execute() | Dataset | Executes the write and returns the created/updated dataset

Usage Examples

Writing with an ArrowReader to a URI

import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.vector.ipc.ArrowReader;

// myArrowReader is an existing ArrowReader that supplies the record batches.
Dataset dataset = Dataset.write()
    .reader(myArrowReader)
    .uri("s3://my-bucket/my-table.lance")
    .mode(WriteParams.WriteMode.CREATE)
    .maxRowsPerFile(100000)               // at most 100,000 rows per output file
    .maxBytesPerFile(128L * 1024 * 1024)  // 128 MiB, computed as a long
    .enableStableRowIds(true)
    .execute();
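Because maxBytesPerFile takes a long, a byte limit written with int literals is evaluated in int arithmetic first and can silently wrap once it reaches 2 GiB. The sketch below shows the pitfall and one way to avoid it using plain Java; the class `ByteLimitSketch` and its `mib` helper are hypothetical and not part of the Lance API.

```java
/** Hypothetical helper, not part of Lance: computing byte limits without int overflow. */
public final class ByteLimitSketch {
    private ByteLimitSketch() {}

    /** Converts mebibytes to bytes using long arithmetic throughout. */
    public static long mib(long n) {
        return n * 1024L * 1024L;
    }

    public static void main(String[] args) {
        // All operands are int, so the product wraps before it is widened to long.
        long wrapped = 4 * 1024 * 1024 * 1024;
        // One long operand at the start keeps the whole product in long arithmetic.
        long correct = 4L * 1024 * 1024 * 1024;

        System.out.println("int arithmetic:  " + wrapped);   // prints 0
        System.out.println("long arithmetic: " + correct);   // prints 4294967296
        System.out.println("mib(128):        " + mib(128));  // prints 134217728
    }
}
```

A value such as `mib(128)` could then be passed to maxBytesPerFile in place of a hand-written product.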

Creating an Empty Table with Schema

import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.vector.types.pojo.Schema;

// mySchema is an existing Arrow Schema; no data source is set, so the table is created empty.
Dataset dataset = Dataset.write()
    .schema(mySchema)
    .uri("/path/to/empty-table.lance")
    .mode(WriteParams.WriteMode.CREATE)
    .execute();

Writing via a Namespace

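import org.lance.Dataset;
import org.lance.WriteParams;
import org.lance.namespace.LanceNamespace;
import java.util.Arrays;

// myNamespace is an existing LanceNamespace; no uri() is set because the
// namespace resolves the table location and vends credentials.
Dataset dataset = Dataset.write()
    .reader(myArrowReader)
    .namespace(myNamespace)
    .tableId(Arrays.asList("my_table"))
    .mode(WriteParams.WriteMode.CREATE)  // CREATE mode triggers declareTable()
    .execute();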

Appending Data to an Existing Dataset

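import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.c.ArrowArrayStream;

// arrowArrayStream is an existing ArrowArrayStream (Arrow C Data Interface).
Dataset dataset = Dataset.write()
    .stream(arrowArrayStream)
    .uri("/path/to/existing-table.lance")
    .mode(WriteParams.WriteMode.APPEND)
    .execute();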
