Implementation:Lance format Lance Java WriteDatasetBuilder
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Dataset_Management |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The WriteDatasetBuilder class provides a fluent builder API for creating or writing Lance datasets, supporting both direct URI-based writes and namespace-managed writes with automatic credential vending.
Description
WriteDatasetBuilder is a builder class obtained via Dataset.write(). It supports two mutually exclusive destination modes:
- URI mode: Write directly to a URI (local path, S3, Azure, GCS)
- Namespace mode: Write through a LanceNamespace with automatic table location resolution and credential vending
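The mutual exclusivity of the two destination modes can be illustrated with a small stand-alone check. This is a sketch only, not the builder's actual validation code: DestinationCheck and resolve are hypothetical names, Object stands in for LanceNamespace so the snippet compiles without the Lance jar, and the rule that namespace mode needs both a namespace and a table ID is an assumption drawn from the namespace usage example on this page.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in, not part of the Lance SDK: models the URI-vs-namespace rule. */
final class DestinationCheck {
    enum Destination { URI, NAMESPACE }

    /**
     * Returns the destination mode selected by the arguments.
     * Throws if both modes are configured or if neither is complete.
     */
    static Destination resolve(String uri, Object namespace, List<String> tableId) {
        boolean hasUri = uri != null;
        boolean hasNamespacePart = namespace != null || tableId != null;
        if (hasUri && hasNamespacePart) {
            throw new IllegalArgumentException("Set either uri or namespace/tableId, not both");
        }
        if (hasUri) {
            return Destination.URI;
        }
        // Assumption: namespace mode requires the namespace and the table ID together.
        if (namespace != null && tableId != null) {
            return Destination.NAMESPACE;
        }
        throw new IllegalArgumentException("Set uri, or both namespace and tableId");
    }

    public static void main(String[] args) {
        System.out.println(resolve("s3://my-bucket/my-table.lance", null, null));
        System.out.println(resolve(null, new Object(), Arrays.asList("my_table")));
    }
}
```

Failing fast on a mixed configuration keeps an ambiguous destination from reaching storage; whether the real builder reports the conflict at the setter or at execute() is not stated here.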
The builder accepts data in one of three forms (mutually exclusive):
- ArrowReader: An Apache Arrow reader providing data batches
- ArrowArrayStream: A raw Arrow C Data Interface stream
- Schema only: For creating empty tables with a defined schema
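The "exactly one data source" rule above can be sketched the same way. SourceCheck and selectSource are hypothetical names introduced for illustration, and the three parameters are typed as Object in place of ArrowReader, ArrowArrayStream, and Schema; the real builder may validate differently.

```java
/** Hypothetical stand-in, not part of the Lance SDK: models the one-source-only rule. */
final class SourceCheck {
    /** Returns the name of the single non-null source, or throws if zero or several are set. */
    static String selectSource(Object reader, Object stream, Object schema) {
        int provided = 0;
        String selected = null;
        if (reader != null) { provided++; selected = "reader"; }
        if (stream != null) { provided++; selected = "stream"; }
        if (schema != null) { provided++; selected = "schema"; }
        if (provided != 1) {
            throw new IllegalArgumentException(
                "Expected exactly one of reader, stream, schema but got " + provided);
        }
        return selected;
    }

    public static void main(String[] args) {
        // Only the schema is supplied, which corresponds to creating an empty table.
        System.out.println(selectSource(null, null, new Object()));
    }
}
```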
Key configuration options include:
- Write mode: CREATE, APPEND, or OVERWRITE
- Row/file limits: maxRowsPerFile, maxRowsPerGroup, maxBytesPerFile
- Stable row IDs: Enable persistent row identifiers across versions
- Data storage version: Select the Lance file version format
- Storage options: Provider-specific configuration for cloud storage
- Buffer allocator: Custom Arrow allocator (auto-created if not provided)
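To get a feel for the row limit, the following sketch computes the smallest number of files that a given maxRowsPerFile allows for a given row count. It is plain ceiling division, not Lance's file-splitting logic: FileCountEstimate and minFilesForRows are hypothetical names, and the result is only a lower bound, since maxBytesPerFile or other factors may cause more files to be written.

```java
/** Hypothetical helper, not part of the Lance SDK: lower bound on output file count. */
final class FileCountEstimate {
    /** Smallest number of files able to hold totalRows when each holds at most maxRowsPerFile rows. */
    static long minFilesForRows(long totalRows, int maxRowsPerFile) {
        if (totalRows < 0 || maxRowsPerFile <= 0) {
            throw new IllegalArgumentException("totalRows must be >= 0 and maxRowsPerFile > 0");
        }
        // Ceiling division in long arithmetic.
        return (totalRows + maxRowsPerFile - 1) / maxRowsPerFile;
    }

    public static void main(String[] args) {
        // 250,000 rows with a 100,000-row limit need at least three files.
        System.out.println(minFilesForRows(250_000L, 100_000));
    }
}
```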
When using namespace mode, the builder automatically calls declareTable() for CREATE mode or describeTable() for APPEND/OVERWRITE mode, and sets up a StorageOptionsProvider for credential refresh during long-running writes.
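The mode-dependent namespace call described above amounts to a small dispatch, sketched below. This is an illustrative stand-in rather than the builder's source: the WriteMode enum is a local copy of the three documented values, and namespaceCallFor is a hypothetical helper that returns only the name of the namespace method that would be invoked.

```java
/** Hypothetical stand-in, not part of the Lance SDK: maps a write mode to the namespace call. */
final class NamespaceCallSketch {
    /** Local copy of the three documented modes, so the sketch compiles without the Lance jar. */
    enum WriteMode { CREATE, APPEND, OVERWRITE }

    /** Returns the name of the namespace method the text above associates with the mode. */
    static String namespaceCallFor(WriteMode mode) {
        switch (mode) {
            case CREATE:
                // A new table is declared in the namespace before data is written.
                return "declareTable";
            case APPEND:
            case OVERWRITE:
                // An existing table is looked up to resolve its location.
                return "describeTable";
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        for (WriteMode mode : WriteMode.values()) {
            System.out.println(mode + " -> " + namespaceCallFor(mode));
        }
    }
}
```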
Usage
Use this builder for all dataset write operations. It is the recommended replacement for the deprecated static Dataset.create() methods.
Code Reference
Source Location
| Property | Value |
|---|---|
| File | java/src/main/java/org/lance/WriteDatasetBuilder.java |
| Package | org.lance |
| Lines | 466 |
Signature
public class WriteDatasetBuilder
Import
import org.lance.WriteDatasetBuilder;
// Typically accessed via Dataset.write()
I/O Contract
Builder Methods (Input)
| Method | Parameter Type | Description |
|---|---|---|
| allocator(BufferAllocator) | BufferAllocator | Arrow buffer allocator (optional, auto-created if omitted) |
| reader(ArrowReader) | ArrowReader | Data source via Arrow reader |
| stream(ArrowArrayStream) | ArrowArrayStream | Data source via Arrow C stream |
| schema(Schema) | Schema | Schema for empty table creation |
| uri(String) | String | Direct dataset URI |
| namespace(LanceNamespace) | LanceNamespace | Namespace for managed writes |
| tableId(List<String>) | List<String> | Table identifier within namespace |
| mode(WriteMode) | WriteParams.WriteMode | Write mode: CREATE, APPEND, or OVERWRITE |
| storageOptions(Map) | Map<String, String> | Cloud storage configuration |
| maxRowsPerFile(int) | int | Maximum rows per output file |
| maxRowsPerGroup(int) | int | Maximum rows per row group |
| maxBytesPerFile(long) | long | Maximum bytes per output file |
| enableStableRowIds(boolean) | boolean | Enable stable row IDs |
| dataStorageVersion(LanceFileVersion) | WriteParams.LanceFileVersion | Lance file format version |
Terminal Method (Output)
| Method | Output | Description |
|---|---|---|
| execute() | Dataset | Executes the write and returns the created/updated dataset |
Usage Examples
Writing with an ArrowReader to a URI
import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.vector.ipc.ArrowReader;

// myArrowReader is an existing ArrowReader that supplies the batches to write
Dataset dataset = Dataset.write()
    .reader(myArrowReader)
    .uri("s3://my-bucket/my-table.lance")
    .mode(WriteParams.WriteMode.CREATE)
    .maxRowsPerFile(100000)
    .maxBytesPerFile(128 * 1024 * 1024)
    .enableStableRowIds(true)
    .execute();
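A note on the byte limit in the example above: 128 MiB fits in an int, so that expression evaluates correctly before it is widened to the long parameter. Limits of 2 GiB or more overflow if the multiplication is done in int arithmetic. The sketch below shows the difference using plain Java; ByteLimits and its methods are hypothetical helpers, not part of the Lance SDK.

```java
/** Hypothetical helper, not part of the Lance SDK: computing byte limits without int overflow. */
final class ByteLimits {
    /** Multiplies in int arithmetic and only then widens to long; wraps for large values. */
    static long gibibytesWithIntMath(int gib) {
        return gib * 1024 * 1024 * 1024;
    }

    /** Multiplies in long arithmetic because the first factor is a long literal. */
    static long gibibytes(int gib) {
        return gib * 1024L * 1024L * 1024L;
    }

    /** Mebibytes as a long, suitable for a long-typed byte limit. */
    static long mebibytes(int mib) {
        return mib * 1024L * 1024L;
    }

    public static void main(String[] args) {
        System.out.println(gibibytesWithIntMath(4)); // int multiplication has wrapped around
        System.out.println(gibibytes(4));            // computed in long arithmetic
        System.out.println(mebibytes(128));
    }
}
```

Writing at least one factor as a long literal, for example 4L * 1024 * 1024 * 1024, is enough to keep the whole product in long arithmetic.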
Creating an Empty Table with Schema
import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.vector.types.pojo.Schema;

// mySchema is an existing Arrow Schema; no data source is supplied, so the table is created empty
Dataset dataset = Dataset.write()
    .schema(mySchema)
    .uri("/path/to/empty-table.lance")
    .mode(WriteParams.WriteMode.CREATE)
    .execute();
Writing via a Namespace
import org.lance.Dataset;
import org.lance.WriteParams;
import org.lance.namespace.LanceNamespace;
import java.util.Arrays;

// myNamespace is an existing LanceNamespace; the table location is resolved through it, so no uri is set
Dataset dataset = Dataset.write()
    .reader(myArrowReader)
    .namespace(myNamespace)
    .tableId(Arrays.asList("my_table"))
    .mode(WriteParams.WriteMode.CREATE)
    .execute();
Appending Data to an Existing Dataset
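import org.lance.Dataset;
import org.lance.WriteParams;
import org.apache.arrow.c.ArrowArrayStream;

// arrowArrayStream is an existing ArrowArrayStream; the dataset at the URI must already exist
Dataset dataset = Dataset.write()
    .stream(arrowArrayStream)
    .uri("/path/to/existing-table.lance")
    .mode(WriteParams.WriteMode.APPEND)
    .execute();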
Related Pages
- Lance_format_Lance_Java_Dataset - Main dataset class that provides the write() factory method
- Lance_format_Lance_Java_Fragment - Fragment-level write operations
- Lance_format_Lance_Java_DirectoryNamespace - Directory-based namespace for managed writes
- Lance_format_Lance_Java_RestNamespace - REST-based namespace for managed writes