Implementation:Lance format Lance Java OpenDatasetBuilder

From Leeroopedia


Knowledge Sources
Domains Java_SDK, Dataset_Management
Last Updated 2026-02-08 19:33 GMT

Overview

Description

OpenDatasetBuilder provides a fluent builder API for opening Lance datasets. It supports two mutually exclusive access modes: opening directly by URI, or opening through a namespace together with a table identifier. In namespace mode, the builder calls LanceNamespace.describeTable() to fetch the table location and storage options. If no BufferAllocator is supplied, the builder creates and manages its own RootAllocator with maximum capacity.
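
The namespace lookup described above can be illustrated with a minimal, self-contained sketch. It does not use the real Lance classes: TableInfo, NamespaceSketch, and resolveLocation are hypothetical stand-ins, and the actual describeTable() request and response types may look different. The sketch only shows the general shape of the step: ask the namespace for a table description, then reject a null or empty location.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the table description returned by a namespace.
// This is NOT the real Lance response type.
final class TableInfo {
    final String location;
    final Map<String, String> storageOptions;

    TableInfo(String location, Map<String, String> storageOptions) {
        this.location = location;
        this.storageOptions = storageOptions;
    }
}

// Hypothetical stand-in playing the role of LanceNamespace.describeTable();
// the real method signature may differ.
interface NamespaceSketch {
    TableInfo describeTable(List<String> tableId);
}

final class NamespaceResolutionSketch {
    // Resolves the table location and rejects a null or empty result.
    static String resolveLocation(NamespaceSketch namespace, List<String> tableId) {
        TableInfo info = namespace.describeTable(tableId);
        if (info == null || info.location == null || info.location.isEmpty()) {
            throw new IllegalArgumentException(
                "Namespace did not return a location for table " + tableId);
        }
        return info.location;
    }

    public static void main(String[] args) {
        // An in-memory namespace that maps a table name to an illustrative URI.
        NamespaceSketch ns = id ->
            new TableInfo("s3://bucket/" + id.get(0) + ".lance", Collections.emptyMap());
        System.out.println(resolveLocation(ns, Collections.singletonList("my_table")));
    }
}
```

In this sketch the storage options are carried alongside the location but not used; how the real builder combines them with ReadOptions is not covered here.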

Usage

The OpenDatasetBuilder constructor is package-private, so instances are obtained via Dataset.open(). The builder validates that exactly one of URI or namespace+tableId is provided and throws IllegalArgumentException with a descriptive message for any invalid configuration. On success, build() returns a fully initialized Dataset instance.
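
The "exactly one of URI or namespace+tableId" rule can be sketched as a small standalone check. This is an illustration under assumptions, not the builder's actual code: OpenModeSketch, Mode, and resolveMode are hypothetical names, and the order of checks and the exception messages in the real class may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the validation rule; not the real OpenDatasetBuilder.
final class OpenModeSketch {
    enum Mode { URI, NAMESPACE }

    // Returns the access mode, or throws if the combination of inputs is invalid.
    static Mode resolveMode(String uri, Object namespace, List<String> tableId) {
        boolean hasUri = uri != null;
        boolean hasNamespace = namespace != null;
        boolean hasTableId = tableId != null;

        // namespace and tableId only make sense as a pair.
        if (hasNamespace != hasTableId) {
            throw new IllegalArgumentException("namespace and tableId must be set together");
        }
        // The two modes are mutually exclusive.
        if (hasUri && hasNamespace) {
            throw new IllegalArgumentException("Specify either uri or namespace+tableId, not both");
        }
        // At least one mode is required.
        if (!hasUri && !hasNamespace) {
            throw new IllegalArgumentException("Either uri or namespace+tableId is required");
        }
        return hasUri ? Mode.URI : Mode.NAMESPACE;
    }

    public static void main(String[] args) {
        System.out.println(resolveMode("s3://bucket/table.lance", null, null));
        System.out.println(resolveMode(null, new Object(), Arrays.asList("my_table")));
        try {
            resolveMode(null, new Object(), null); // namespace without tableId
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking the namespace/tableId pairing first keeps the error message specific when a caller sets only one half of the pair.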

Code Reference

Source Location

java/src/main/java/org/lance/OpenDatasetBuilder.java

Signature

public class OpenDatasetBuilder {
    public OpenDatasetBuilder allocator(BufferAllocator allocator);
    public OpenDatasetBuilder uri(String uri);
    public OpenDatasetBuilder namespace(LanceNamespace namespace);
    public OpenDatasetBuilder tableId(List<String> tableId);
    public OpenDatasetBuilder readOptions(ReadOptions options);
    public Dataset build();
}

Import

import org.lance.OpenDatasetBuilder;

I/O Contract

Builder Methods

Method | Parameter Type | Description
allocator() | BufferAllocator | Sets the Arrow buffer allocator; if omitted, a RootAllocator is created
uri() | String | Sets the dataset URI (e.g., s3://bucket/table.lance); mutually exclusive with namespace+tableId
namespace() | LanceNamespace | Sets the namespace for location resolution; requires tableId
tableId() | List<String> | Sets the table identifier; requires namespace
readOptions() | ReadOptions | Sets read configuration (version, cache sizes, storage options)

Return Values

Method | Return Type | Description
build() | Dataset | Opens and returns the dataset; throws IllegalArgumentException on invalid configuration

Exceptions

Exception | Condition
IllegalArgumentException | Both URI and namespace+tableId are specified
IllegalArgumentException | Neither URI nor namespace+tableId is specified
IllegalArgumentException | namespace is set without tableId, or vice versa
IllegalArgumentException | Namespace describeTable returns null or empty location

Usage Examples

import org.lance.Dataset;
import org.lance.ReadOptions;
import org.apache.arrow.memory.RootAllocator;
import java.util.Arrays;

// Open a dataset by URI with default options
Dataset byUri = Dataset.open()
    .uri("s3://bucket/table.lance")
    .build();

// Open a dataset by URI with a custom allocator and read options
ReadOptions options = new ReadOptions.Builder()
    .setVersion(5)                                    // read dataset version 5
    .setIndexCacheSizeBytes(2L * 1024 * 1024 * 1024)  // 2 GiB index cache
    .build();

Dataset withOptions = Dataset.open()
    .allocator(new RootAllocator())
    .uri("file:///data/table.lance")
    .readOptions(options)
    .build();

// Open a dataset via namespace; myNamespace is an existing LanceNamespace
// instance (its construction is not shown here)
Dataset viaNamespace = Dataset.open()
    .namespace(myNamespace)
    .tableId(Arrays.asList("my_table"))
    .build();
