
Implementation:Lance format Lance Java ScanOptions

From Leeroopedia
Revision as of 15:28, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Lance_format_Lance_Java_ScanOptions.md)


Knowledge Sources
Domains Java_SDK, Dataset_Management
Last Updated 2026-02-08 19:33 GMT

Overview

The ScanOptions class encapsulates all configurable parameters for scanning a Lance dataset, including column projection, filtering, pagination, nearest-neighbor search, full-text search, and batch readahead.

Description

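ScanOptions is an immutable configuration object constructed via its nested Builder class. It is passed to Dataset.newScan(ScanOptions) or Fragment.newScan(ScanOptions) to configure how data is read from the dataset. Every parameter is optional. Most getters return an Optional that is empty when the parameter was not set; the exceptions are withRowId and withRowAddress (plain booleans, default false) and batchReadahead (a plain int, default 16).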

Supported configuration options:

  • fragmentIds: Restrict the scan to specific fragment IDs
  • batchSize: Maximum number of rows per returned ArrowRecordBatch
  • columns: Column projection list (scan all columns if omitted)
  • filter: SQL filter expression string
  • substraitFilter: Substrait binary filter expression (mutually exclusive with filter)
  • limit: Maximum total rows to return
  • offset: Number of rows to skip before returning results
  • nearest: Nearest-neighbor vector search query (Query object)
  • fullTextQuery: Full-text search query (FullTextQuery object)
  • withRowId: Include the row ID pseudo-column in results
  • withRowAddress: Include the row address pseudo-column in results
  • batchReadahead: Number of batches to prefetch (default: 16)
  • columnOrderings: Column ordering specifications for sorted output

The Builder supports creating options from scratch or copying from an existing ScanOptions instance.
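The immutable-object-plus-Builder shape described above can be sketched in plain Java. This is a minimal illustration only: SketchScanOptions is a hypothetical stand-in with three Optional parameters and one defaulted primitive, not the real org.lance.ipc.ScanOptions, and the real class may be implemented differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for illustration only; NOT the real org.lance.ipc.ScanOptions.
final class SketchScanOptions {
    private final Optional<List<String>> columns;
    private final Optional<String> filter;
    private final Optional<Long> limit;
    private final int batchReadahead;

    // Only the Builder can create instances; all fields are final.
    private SketchScanOptions(Builder b) {
        this.columns = b.columns;
        this.filter = b.filter;
        this.limit = b.limit;
        this.batchReadahead = b.batchReadahead;
    }

    Optional<List<String>> getColumns() { return columns; }
    Optional<String> getFilter() { return filter; }
    Optional<Long> getLimit() { return limit; }
    int getBatchReadahead() { return batchReadahead; }

    static final class Builder {
        private Optional<List<String>> columns = Optional.empty();
        private Optional<String> filter = Optional.empty();
        private Optional<Long> limit = Optional.empty();
        private int batchReadahead = 16; // default value taken from this page

        // Start from scratch: everything unset except the readahead default.
        Builder() {}

        // Copy constructor: start from the values of an existing instance.
        Builder(SketchScanOptions source) {
            this.columns = source.columns;
            this.filter = source.filter;
            this.limit = source.limit;
            this.batchReadahead = source.batchReadahead;
        }

        Builder columns(List<String> columns) {
            // Defensive copy so later changes to the caller's list cannot leak in.
            this.columns = Optional.of(Collections.unmodifiableList(new ArrayList<>(columns)));
            return this;
        }

        Builder filter(String filter) { this.filter = Optional.of(filter); return this; }
        Builder limit(long limit) { this.limit = Optional.of(limit); return this; }
        Builder batchReadahead(int n) { this.batchReadahead = n; return this; }

        SketchScanOptions build() { return new SketchScanOptions(this); }
    }
}
```

Because build() produces a new immutable object each time, the copy constructor lets a caller derive a variant (for example, the same projection with a different filter) without affecting the instance it was copied from.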

Usage

Use ScanOptions whenever you need to configure a dataset or fragment scan beyond the defaults. This includes column projection, row filtering, pagination, vector similarity search, and full-text search.

Code Reference

Source Location

| Property | Value |
|---|---|
| File | java/src/main/java/org/lance/ipc/ScanOptions.java |
| Package | org.lance.ipc |
| Lines | 419 |

Signature

public class ScanOptions

Import

import org.lance.ipc.ScanOptions;

I/O Contract

Builder Methods (Input)

| Method | Parameter Type | Default | Description |
|---|---|---|---|
| fragmentIds(List<Integer>) | List<Integer> | empty | Restrict scan to specific fragments |
| batchSize(long) | long | empty | Max rows per record batch |
| columns(List<String>) | List<String> | empty (all columns) | Column projection |
| filter(String) | String | empty | SQL filter expression |
| substraitFilter(ByteBuffer) | ByteBuffer | empty | Substrait filter (mutually exclusive with filter) |
| limit(long) | long | empty | Max total rows |
| offset(long) | long | empty | Rows to skip |
| nearest(Query) | Query | empty | Nearest-neighbor query |
| fullTextQuery(FullTextQuery) | FullTextQuery | empty | Full-text search query |
| withRowId(boolean) | boolean | false | Include row ID |
| withRowAddress(boolean) | boolean | false | Include row address |
| batchReadahead(int) | int | 16 | Prefetch batch count |
| setColumnOrderings(List<ColumnOrdering>) | List<ColumnOrdering> | empty | Sort order |
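Since filter and substraitFilter are documented as mutually exclusive, callers that assemble options from user input may want to check for a conflict before building. The following is a minimal sketch of such a check; FilterChoice is a hypothetical helper, not part of the Lance API, and this page does not state how (or whether) the library itself reports the conflict.

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Lance API.
final class FilterChoice {
    private FilterChoice() {}

    // Throws if both filter forms are present; otherwise reports which one is in use.
    static String describe(Optional<String> filter, Optional<ByteBuffer> substraitFilter) {
        if (filter.isPresent() && substraitFilter.isPresent()) {
            throw new IllegalArgumentException(
                "filter and substraitFilter are mutually exclusive; set only one");
        }
        if (filter.isPresent()) {
            return "sql";
        }
        if (substraitFilter.isPresent()) {
            return "substrait";
        }
        return "none";
    }
}
```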

Getter Methods (Output)

| Method | Return Type | Description |
|---|---|---|
| getFragmentIds() | Optional<List<Integer>> | Fragment IDs to scan |
| getBatchSize() | Optional<Long> | Batch size setting |
| getColumns() | Optional<List<String>> | Projected columns |
| getFilter() | Optional<String> | SQL filter expression |
| getSubstraitFilter() | Optional<ByteBuffer> | Substrait filter |
| getLimit() | Optional<Long> | Row limit |
| getOffset() | Optional<Long> | Row offset |
| getNearest() | Optional<Query> | Nearest-neighbor query |
| getFullTextQuery() | Optional<FullTextQuery> | Full-text query |
| isWithRowId() | boolean | Whether row IDs are included |
| isWithRowAddress() | boolean | Whether row addresses are included |
| getBatchReadahead() | int | Batch readahead count |
| getColumnOrderings() | Optional<List<ColumnOrdering>> | Column orderings |
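Because most getters return Optional values, code that inspects a ScanOptions instance (for logging, say) has to handle the unset case explicitly. A minimal sketch of that pattern follows. ScanSummary is a hypothetical helper, and its parameters only mirror the return types of getColumns(), getFilter(), and getLimit() listed above; it does not depend on the Lance library.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper for illustration; not part of the Lance API.
final class ScanSummary {
    private ScanSummary() {}

    // Renders a one-line summary, substituting a placeholder for each unset value.
    static String describe(Optional<List<String>> columns,
                           Optional<String> filter,
                           Optional<Long> limit) {
        String cols = columns.map(c -> String.join(", ", c)).orElse("<all>");
        String where = filter.orElse("<none>");
        String max = limit.map(l -> Long.toString(l)).orElse("<unbounded>");
        return "columns=" + cols + "; filter=" + where + "; limit=" + max;
    }
}
```

With a real instance, the call would presumably look like ScanSummary.describe(options.getColumns(), options.getFilter(), options.getLimit()).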

Usage Examples

Basic Scan with Column Projection and Filter

import org.lance.ipc.ScanOptions;
import java.util.Arrays;

ScanOptions options = new ScanOptions.Builder()
    .columns(Arrays.asList("id", "name", "embedding"))
    .filter("category = 'science'")
    .batchSize(1024)
    .build();

Paginated Scan

import org.lance.ipc.ScanOptions;

ScanOptions options = new ScanOptions.Builder()
    .offset(100)
    .limit(50)
    .build();
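The offset and limit values in a paginated scan are usually derived from a page index and a page size. A minimal sketch of that arithmetic, assuming zero-based page numbering; PageWindow is a hypothetical helper and not part of the Lance API.

```java
// Hypothetical helper for illustration; not part of the Lance API.
final class PageWindow {
    private PageWindow() {}

    // Number of rows to skip before the given zero-based page begins.
    static long offsetFor(long pageIndex, long pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize must be > 0");
        }
        // multiplyExact throws ArithmeticException on overflow instead of wrapping silently.
        return Math.multiplyExact(pageIndex, pageSize);
    }
}
```

For the example above, PageWindow.offsetFor(2, 50) yields 100, so the third page of 50 rows corresponds to offset(100) and limit(50).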

Nearest-Neighbor Vector Search

import org.lance.ipc.ScanOptions;
import org.lance.ipc.Query;
import java.util.Arrays;

// myVectorQuery is a Query instance constructed beforehand
ScanOptions options = new ScanOptions.Builder()
    .nearest(myVectorQuery)
    .columns(Arrays.asList("id", "text"))
    .limit(10)
    .build();

Full-Text Search with Row IDs

import org.lance.ipc.ScanOptions;
import org.lance.ipc.FullTextQuery;

ScanOptions options = new ScanOptions.Builder()
    .fullTextQuery(FullTextQuery.match("neural network", "abstract"))
    .withRowId(true)
    .limit(100)
    .build();

Copying and Modifying Existing Options

import org.lance.ipc.ScanOptions;

// Create new options from existingOptions (a previously built ScanOptions), changing only the filter
ScanOptions modified = new ScanOptions.Builder(existingOptions)
    .filter("status = 'active'")
    .build();
