Implementation:Lance format Lance Java ScanOptions
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Dataset_Management |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The ScanOptions class encapsulates all configurable parameters for scanning a Lance dataset, including column projection, filtering, pagination, nearest-neighbor search, full-text search, and batch readahead.
Description
ScanOptions is an immutable configuration object constructed via its nested Builder class. It is passed to Dataset.newScan(ScanOptions) or Fragment.newScan(ScanOptions) to configure how data is read from the dataset. All parameters are optional. Most are exposed through Optional wrappers; withRowId, withRowAddress, and batchReadahead are plain values with defaults (false, false, and 16).
Supported configuration options:
- fragmentIds: Restrict the scan to specific fragment IDs
- batchSize: Maximum number of rows per returned ArrowRecordBatch
- columns: Column projection list (all columns are scanned if omitted)
- filter: SQL filter expression string
- substraitFilter: Substrait binary filter expression (mutually exclusive with filter)
- limit: Maximum total rows to return
- offset: Number of rows to skip before returning results
- nearest: Nearest-neighbor vector search query (Query object)
- fullTextQuery: Full-text search query (FullTextQuery object)
- withRowId: Include the row ID pseudo-column in results
- withRowAddress: Include the row address pseudo-column in results
- batchReadahead: Number of batches to prefetch (default: 16)
- columnOrderings: Column ordering specifications for sorted output
The Builder supports creating options from scratch or copying from an existing ScanOptions instance.
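The builder pattern described above can be sketched in a few lines. This is a minimal illustration, not the real class: MiniScanOptions and its three fields are hypothetical stand-ins chosen to show the immutable object, the Optional-wrapped fields, the defaulted primitive, and the copy constructor.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for illustration only; NOT org.lance.ipc.ScanOptions.
final class MiniScanOptions {
    private final Optional<List<String>> columns;
    private final Optional<String> filter;
    private final int batchReadahead;

    private MiniScanOptions(Builder b) {
        this.columns = b.columns;
        this.filter = b.filter;
        this.batchReadahead = b.batchReadahead;
    }

    Optional<List<String>> getColumns() { return columns; }
    Optional<String> getFilter() { return filter; }
    int getBatchReadahead() { return batchReadahead; }

    static final class Builder {
        private Optional<List<String>> columns = Optional.empty();
        private Optional<String> filter = Optional.empty();
        private int batchReadahead = 16; // default listed for batchReadahead

        Builder() {}

        // Copy constructor: start from the values of an existing instance.
        Builder(MiniScanOptions source) {
            this.columns = source.columns;
            this.filter = source.filter;
            this.batchReadahead = source.batchReadahead;
        }

        Builder columns(List<String> cols) {
            this.columns = Optional.of(List.copyOf(cols)); // unmodifiable copy
            return this;
        }

        Builder filter(String expression) {
            this.filter = Optional.of(expression);
            return this;
        }

        Builder batchReadahead(int batches) {
            this.batchReadahead = batches;
            return this;
        }

        MiniScanOptions build() { return new MiniScanOptions(this); }
    }

    public static void main(String[] args) {
        MiniScanOptions base = new Builder().columns(List.of("id", "name")).build();
        // The copy inherits the projection and overrides only the filter.
        MiniScanOptions copy = new Builder(base).filter("status = 'active'").build();
        System.out.println(base.getFilter().isPresent());
        System.out.println(copy.getColumns().get());
    }
}
```

Because the built object has only final fields and no setters, deriving a variant always goes through a new Builder, which leaves the original instance untouched.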
Usage
Use ScanOptions whenever you need to configure a dataset or fragment scan beyond the defaults. This includes column projection, row filtering, pagination, vector similarity search, and full-text search.
Code Reference
Source Location
| Property | Value |
|---|---|
| File | java/src/main/java/org/lance/ipc/ScanOptions.java |
| Package | org.lance.ipc |
| Lines | 419 |
Signature
public class ScanOptions
Import
import org.lance.ipc.ScanOptions;
I/O Contract
Builder Methods (Input)
| Method | Parameter Type | Default | Description |
|---|---|---|---|
| fragmentIds(List<Integer>) | List<Integer> | empty | Restrict scan to specific fragments |
| batchSize(long) | long | empty | Max rows per record batch |
| columns(List<String>) | List<String> | empty (all columns) | Column projection |
| filter(String) | String | empty | SQL filter expression |
| substraitFilter(ByteBuffer) | ByteBuffer | empty | Substrait filter (mutually exclusive with filter) |
| limit(long) | long | empty | Max total rows |
| offset(long) | long | empty | Rows to skip |
| nearest(Query) | Query | empty | Nearest-neighbor query |
| fullTextQuery(FullTextQuery) | FullTextQuery | empty | Full-text search query |
| withRowId(boolean) | boolean | false | Include row ID |
| withRowAddress(boolean) | boolean | false | Include row address |
| batchReadahead(int) | int | 16 | Prefetch batch count |
| setColumnOrderings(List<ColumnOrdering>) | List<ColumnOrdering> | empty | Sort order |
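
The table marks filter and substraitFilter as mutually exclusive. This page does not say whether the real Builder rejects a configuration with both set, so a caller may want a guard of its own before building. A minimal sketch, assuming a hypothetical helper named requireSingleFilter that is not part of the Lance API:

```java
import java.nio.ByteBuffer;
import java.util.Optional;

// Hypothetical caller-side guard; not part of org.lance.ipc.
final class FilterChoice {
    private FilterChoice() {}

    // Throws if both filter forms are present; accepts zero or one.
    static void requireSingleFilter(Optional<String> sqlFilter,
                                    Optional<ByteBuffer> substraitFilter) {
        if (sqlFilter.isPresent() && substraitFilter.isPresent()) {
            throw new IllegalArgumentException(
                "Set either filter or substraitFilter, not both");
        }
    }
}
```

Calling the guard with the two values you intend to pass to the builder surfaces the conflict early, at the place where the options are assembled.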
Getter Methods (Output)
| Method | Return Type | Description |
|---|---|---|
| getFragmentIds() | Optional<List<Integer>> | Fragment IDs to scan |
| getBatchSize() | Optional<Long> | Batch size setting |
| getColumns() | Optional<List<String>> | Projected columns |
| getFilter() | Optional<String> | SQL filter expression |
| getSubstraitFilter() | Optional<ByteBuffer> | Substrait filter |
| getLimit() | Optional<Long> | Row limit |
| getOffset() | Optional<Long> | Row offset |
| getNearest() | Optional<Query> | Nearest-neighbor query |
| getFullTextQuery() | Optional<FullTextQuery> | Full-text query |
| isWithRowId() | boolean | Whether row IDs are included |
| isWithRowAddress() | boolean | Whether row addresses are included |
| getBatchReadahead() | int | Batch readahead count |
| getColumnOrderings() | Optional<List<ColumnOrdering>> | Column orderings |
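
Since most getters return an Optional, code that reads a ScanOptions has to decide what an empty value means. The sketch below uses only java.util.Optional and takes the getter results as arguments; the class and method names are hypothetical, and the fallback strings are illustrative choices rather than Lance behavior.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helpers showing how Optional-typed getter results can be consumed.
final class OptionalGetterDemo {
    private OptionalGetterDemo() {}

    // An empty projection means "all columns", per the builder table.
    static String describeColumns(Optional<List<String>> columns) {
        return columns.map(c -> String.join(", ", c)).orElse("<all columns>");
    }

    // An empty limit means the scan is not capped.
    static String describeLimit(Optional<Long> limit) {
        return limit.map(n -> "first " + n + " rows").orElse("no limit");
    }
}
```

With a real instance, the arguments would be the results of getColumns() and getLimit().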
Usage Examples
Basic Scan with Column Projection and Filter
import org.lance.ipc.ScanOptions;
import java.util.Arrays;
ScanOptions options = new ScanOptions.Builder()
.columns(Arrays.asList("id", "name", "embedding"))
.filter("category = 'science'")
.batchSize(1024)
.build();
Paginated Scan
import org.lance.ipc.ScanOptions;
ScanOptions options = new ScanOptions.Builder()
.offset(100)
.limit(50)
.build();
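The offset and limit values above correspond to the third page of 50 rows. A small sketch of that arithmetic, assuming zero-based page indexes; PageWindow is a hypothetical helper, not a Lance type:

```java
// Hypothetical helper that turns a page index into offset/limit values.
final class PageWindow {
    final long offset;
    final long limit;

    private PageWindow(long offset, long limit) {
        this.offset = offset;
        this.limit = limit;
    }

    // pageIndex is zero-based; pageSize is the number of rows per page.
    static PageWindow forPage(long pageIndex, long pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return new PageWindow(Math.multiplyExact(pageIndex, pageSize), pageSize);
    }

    public static void main(String[] args) {
        PageWindow third = PageWindow.forPage(2, 50);
        System.out.println(third.offset + " " + third.limit);
    }
}
```

The resulting offset and limit fields can then be passed to the builder's offset(long) and limit(long) methods.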
Nearest-Neighbor Vector Search
import org.lance.ipc.ScanOptions;
import org.lance.ipc.Query;
import java.util.Arrays;
// myVectorQuery is a Query instance built elsewhere (not shown here)
ScanOptions options = new ScanOptions.Builder()
.nearest(myVectorQuery)
.columns(Arrays.asList("id", "text"))
.limit(10)
.build();
Full-Text Search with Row IDs
import org.lance.ipc.ScanOptions;
import org.lance.ipc.FullTextQuery;
ScanOptions options = new ScanOptions.Builder()
.fullTextQuery(FullTextQuery.match("neural network", "abstract"))
.withRowId(true)
.limit(100)
.build();
Copying and Modifying Existing Options
import org.lance.ipc.ScanOptions;
// existingOptions is a previously built ScanOptions; copy it and override only the filter
ScanOptions modified = new ScanOptions.Builder(existingOptions)
.filter("status = 'active'")
.build();
Related Pages
- Lance_format_Lance_Java_Dataset - Dataset class that accepts ScanOptions
- Lance_format_Lance_Java_Fragment - Fragment class that accepts ScanOptions
- Lance_format_Lance_Java_FullTextQuery - Full-text query types used in ScanOptions