Implementation:Lance format Lance JNI BlockingScanner
| Knowledge Sources | |
|---|---|
| Domains | Java_Bindings, JNI |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
JNI BlockingScanner is the Rust-side JNI binding that wraps the Lance Scanner, providing synchronous data scanning with support for column projection, filtering, vector search, full-text search, and batch streaming to Java.
Description
The BlockingScanner struct wraps an Arc<Scanner> and provides blocking access to the Lance dataset scanning pipeline. It supports:
- Stream opening via open_stream, which blocks on the async try_into_stream to produce a DatasetRecordBatchStream.
- Schema retrieval to get the output schema of the scan.
- Row counting to count rows matching the scan criteria.
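The blocking-wrapper idea can be illustrated with a minimal, standard-library-only sketch. It assumes nothing about the runtime the real module uses: `block_on`, `AsyncScanner`, and `BlockingWrapper` are hypothetical stand-ins written for this example, and production code would more likely delegate to an async runtime's own blocking call.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread whenever the future signals progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Hypothetical helper: drives a future to completion on the calling thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for an async scanner (not the Lance Scanner).
struct AsyncScanner {
    rows: Vec<u32>,
}

impl AsyncScanner {
    async fn count_rows(&self) -> u64 {
        self.rows.len() as u64
    }
}

/// Blocking facade that holds the async scanner behind an Arc.
struct BlockingWrapper {
    inner: Arc<AsyncScanner>,
}

impl BlockingWrapper {
    /// Synchronous entry point: blocks until the async call finishes.
    fn count_rows(&self) -> u64 {
        block_on(self.inner.count_rows())
    }
}
```

A caller on a JNI thread would then use `wrapper.count_rows()` like any ordinary synchronous method, which is the property the blocking wrapper exists to provide.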
The JNI entry point Java_org_lance_ipc_LanceScanner_createScanner constructs a scanner from a BlockingDataset with extensive configuration:
- Column projection via an optional list of column names.
- Filtering via SQL-style filter strings or Substrait filter expressions.
- Batch size, limit, and offset for pagination control.
- Vector search via a Query object specifying column, key vector, k, nprobes, ef, refine factor, and distance type.
- Full-text search via a structured FullTextQuery supporting Match, MatchPhrase, MultiMatch, Boost, and Boolean query types.
- Row ID and row address inclusion flags.
- Column orderings for sorted scans.
- Fragment filtering to scan only specific fragments.
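To make the pagination options concrete, the following sketch shows one plausible way offset, limit, and batch size interact on an in-memory slice. The function `paginate` is hypothetical and written only for this illustration; it assumes the offset is applied before the limit, and it does not reproduce how the scanner plans or executes a scan.

```rust
/// Hypothetical helper: skip `offset` rows, keep at most `limit` rows,
/// then split the remainder into batches of at most `batch_size` rows.
fn paginate<T: Clone>(
    rows: &[T],
    offset: Option<usize>,
    limit: Option<usize>,
    batch_size: usize,
) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be positive");
    // Clamp the start so an oversized offset yields an empty result.
    let start = offset.unwrap_or(0).min(rows.len());
    let end = match limit {
        Some(n) => start.saturating_add(n).min(rows.len()),
        None => rows.len(),
    };
    rows[start..end]
        .chunks(batch_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With ten rows, an offset of 2, a limit of 5, and a batch size of 2, this sketch yields three batches, the last of which holds a single row.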
The full-text search query builder (build_full_text_search_query) recursively constructs an FtsQuery tree from Java objects, supporting nested boolean and boost compositions.
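The recursive construction can be sketched as follows. This is an illustrative model only: `RawQuery` stands in for the Java-side objects, `FtsNode` for the resulting query tree, and the variant shapes and string kind tags are assumptions made for this example rather than the actual FtsQuery definitions.

```rust
/// Hypothetical typed query tree (a simplified stand-in, not FtsQuery).
#[derive(Debug, Clone, PartialEq)]
enum FtsNode {
    Match { column: String, terms: String },
    Boost { positive: Box<FtsNode>, negative: Box<FtsNode> },
    Boolean { must: Vec<FtsNode> },
}

/// Hypothetical untyped input, standing in for a Java query object.
#[derive(Debug, Clone)]
struct RawQuery {
    kind: &'static str,
    column: &'static str,
    terms: &'static str,
    children: Vec<RawQuery>,
}

fn leaf(column: &'static str, terms: &'static str) -> RawQuery {
    RawQuery { kind: "match", column, terms, children: Vec::new() }
}

fn composite(kind: &'static str, children: Vec<RawQuery>) -> RawQuery {
    RawQuery { kind, column: "", terms: "", children }
}

/// Recursively converts the untyped input into a typed tree.
fn build(raw: &RawQuery) -> Result<FtsNode, String> {
    match raw.kind {
        "match" => Ok(FtsNode::Match {
            column: raw.column.to_string(),
            terms: raw.terms.to_string(),
        }),
        "boost" => match raw.children.as_slice() {
            [positive, negative] => Ok(FtsNode::Boost {
                positive: Box::new(build(positive)?),
                negative: Box::new(build(negative)?),
            }),
            other => Err(format!("boost expects 2 children, got {}", other.len())),
        },
        "boolean" => {
            // Each child may itself be a boost or boolean node.
            let must = raw.children.iter().map(build).collect::<Result<Vec<_>, _>>()?;
            Ok(FtsNode::Boolean { must })
        }
        other => Err(format!("unsupported query kind: {other}")),
    }
}

/// Nesting depth of a built tree, used here to check the recursion.
fn depth(node: &FtsNode) -> usize {
    match node {
        FtsNode::Match { .. } => 1,
        FtsNode::Boost { positive, negative } => 1 + depth(positive).max(depth(negative)),
        FtsNode::Boolean { must } => 1 + must.iter().map(depth).max().unwrap_or(0),
    }
}
```

Returning a `Result` from every level lets a malformed node deep in the tree abort the whole build with one error, which at a JNI boundary would typically surface as a Java exception.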
Usage
Use this module when implementing or extending the Java dataset scanning API. It is invoked from the Java LanceScanner class to create configured scanners and stream Arrow record batches back to Java via FFI.
Code Reference
Source Location
java/lance-jni/src/blocking_scanner.rs
Signature
pub struct BlockingScanner {
    pub(crate) inner: Arc<Scanner>,
}
impl BlockingScanner {
    pub fn create(scanner: Scanner) -> Self;
    pub fn open_stream(&self) -> Result<DatasetRecordBatchStream>;
    pub fn schema(&self) -> Result<SchemaRef>;
    pub fn count_rows(&self) -> Result<u64>;
}
Import
use crate::blocking_scanner::{BlockingScanner, NATIVE_SCANNER};
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | JObject (Java Dataset) | The dataset to scan, carrying a native BlockingDataset handle |
| Input | JObject (Optional columns) | Java Optional<List<String>> for column projection |
| Input | JObject (Optional filter) | Java Optional<String> SQL filter expression |
| Input | JObject (Optional Query) | Java Optional<Query> for vector nearest-neighbor search |
| Input | JObject (Optional FullTextQuery) | Java Optional<FullTextQuery> for full-text search |
| Input | jboolean (with_row_id) | Whether to include row IDs in results |
| Input | jint (batch_readahead) | Number of batches to read ahead |
| Output | JObject (Java LanceScanner) | Java scanner object with native handle attached |
| Output | Arrow FFI stream | Record batch stream exported via FFI_ArrowArrayStream |
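The notion of a native handle attached to a Java object can be illustrated with a generic pattern: box the Rust state, hand the pointer to the other side as a 64-bit integer, and reclaim it exactly once. The sketch below is an assumption-laden illustration; `ScannerState`, `into_handle`, `borrow_handle`, and `release_handle` are hypothetical names, and the real module may attach its handle through a different mechanism.

```rust
/// Hypothetical native state that a Java object would refer to.
struct ScannerState {
    batch_size: usize,
}

/// Moves the state to the heap and returns its address as an opaque handle.
fn into_handle(state: ScannerState) -> i64 {
    Box::into_raw(Box::new(state)) as i64
}

/// Borrows the state behind a handle without taking ownership.
///
/// # Safety
/// `handle` must come from `into_handle` and must not have been released.
unsafe fn borrow_handle<'a>(handle: i64) -> &'a ScannerState {
    unsafe { &*(handle as *const ScannerState) }
}

/// Takes ownership back so the state is dropped when the caller is done.
///
/// # Safety
/// `handle` must come from `into_handle` and be released at most once.
unsafe fn release_handle(handle: i64) -> ScannerState {
    unsafe { *Box::from_raw(handle as *mut ScannerState) }
}
```

The key design constraint is ownership: between creation and release the handle is just a number, so the Java side is responsible for calling the release path exactly once, typically from a close method.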
Usage Examples
// Java side: creating and using a scanner
import java.util.Arrays;
import org.apache.arrow.vector.ipc.ArrowReader;
import org.lance.Dataset;
import org.lance.ipc.LanceScanner;

Dataset dataset = Dataset.open("/path/to/dataset");
LanceScanner scanner = dataset.newScan()
    .columns(Arrays.asList("id", "embedding"))
    .filter("id > 100")
    .limit(1000)
    .build();
// Record batches are read through the Arrow reader
ArrowReader reader = scanner.scanBatches();
// Rust JNI side: scanner creation pattern
#[no_mangle]
pub extern "system" fn Java_org_lance_ipc_LanceScanner_createScanner<'local>(
    mut env: JNIEnv<'local>,
    _reader: JObject,
    jdataset: JObject,
    fragment_ids_obj: JObject,
    columns_obj: JObject,
    filter_obj: JObject,
    // ... additional parameters
) -> JObject<'local> {
    ok_or_throw!(env, inner_create_scanner(&mut env, jdataset, /* ... */))
}
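The entry point above delegates to an inner function returning a Result and converts any error at the boundary. A rough model of that pattern, under stated assumptions, is sketched here: `MockEnv`, `ok_or_throw_sketch!`, `inner_create`, and `create_scanner_entry` are all hypothetical, and the actual ok_or_throw! macro may differ in how it raises the exception and what it returns.

```rust
/// Hypothetical stand-in for a JNI environment that records a thrown error.
#[derive(Default)]
struct MockEnv {
    pending_exception: Option<String>,
}

impl MockEnv {
    fn throw(&mut self, message: &str) {
        self.pending_exception = Some(message.to_string());
    }
}

/// Sketch of an unwrap-or-throw macro: on error, record the exception
/// and return a default value from the enclosing function.
macro_rules! ok_or_throw_sketch {
    ($env:expr, $result:expr) => {
        match $result {
            Ok(value) => value,
            Err(err) => {
                $env.throw(&err.to_string());
                return Default::default();
            }
        }
    };
}

/// Hypothetical fallible inner function holding the real logic.
fn inner_create(batch_size: i32) -> Result<i64, String> {
    if batch_size <= 0 {
        Err(format!("batch size must be positive, got {batch_size}"))
    } else {
        Ok(i64::from(batch_size))
    }
}

/// Thin entry point: no error handling beyond the macro call.
fn create_scanner_entry(env: &mut MockEnv, batch_size: i32) -> i64 {
    ok_or_throw_sketch!(env, inner_create(batch_size))
}
```

Keeping the entry point this thin means the inner function can use `?` freely, while the boundary guarantees that no Rust error or panic-prone unwrap leaks across the JNI call.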
Related Pages
- Lance_format_Lance_JNI_BlockingDataset - Dataset from which scanners are created
- Lance_format_Lance_JNI_FFI - JNIEnvExt trait used for type extraction
- Lance_format_Lance_JNI_Utils - Utility functions including get_query for vector search
- Lance_format_Lance_JNI_Traits - Type conversion traits used throughout scanner setup