
Implementation:Lance format Lance JNI BlockingScanner

From Leeroopedia
Revision as of 15:27, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Lance_format_Lance_JNI_BlockingScanner.md)


Knowledge Sources
Domains Java_Bindings, JNI
Last Updated 2026-02-08 19:33 GMT

Overview

JNI BlockingScanner is the Rust-side JNI binding that wraps the Lance Scanner, providing synchronous data scanning with support for column projection, filtering, vector search, full-text search, and batch streaming to Java.

Description

The BlockingScanner struct wraps an Arc<Scanner> and provides blocking access to the Lance dataset scanning pipeline. It supports:

  • Stream opening via open_stream, which blocks on the async try_into_stream to produce a DatasetRecordBatchStream.
  • Schema retrieval via schema, which returns the output schema of the scan.
  • Row counting via count_rows, which returns the number of rows matching the scan criteria.
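The blocking-wrapper pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the module's code: AsyncScanner, BlockingScannerSketch, and the hand-rolled block_on are hypothetical stand-ins, and the real binding presumably drives its futures on a proper async runtime rather than a park/unpark loop.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the blocked thread when the future can make progress.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: polls the future on the current thread and parks
/// between polls until the future is ready.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(value) => return value,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async scanner standing in for the wrapped Lance Scanner.
struct AsyncScanner {
    rows: Vec<u64>,
}

impl AsyncScanner {
    async fn count_rows(&self) -> Result<u64, String> {
        Ok(self.rows.len() as u64)
    }
}

/// Hypothetical blocking wrapper mirroring the shape of BlockingScanner.
struct BlockingScannerSketch {
    inner: Arc<AsyncScanner>,
}

impl BlockingScannerSketch {
    fn create(scanner: AsyncScanner) -> Self {
        Self { inner: Arc::new(scanner) }
    }

    /// Synchronous facade: blocks the calling thread until the async call finishes.
    fn count_rows(&self) -> Result<u64, String> {
        block_on(self.inner.count_rows())
    }
}
```

Keeping the async scanner behind an Arc lets the wrapper hand out shared references cheaply, while every public method stays synchronous so it can be called directly from a JNI entry point.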

The JNI entry point Java_org_lance_ipc_LanceScanner_createScanner constructs a scanner from a BlockingDataset with extensive configuration:

  • Column projection via an optional list of column names.
  • Filtering via SQL-style filter strings or Substrait filter expressions.
  • Batch size, limit, and offset for pagination control.
  • Vector search via a Query object specifying column, key vector, k, nprobes, ef, refine factor, and distance type.
  • Full-text search via a structured FullTextQuery supporting Match, MatchPhrase, MultiMatch, Boost, and Boolean query types.
  • Row ID and row address inclusion flags.
  • Column orderings for sorted scans.
  • Fragment filtering to scan only specific fragments.
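As a rough illustration of how optional settings like these might be applied, the sketch below maps a bag of optional values onto a scanner configuration. ScanOptionsSketch, ScannerSketch, configure, and the default batch size of 8192 are all assumptions made for the example; they are not the types or defaults used by the real module.

```rust
/// Hypothetical stand-in for the optional values received from Java.
#[derive(Debug, Default, Clone)]
struct ScanOptionsSketch {
    columns: Option<Vec<String>>,
    filter: Option<String>,
    batch_size: Option<usize>,
    limit: Option<i64>,
    offset: Option<i64>,
    with_row_id: bool,
}

/// Hypothetical stand-in for a scanner being configured.
#[derive(Debug, Default)]
struct ScannerSketch {
    projection: Vec<String>,
    filter: Option<String>,
    batch_size: usize,
    limit: Option<i64>,
    offset: Option<i64>,
    with_row_id: bool,
}

/// Applies only the options that are present; absent ones keep their defaults.
fn configure(opts: &ScanOptionsSketch) -> ScannerSketch {
    // 8192 is an arbitrary placeholder default for this sketch.
    let mut scanner = ScannerSketch { batch_size: 8192, ..Default::default() };
    if let Some(columns) = &opts.columns {
        scanner.projection = columns.clone();
    }
    if let Some(filter) = &opts.filter {
        scanner.filter = Some(filter.clone());
    }
    if let Some(batch_size) = opts.batch_size {
        scanner.batch_size = batch_size;
    }
    // Limit and offset travel together because both control pagination.
    if opts.limit.is_some() || opts.offset.is_some() {
        scanner.limit = opts.limit;
        scanner.offset = opts.offset;
    }
    scanner.with_row_id = opts.with_row_id;
    scanner
}
```

Modelling each setting as an Option mirrors the Java Optional parameters: the Rust side only touches the scanner when a value was actually supplied.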

The full-text search query builder (build_full_text_search_query) recursively constructs an FtsQuery tree from Java objects, supporting nested boolean and boost compositions.
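The recursive construction can be pictured with a simplified model. In the sketch below, JavaQuerySketch is a hypothetical, flattened stand-in for the Java query objects (a type tag plus optional fields), and FtsQuerySketch is a reduced stand-in for the FtsQuery tree with only four variants; neither reflects the actual field layout of the real types.

```rust
/// Hypothetical view of a Java query object: a type tag plus optional fields.
struct JavaQuerySketch {
    kind: &'static str,
    column: Option<String>,
    text: Option<String>,
    boost: Option<f32>,
    children: Vec<JavaQuerySketch>,
}

/// Reduced stand-in for the query tree produced on the Rust side.
#[derive(Debug, PartialEq)]
enum FtsQuerySketch {
    Match { column: String, text: String },
    Phrase { column: String, text: String },
    Boost { query: Box<FtsQuerySketch>, factor: f32 },
    Boolean { clauses: Vec<FtsQuerySketch> },
}

/// Convenience constructor for leaf nodes (match and phrase queries).
fn leaf(kind: &'static str, column: &str, text: &str) -> JavaQuerySketch {
    JavaQuerySketch {
        kind,
        column: Some(column.to_string()),
        text: Some(text.to_string()),
        boost: None,
        children: Vec::new(),
    }
}

/// Recursively converts the input tree, failing on unknown or malformed nodes.
fn build_query(node: &JavaQuerySketch) -> Result<FtsQuerySketch, String> {
    match node.kind {
        "match" => Ok(FtsQuerySketch::Match {
            column: node.column.clone().ok_or("match query needs a column")?,
            text: node.text.clone().ok_or("match query needs text")?,
        }),
        "phrase" => Ok(FtsQuerySketch::Phrase {
            column: node.column.clone().ok_or("phrase query needs a column")?,
            text: node.text.clone().ok_or("phrase query needs text")?,
        }),
        "boost" => {
            // A boost wraps exactly one nested query.
            let [child] = node.children.as_slice() else {
                return Err("boost query needs exactly one child".to_string());
            };
            Ok(FtsQuerySketch::Boost {
                query: Box::new(build_query(child)?),
                factor: node.boost.unwrap_or(1.0),
            })
        }
        "boolean" => {
            // Each clause is built recursively; the first error aborts the build.
            let clauses = node
                .children
                .iter()
                .map(build_query)
                .collect::<Result<Vec<_>, _>>()?;
            Ok(FtsQuerySketch::Boolean { clauses })
        }
        other => Err(format!("unsupported query kind: {other}")),
    }
}
```

Returning a Result from every level means a malformed node deep inside a nested boolean or boost query surfaces as a single error at the top, which a JNI layer could then turn into a Java exception.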

Usage

Use this module when implementing or extending the Java dataset scanning API. It is invoked from the Java LanceScanner class to create configured scanners and stream Arrow record batches back to Java via FFI.

Code Reference

Source Location

java/lance-jni/src/blocking_scanner.rs

Signature

pub struct BlockingScanner {
    pub(crate) inner: Arc<Scanner>,
}

impl BlockingScanner {
    pub fn create(scanner: Scanner) -> Self;
    pub fn open_stream(&self) -> Result<DatasetRecordBatchStream>;
    pub fn schema(&self) -> Result<SchemaRef>;
    pub fn count_rows(&self) -> Result<u64>;
}

Import

use crate::blocking_scanner::{BlockingScanner, NATIVE_SCANNER};

I/O Contract

Direction | Type | Description
Input | JObject (Java Dataset) | The dataset to scan, carrying a native BlockingDataset handle
Input | JObject (Optional columns) | Java Optional<List<String>> for column projection
Input | JObject (Optional filter) | Java Optional<String> SQL filter expression
Input | JObject (Optional Query) | Java Optional<Query> for vector nearest-neighbor search
Input | JObject (Optional FullTextQuery) | Java Optional<FullTextQuery> for full-text search
Input | jboolean (with_row_id) | Whether to include row IDs in results
Input | jint (batch_readahead) | Number of batches to read ahead
Output | JObject (Java LanceScanner) | Java scanner object with native handle attached
Output | Arrow FFI stream | Record batch stream exported via FFI_ArrowArrayStream
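One common way to attach a native handle to a Java object is to store a heap pointer in a long field. The sketch below shows that general idea using only the standard library. ScannerState, into_handle, with_handle, and release_handle are hypothetical names, and the real module may well rely on helper functions from its JNI crate rather than raw casts like these.

```rust
/// Hypothetical native state owned by the Rust side.
struct ScannerState {
    batches_served: u64,
}

/// Moves the state to the heap and returns its address as a 64-bit integer,
/// suitable for storing in a Java long field.
fn into_handle(state: ScannerState) -> i64 {
    Box::into_raw(Box::new(state)) as i64
}

/// Temporarily borrows the state behind a handle.
///
/// Safety: `handle` must come from `into_handle` and must not have been released.
unsafe fn with_handle<R>(handle: i64, f: impl FnOnce(&mut ScannerState) -> R) -> R {
    let state = unsafe { &mut *(handle as *mut ScannerState) };
    f(state)
}

/// Reclaims ownership so the state is dropped exactly once.
///
/// Safety: `handle` must come from `into_handle` and must not be used afterwards.
unsafe fn release_handle(handle: i64) -> ScannerState {
    unsafe { *Box::from_raw(handle as *mut ScannerState) }
}
```

The Java object never sees the Rust value itself, only the integer; the Rust side is responsible for releasing the handle when the Java scanner is closed.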

Usage Examples

// Java side: creating and using a scanner
import java.util.Arrays;

import org.apache.arrow.vector.ipc.ArrowReader;
import org.lance.Dataset;
import org.lance.ipc.LanceScanner;

Dataset dataset = Dataset.open("/path/to/dataset");
LanceScanner scanner = dataset.newScan()
    .columns(Arrays.asList("id", "embedding"))
    .filter("id > 100")
    .limit(1000)
    .build();

ArrowReader reader = scanner.scanBatches();

// Rust JNI side: scanner creation pattern
#[no_mangle]
pub extern "system" fn Java_org_lance_ipc_LanceScanner_createScanner<'local>(
    mut env: JNIEnv<'local>,
    _reader: JObject,
    jdataset: JObject,
    fragment_ids_obj: JObject,
    columns_obj: JObject,
    filter_obj: JObject,
    // ... additional parameters
) -> JObject<'local> {
    ok_or_throw!(env, inner_create_scanner(&mut env, jdataset, /* ... */))
}
