Implementation:Lance format Lance JNI Fragment

From Leeroopedia


Knowledge Sources
Domains Java_Bindings, JNI
Last Updated 2026-02-08 19:33 GMT

Overview

JNI Fragment is the Rust-side JNI binding that exposes Lance fragment operations to Java, including counting rows within a fragment, creating new fragments from Arrow data, and performing fragment-level merge and update operations.

Description

This module provides JNI entry points for the Java Fragment class to interact with Lance data fragments. A fragment is a horizontal partition of a Lance dataset: it holds a subset of the dataset's rows, stored in one or more data files. The module supports:

Read operations:

  • Java_org_lance_Fragment_countRowsNative - Counts the rows in the fragment with the given fragment ID. It blocks on the async FileFragment::count_rows(None) call using the shared Tokio runtime (RT).
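A JNI call arrives on a synchronous Java thread, so the async row count must run to completion before a value can be returned to Java. The sketch below illustrates that idea with a minimal `block_on` that uses only the standard library. It is not Tokio's implementation: Tokio's `Runtime::block_on` also drives I/O and timers. `count_rows_async` is a hypothetical stand-in for the fragment's async row count, not Lance's logic.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Wakes the thread that is parked inside `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: poll until ready, parking between polls.
/// Unlike Tokio's `Runtime::block_on`, it drives no I/O reactor or timers.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(output) => return output,
            // A spurious unpark only causes one extra poll, which is harmless.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical async stand-in for a fragment row count:
/// physical rows minus deleted rows.
async fn count_rows_async(physical_rows: usize, deleted_rows: usize) -> usize {
    physical_rows.saturating_sub(deleted_rows)
}
```

The calling thread stays blocked for the whole call. This matches the synchronous contract of a JNI method that returns a plain `jint`.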

Write operations:

  • Java_org_lance_Fragment_createWithFfiArray - Creates new fragments from Arrow data passed via FFI pointers (FFI_ArrowArray and FFI_ArrowSchema). It reconstructs a RecordBatch from the FFI data, then writes it out as new fragments.
  • Java_org_lance_Fragment_createWithFfiStream - Creates new fragments from an Arrow record batch stream passed via FFI_ArrowArrayStream.
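Both write entry points receive Arrow structures as `jlong` memory addresses rather than as Java objects. The sketch below shows only the round trip between a pointer and an integer address, using a hypothetical `SketchFfiArray` struct. In the real Arrow C Data Interface, the structs are typically allocated by the Java caller and ownership is handed over through their release callbacks. This sketch does not model that.

```rust
/// Hypothetical stand-in for an FFI struct such as FFI_ArrowArray.
#[derive(Debug, PartialEq)]
struct SketchFfiArray {
    length: i64,
}

/// "Caller side": allocate the struct and hand over its address as an i64 (a jlong).
fn export_address(value: SketchFfiArray) -> i64 {
    Box::into_raw(Box::new(value)) as i64
}

/// "Native side": take ownership back from an address produced by `export_address`.
///
/// # Safety
/// `addr` must come from `export_address` and must not be used again afterwards.
unsafe fn import_from_address(addr: i64) -> SketchFfiArray {
    // Reinterpret the integer as a pointer and reclaim the Box; the heap memory
    // is freed when the temporary Box is dropped after moving the value out.
    *Box::from_raw(addr as *mut SketchFfiArray)
}
```

Each address is consumed exactly once. Importing the same address twice would be a double free, which is why the real FFI types pair the address with an explicit ownership protocol.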

Both creation methods accept the same write parameters: maximum rows per file, maximum rows per group, maximum bytes per file, write mode, a flag enabling stable row IDs, data storage version, and storage options (including an optional storage options provider).
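Most of these parameters arrive as optional Java values, where an empty value means "use the default". The sketch below shows one way to overlay such options on a set of defaults. `SketchWriteParams`, `SketchWriteMode`, and `build_params` are hypothetical names. The default values are placeholders, not Lance's actual defaults. Data storage version and storage options are omitted for brevity.

```rust
/// Hypothetical mirror of the write modes a Java caller might pass as strings.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchWriteMode {
    Create,
    Append,
    Overwrite,
}

impl SketchWriteMode {
    /// Parse a mode name case-insensitively; unknown names are rejected.
    fn parse(name: &str) -> Result<Self, String> {
        match name.to_ascii_lowercase().as_str() {
            "create" => Ok(Self::Create),
            "append" => Ok(Self::Append),
            "overwrite" => Ok(Self::Overwrite),
            other => Err(format!("Unknown write mode: {other}")),
        }
    }
}

/// Hypothetical subset of write parameters.
#[derive(Debug, PartialEq)]
struct SketchWriteParams {
    max_rows_per_file: usize,
    max_rows_per_group: usize,
    max_bytes_per_file: u64,
    mode: SketchWriteMode,
    enable_stable_row_ids: bool,
}

impl Default for SketchWriteParams {
    /// Placeholder defaults for illustration; these are not Lance's actual defaults.
    fn default() -> Self {
        Self {
            max_rows_per_file: 1_000_000,
            max_rows_per_group: 1_000,
            max_bytes_per_file: 1 << 30,
            mode: SketchWriteMode::Create,
            enable_stable_row_ids: false,
        }
    }
}

/// Overlay each optional argument on the defaults; `None` keeps the default value.
fn build_params(
    max_rows_per_file: Option<usize>,
    max_rows_per_group: Option<usize>,
    max_bytes_per_file: Option<u64>,
    mode: Option<&str>,
    enable_stable_row_ids: Option<bool>,
) -> Result<SketchWriteParams, String> {
    let mut params = SketchWriteParams::default();
    if let Some(v) = max_rows_per_file {
        params.max_rows_per_file = v;
    }
    if let Some(v) = max_rows_per_group {
        params.max_rows_per_group = v;
    }
    if let Some(v) = max_bytes_per_file {
        params.max_bytes_per_file = v;
    }
    if let Some(name) = mode {
        params.mode = SketchWriteMode::parse(name)?;
    }
    if let Some(flag) = enable_stable_row_ids {
        params.enable_stable_row_ids = flag;
    }
    Ok(params)
}
```

Rejecting an unknown mode string up front surfaces a clear error to the Java caller before any data is written.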

Helper types:

  • FragmentMergeResult - Holds a merged fragment and its resulting schema.
  • FragmentUpdateResult - Holds an updated fragment and the list of modified field IDs.
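The two helper types let a single native call return two related values at once. Below is a stand-in version built on hypothetical `Fragment` and `Schema` types, which are not Lance's. The `into_parts` and `was_modified` helpers are illustrative only.

```rust
/// Hypothetical stand-ins for Lance's fragment metadata and schema types.
#[derive(Debug, Clone, PartialEq)]
struct Fragment {
    id: u64,
    num_rows: usize,
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    field_names: Vec<String>,
}

/// Same shape as FragmentMergeResult: merged fragment plus resulting schema.
struct FragmentMergeResult {
    fragment: Fragment,
    schema: Schema,
}

/// Same shape as FragmentUpdateResult: updated fragment plus modified field IDs.
struct FragmentUpdateResult {
    updated_fragment: Fragment,
    fields_modified: Vec<u32>,
}

impl FragmentMergeResult {
    /// Illustrative: split the pair so each half can be converted for Java separately.
    fn into_parts(self) -> (Fragment, Schema) {
        (self.fragment, self.schema)
    }
}

impl FragmentUpdateResult {
    /// Illustrative: was the field with this ID rewritten by the update?
    fn was_modified(&self, field_id: u32) -> bool {
        self.fields_modified.contains(&field_id)
    }
}
```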

The module relies on the extract_write_params utility to convert Java write parameters into Rust WriteParams. The actual fragment writes are performed by FileFragment::create from the lance crate, with input data supplied through StreamingWriteSource from the lance-datafusion crate.
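The array path and the stream path can share one write path if both inputs are first turned into a sequence of batches. That is the role a streaming write source plays. The sketch below models this with a hypothetical `SketchWriteSource` trait and plain `Vec<i64>` batches. For illustration only, it assumes a new fragment starts whenever `max_rows_per_file` rows have been written. The real writer has further limits, such as `max_bytes_per_file`, that are not modeled here.

```rust
/// Hypothetical record batch stand-in: one value per row.
type Batch = Vec<i64>;

/// Hypothetical analogue of a streaming write source: anything that can become
/// an iterator of batches.
trait SketchWriteSource {
    fn into_batches(self) -> Box<dyn Iterator<Item = Batch>>;
}

/// A single in-memory batch (like the FFI array path).
struct SingleBatch(Batch);

impl SketchWriteSource for SingleBatch {
    fn into_batches(self) -> Box<dyn Iterator<Item = Batch>> {
        Box::new(std::iter::once(self.0))
    }
}

/// A sequence of batches (like the FFI stream path).
struct BatchStream(Vec<Batch>);

impl SketchWriteSource for BatchStream {
    fn into_batches(self) -> Box<dyn Iterator<Item = Batch>> {
        Box::new(self.0.into_iter())
    }
}

/// Shared write path: returns the row count of each fragment that would be
/// written, starting a new fragment whenever `max_rows_per_file` is reached.
fn write_fragments<S: SketchWriteSource>(source: S, max_rows_per_file: usize) -> Vec<usize> {
    assert!(max_rows_per_file > 0, "max_rows_per_file must be positive");
    let mut sizes = Vec::new();
    let mut current = 0;
    for batch in source.into_batches() {
        let mut remaining = batch.len();
        while remaining > 0 {
            // Fill the current fragment up to the limit, then close it.
            let take = remaining.min(max_rows_per_file - current);
            current += take;
            remaining -= take;
            if current == max_rows_per_file {
                sizes.push(current);
                current = 0;
            }
        }
    }
    if current > 0 {
        sizes.push(current);
    }
    sizes
}
```

Under this assumption, a single 2,500-row batch with a limit of 1,024 rows yields fragments of 1,024, 1,024 and 452 rows. Batches from a stream are packed across batch boundaries in the same way.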

Usage

Use this module when implementing or extending fragment-level operations in the Java SDK. Fragment creation is used during dataset writes and when building transactions that include new data files.

Code Reference

Source Location

java/lance-jni/src/fragment.rs

Signature

pub(crate) struct FragmentMergeResult {
    fragment: Fragment,
    schema: Schema,
}

pub(crate) struct FragmentUpdateResult {
    updated_fragment: Fragment,
    fields_modified: Vec<u32>,
}

// JNI entry points
pub extern "system" fn Java_org_lance_Fragment_countRowsNative(
    mut env: JNIEnv, _jfragment: JObject,
    jdataset: JObject, fragment_id: jlong,
) -> jint;

pub extern "system" fn Java_org_lance_Fragment_createWithFfiArray<'local>(
    mut env: JNIEnv<'local>, _obj: JObject,
    dataset_uri: JString, arrow_array_addr: jlong, arrow_schema_addr: jlong,
    // ... write parameter objects
) -> JObject<'local>;

pub extern "system" fn Java_org_lance_Fragment_createWithFfiStream<'local>(
    mut env: JNIEnv<'local>, _obj: JObject,
    dataset_uri: JString, arrow_stream_addr: jlong,
    // ... write parameter objects
) -> JObject<'local>;

Import

use crate::fragment::{FragmentMergeResult, FragmentUpdateResult};

I/O Contract

Direction | Type | Description
--- | --- | ---
Input | JObject (Java Dataset) | Dataset containing the fragment
Input | jlong (fragment_id) | ID of the fragment to operate on
Input | JString (dataset_uri) | URI of the dataset in which new fragments are created
Input | jlong (arrow_array_addr) | Memory address of an FFI_ArrowArray (createWithFfiArray)
Input | jlong (arrow_schema_addr) | Memory address of an FFI_ArrowSchema (createWithFfiArray)
Input | jlong (arrow_stream_addr) | Memory address of an FFI_ArrowArrayStream (createWithFfiStream)
Input | JObject (write params) | Optional write parameters (max rows, mode, storage version, etc.)
Output | jint | Row count returned by countRowsNative
Output | JObject (Java Fragment list) | List of created fragment metadata objects

Usage Examples

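// Java side: creating fragments from data
import java.util.List;
import java.util.Optional;
import org.lance.Fragment;

// arrowData and storageOptions are placeholders assumed to be defined elsewhere.
List<Fragment> fragments = Fragment.create(
    "s3://bucket/my-dataset",
    arrowData,
    Optional.of(1024),       // max rows per file
    Optional.empty(),        // max rows per group
    Optional.empty(),        // max bytes per file
    Optional.of("append"),   // write mode
    Optional.empty(),        // enable stable row ids
    Optional.empty(),        // data storage version
    storageOptions
);

// Rust JNI side: count rows in a fragment
use jni::objects::JObject;
use jni::sys::jlong;
use jni::JNIEnv;
// RT (the shared Tokio runtime), BlockingDataset, NATIVE_DATASET, Error and Result
// are lance-jni crate items; their module paths are not shown here.

fn inner_count_rows_native(
    env: &mut JNIEnv,
    jdataset: JObject,
    fragment_id: jlong,
) -> Result<usize> {
    // Lock and borrow the native BlockingDataset stored in the Java object's field.
    let dataset = unsafe {
        env.get_rust_field::<_, _, BlockingDataset>(jdataset, NATIVE_DATASET)
    }?;
    // A negative ID wraps to a huge usize here, so it also ends up as "not found".
    let fragment = dataset.inner.get_fragment(fragment_id as usize)
        .ok_or_else(|| Error::input_error(format!("Fragment not found: {fragment_id}")))?;
    // count_rows is async; block the calling JNI thread on the shared runtime.
    let res = RT.block_on(fragment.count_rows(None))?;
    Ok(res)
}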
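An inner function that returns `Result`, like `inner_count_rows_native`, still needs an outer `extern "system"` entry point that turns errors into something Java can observe. One common pattern is to raise a Java exception and return a sentinel value. The sketch below shows that pattern with a hypothetical `SketchEnv` in place of `JNIEnv` and a hypothetical slice of per-fragment row counts. The exception class and the `-1` sentinel are illustrative choices, not necessarily what lance-jni uses. In real JNI, throwing only marks an exception as pending. The native function must still return a value, and the JVM raises the exception once control returns to Java.

```rust
/// Hypothetical stand-in for JNIEnv: records at most one pending Java exception.
#[derive(Default)]
struct SketchEnv {
    /// (JNI class name, message) of the pending exception, if any.
    pending_exception: Option<(String, String)>,
}

impl SketchEnv {
    /// Like JNI's ThrowNew, this only marks an exception as pending.
    fn throw_new(&mut self, class: &str, msg: &str) {
        self.pending_exception = Some((class.to_string(), msg.to_string()));
    }
}

/// Inner function over a hypothetical slice of row counts indexed by fragment ID.
fn inner_count(fragment_rows: &[usize], fragment_id: i64) -> Result<usize, String> {
    usize::try_from(fragment_id)
        .ok()
        .and_then(|id| fragment_rows.get(id).copied())
        .ok_or_else(|| format!("Fragment not found: {fragment_id}"))
}

/// Outer entry-point shape: convert the count to a 32-bit jint; on any error,
/// mark a Java exception as pending and return a sentinel value.
fn count_rows_entry(env: &mut SketchEnv, fragment_rows: &[usize], fragment_id: i64) -> i32 {
    let result = inner_count(fragment_rows, fragment_id).and_then(|n| {
        i32::try_from(n).map_err(|_| format!("Row count {n} does not fit in a jint"))
    });
    match result {
        Ok(count) => count,
        Err(msg) => {
            env.throw_new("java/lang/RuntimeException", &msg);
            -1
        }
    }
}
```

The checked `i32::try_from` reports an error when a `usize` count exceeds the `jint` range, where a plain `as` cast would silently truncate.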
