Implementation:Lance format Lance Java Fragment

From Leeroopedia


Knowledge Sources
Domains: Java_SDK, Dataset_Management
Last Updated: 2026-02-08 19:33 GMT

Overview

The Fragment class provides operations on individual fragments (data partitions) within a Lance dataset, including scanning, row deletion, column merging, column updating, and fragment creation.

Description

Fragment represents a single fragment within a Lance dataset. Fragments are the fundamental storage units in the Lance format, each containing a subset of the dataset's rows. This class enables fine-grained operations at the fragment level rather than the whole-dataset level:

  • Scanning a single fragment with configurable ScanOptions
  • Deleting rows by row indexes within the fragment
  • Merging columns from an external Arrow stream into the fragment via left-join semantics
  • Updating columns in the fragment via left-outer-hash-join semantics
  • Counting rows in the fragment
  • Writing new fragments via the Fragment.write() builder

The class wraps a FragmentMetadata object and holds a reference back to the parent Dataset for I/O operations. Fragment instances are typically obtained from Dataset.getFragments() or Dataset.getFragment(int).
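The idea of fragments as the units that partition a dataset's rows can be illustrated with a small in-memory model. This is a hypothetical sketch in plain Java, not the Lance API: ToyDataset, ToyFragment, and the rowsPerFragment parameter are invented for illustration. In this model each fragment owns a disjoint slice of the rows, so the dataset-level row count is the sum of the per-fragment counts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the Lance API: a dataset is an ordered list of
// fragments, and each fragment owns a disjoint, contiguous slice of the rows.
final class ToyFragment {
    final int id;
    final List<String> rows;

    ToyFragment(int id, List<String> rows) {
        this.id = id;
        this.rows = rows;
    }

    // Number of rows stored in this fragment.
    int countRows() {
        return rows.size();
    }
}

final class ToyDataset {
    final List<ToyFragment> fragments = new ArrayList<>();

    // Split the input rows into fragments of at most rowsPerFragment rows each
    // (rowsPerFragment is an invented parameter for this sketch).
    static ToyDataset partition(List<String> rows, int rowsPerFragment) {
        if (rowsPerFragment <= 0) {
            throw new IllegalArgumentException("rowsPerFragment must be positive");
        }
        ToyDataset dataset = new ToyDataset();
        for (int start = 0; start < rows.size(); start += rowsPerFragment) {
            int end = Math.min(start + rowsPerFragment, rows.size());
            int id = dataset.fragments.size();
            dataset.fragments.add(new ToyFragment(id, new ArrayList<>(rows.subList(start, end))));
        }
        return dataset;
    }

    // The dataset-level count is the sum of the per-fragment counts.
    int countRows() {
        int total = 0;
        for (ToyFragment fragment : fragments) {
            total += fragment.countRows();
        }
        return total;
    }
}
```

For example, partitioning five rows with rowsPerFragment = 2 yields three fragments holding 2, 2, and 1 rows, and operating on one of them touches only its own slice.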

Usage

Use the Fragment class when you need to:

  • Scan or operate on a specific subset of a dataset rather than the full dataset
  • Delete specific rows within a fragment by their row indexes
  • Merge new columns into a fragment using join semantics
  • Update existing columns in a fragment using join semantics
  • Create new fragments from Arrow data for later commit via transactions

Code Reference

Source Location

Property | Value
File | java/src/main/java/org/lance/Fragment.java
Package | org.lance
Lines | 368

Signature

public class Fragment

Import

import org.lance.Fragment;

I/O Contract

Constructors

Constructor | Input | Description
Fragment(Dataset, int) | dataset, fragmentId | Creates a fragment by looking up the fragment ID in the dataset
Fragment(Dataset, FragmentMetadata) | dataset, metadata | Creates a fragment from existing metadata

Key Methods

Method | Input | Output | Description
write() (static) | none | WriteFragmentBuilder | Returns a builder for creating new fragments
newScan() | none, ScanOptions, or long | LanceScanner | Creates a scanner scoped to this fragment
deleteRows(List<Integer>) | rowIndexes | FragmentMetadata (nullable) | Deletes rows; returns null if all rows are deleted
mergeColumns(ArrowArrayStream, String, String) | stream, leftOn, rightOn | FragmentMergeResult | Merges new columns via left-join
updateColumns(ArrowArrayStream, String, String) | stream, leftOn, rightOn | FragmentUpdateResult | Updates columns via left-outer-hash-join
countRows() | none | int | Returns the number of rows in this fragment
getId() | none | int | Returns the fragment ID
metadata() | none | FragmentMetadata | Returns the fragment metadata
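The left-outer-hash-join semantics listed for updateColumns can be sketched with plain Java collections. The UpdateSketch class below is hypothetical and models only the join behavior, under these assumptions: join keys in the stream are unique, fragment rows without a match keep their current values, and stream rows with no matching fragment row are ignored. It is not how Lance implements the operation internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Lance implementation: update-by-join where the
// incoming stream is the hash-table (build) side and the fragment is the probe side.
final class UpdateSketch {
    static final class Row {
        final long id;
        final String value;

        Row(long id, String value) {
            this.id = id;
            this.value = value;
        }
    }

    static List<Row> updateColumns(List<Row> fragmentRows, List<Row> updates) {
        // Build phase: index the update stream by its join key.
        // Assumption: keys are unique; a later duplicate would overwrite an earlier one.
        Map<Long, String> byKey = new HashMap<>();
        for (Row update : updates) {
            byKey.put(update.id, update.value);
        }

        // Probe phase (left outer): every fragment row is kept. Matched rows take
        // the new value, unmatched rows keep their existing value, and stream rows
        // without a matching fragment row never appear in the output.
        List<Row> result = new ArrayList<>(fragmentRows.size());
        for (Row row : fragmentRows) {
            String value = byKey.containsKey(row.id) ? byKey.get(row.id) : row.value;
            result.add(new Row(row.id, value));
        }
        return result;
    }
}
```

Because the probe side is the fragment, the output always has the same number of rows as the fragment, which is what lets an update rewrite column values without adding or dropping rows.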

Usage Examples

Scanning a Specific Fragment

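import org.lance.Dataset;
import org.lance.Fragment;
import org.lance.ipc.LanceScanner;
import org.lance.ipc.ScanOptions;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.ipc.ArrowReader;
import java.util.Arrays;
import java.util.List;

try (Dataset dataset = Dataset.open().uri("/path/to/dataset.lance").build()) {
    List<Fragment> fragments = dataset.getFragments();

    // Scan only the first fragment, projecting two columns in batches of up to 1024 rows
    Fragment fragment = fragments.get(0);
    ScanOptions options = new ScanOptions.Builder()
        .columns(Arrays.asList("id", "name"))
        .batchSize(1024)
        .build();

    try (LanceScanner scanner = fragment.newScan(options);
         ArrowReader reader = scanner.scanBatches()) {
        while (reader.loadNextBatch()) {
            VectorSchemaRoot batch = reader.getVectorSchemaRoot();
            // Process the rows of this batch (batch.getRowCount() rows)
        }
    }
}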
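Conceptually, a fragment-scoped scan with these options keeps only the requested columns and emits the fragment's rows in batches of at most batchSize rows. The hypothetical ScanSketch below models that behavior over an in-memory column map; the class and its method are illustrative assumptions, not the LanceScanner implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not the Lance API: project columns, then slice rows into batches.
final class ScanSketch {
    static List<Map<String, List<Object>>> scan(
            Map<String, List<Object>> fragmentColumns, List<String> projection, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Map<String, List<Object>>> batches = new ArrayList<>();
        if (fragmentColumns.isEmpty()) {
            return batches;
        }
        // Assumption: all columns in the fragment have the same length.
        int numRows = fragmentColumns.values().iterator().next().size();
        for (int start = 0; start < numRows; start += batchSize) {
            int end = Math.min(start + batchSize, numRows);
            // Each batch holds only the projected columns, in the requested order.
            Map<String, List<Object>> batch = new LinkedHashMap<>();
            for (String column : projection) {
                List<Object> values = fragmentColumns.get(column);
                if (values == null) {
                    throw new IllegalArgumentException("unknown column: " + column);
                }
                batch.put(column, new ArrayList<>(values.subList(start, end)));
            }
            batches.add(batch);
        }
        return batches;
    }
}
```

With five rows and batchSize = 2 the model produces three batches of 2, 2, and 1 rows, and columns outside the projection never appear in any batch.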

Creating New Fragments

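import org.lance.Fragment;
import org.lance.FragmentMetadata;
import org.apache.arrow.vector.VectorSchemaRoot;
import java.util.List;

// allocator:        a BufferAllocator (for example, a RootAllocator)
// vectorSchemaRoot: a VectorSchemaRoot holding the rows to write
// storageOptions:   object-store settings for the target URI (for example, credentials)
List<FragmentMetadata> newFragments = Fragment.write()
    .datasetUri("s3://bucket/dataset.lance")
    .allocator(allocator)
    .data(vectorSchemaRoot)
    .storageOptions(storageOptions)
    .execute();
// The returned fragments are not yet visible to readers;
// commit them through a transaction to add them to the dataset.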
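Because Fragment.write() produces FragmentMetadata for later commit, writing data and publishing it are separate steps. The hypothetical TwoPhaseSketch below models that split in memory: write(...) returns metadata without changing what readers see, and commitAppend(...) publishes the fragments as a new version. The class, its methods, and the version counter are illustrative assumptions, not Lance's transaction API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical two-phase model, not the Lance API: writing only produces metadata,
// and the visible fragment list changes only when a commit creates a new version.
final class TwoPhaseSketch {
    static final class FragmentMeta {
        final int numRows;

        FragmentMeta(int numRows) {
            this.numRows = numRows;
        }
    }

    private final List<FragmentMeta> committed = new ArrayList<>();
    private long version = 1;

    // Phase 1: "write" one fragment per row count and return its metadata.
    // The committed state is not modified.
    List<FragmentMeta> write(int... rowCounts) {
        List<FragmentMeta> written = new ArrayList<>();
        for (int rows : rowCounts) {
            written.add(new FragmentMeta(rows));
        }
        return written;
    }

    // Phase 2: publish previously written fragments as a new version.
    long commitAppend(List<FragmentMeta> newFragments) {
        committed.addAll(newFragments);
        version++;
        return version;
    }

    List<FragmentMeta> visibleFragments() {
        return Collections.unmodifiableList(committed);
    }

    long version() {
        return version;
    }
}
```

Keeping the write step free of side effects on the visible state is what allows several writers to produce fragments independently and publish them in a single commit.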

Merging Columns into a Fragment

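import org.lance.fragment.FragmentMergeResult;
import org.apache.arrow.c.ArrowArrayStream;

// `fragment` is a Fragment obtained from dataset.getFragments();
// `arrowStream` is an ArrowArrayStream whose batches contain an "id" column plus the new columns.
// Merge the new columns into the fragment via a left join on "id"
FragmentMergeResult result = fragment.mergeColumns(
    arrowStream,  // stream with the join key and new column data
    "id",         // left join column (fragment)
    "id"          // right join column (stream)
);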
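The left-join semantics of mergeColumns can be modeled without Lance. In the hypothetical MergeSketch below, every fragment row is kept and the new column is aligned to the fragment's row order; as in any left join, rows whose key is missing from the stream receive null. The class and method are illustrative only and do not reflect how Lance writes the merged column files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the Lance implementation: compute one new column for a
// fragment by left-joining the fragment's key column against a (key, value) stream.
final class MergeSketch {
    static List<String> mergeColumn(List<Long> fragmentKeys, List<Long> streamKeys, List<String> streamValues) {
        if (streamKeys.size() != streamValues.size()) {
            throw new IllegalArgumentException("stream columns differ in length");
        }
        // Index the stream by its join key (rightOn).
        Map<Long, String> byKey = new HashMap<>();
        for (int i = 0; i < streamKeys.size(); i++) {
            byKey.put(streamKeys.get(i), streamValues.get(i));
        }
        // Walk the fragment in row order (leftOn); unmatched keys yield null.
        List<String> newColumn = new ArrayList<>(fragmentKeys.size());
        for (Long key : fragmentKeys) {
            newColumn.add(byKey.get(key));
        }
        return newColumn;
    }
}
```

Since the output follows the fragment's row order and length, the new column can be stored alongside the existing ones without reordering any data; stream keys that match no fragment row (7 in a stream of 3, 1, 7 against fragment keys 1, 2, 3) are simply dropped.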

Deleting Rows from a Fragment

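import org.lance.FragmentMetadata;
import java.util.Arrays;

// Delete the rows at indexes 0, 5, and 10 within this fragment
FragmentMetadata updated = fragment.deleteRows(Arrays.asList(0, 5, 10));
if (updated == null) {
    // Every row in the fragment was deleted, so no fragment metadata remains
} else {
    // `updated` describes the fragment with the deletions applied
}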
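The null return of deleteRows is easy to reason about if deletions are tracked as a set of deleted row indexes rather than by rewriting data. The hypothetical DeleteSketch below assumes exactly that model: it records indexes, derives the live row count from them, and returns null once no live rows remain. It is a toy illustration, not Lance's FragmentMetadata or deletion handling.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical model, not the Lance API: a fragment with a fixed number of physical
// rows plus a set of deleted row indexes. Deleting never rewrites row data.
final class DeleteSketch {
    final int physicalRows;
    final TreeSet<Integer> deleted = new TreeSet<>();

    DeleteSketch(int physicalRows) {
        this.physicalRows = physicalRows;
    }

    // Live rows are the physical rows minus the deleted indexes.
    int countRows() {
        return physicalRows - deleted.size();
    }

    // Returns a new fragment state with the deletions applied, or null when
    // no live rows remain. The receiver is left unchanged.
    DeleteSketch deleteRows(List<Integer> rowIndexes) {
        DeleteSketch next = new DeleteSketch(physicalRows);
        next.deleted.addAll(deleted);
        for (int index : rowIndexes) {
            if (index < 0 || index >= physicalRows) {
                throw new IndexOutOfBoundsException("row index " + index);
            }
            next.deleted.add(index); // deleting the same index twice is a no-op
        }
        return next.countRows() == 0 ? null : next;
    }
}
```

Returning a new state instead of mutating the old one mirrors how the example above treats the returned FragmentMetadata as the updated description of the fragment.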
