Implementation:Lance format Lance Java Fragment
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Dataset_Management |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
The Fragment class provides operations on individual fragments (data partitions) within a Lance dataset, including scanning, row deletion, column merging, column updating, and fragment creation.
Description
Fragment represents a single fragment within a Lance dataset. Fragments are the fundamental storage units in the Lance format, each containing a subset of the dataset's rows. This class enables fine-grained operations at the fragment level rather than the whole-dataset level:
- Scanning a single fragment with configurable ScanOptions
- Deleting rows by row indexes within the fragment
- Merging columns from an external Arrow stream into the fragment via left-join semantics
- Updating columns in the fragment via left-outer-hash-join semantics
- Counting rows in the fragment
- Writing new fragments via the Fragment.write() builder
The class wraps a FragmentMetadata object and holds a reference back to the parent Dataset for I/O operations. Fragment instances are typically obtained from Dataset.getFragments() or Dataset.getFragment(int).
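The left-join semantics mentioned for column merging can be illustrated with a small plain-Java model. This is only a sketch of the general join idea, not the Lance implementation: the class LeftJoinMergeSketch, its mergeColumn method, and the use of in-memory lists and maps in place of Arrow streams are all hypothetical. It assumes the usual left-join behavior, where every fragment row is kept and rows without a matching key in the stream receive null in the new column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Plain-Java model of left-join column merging; hypothetical, not Lance code. */
final class LeftJoinMergeSketch {

  /**
   * Builds one new column aligned to the fragment's rows.
   *
   * @param fragmentKeys join-key value of each fragment row, in row order
   * @param streamColumn join key mapped to the new column value from the stream
   * @return the new column, one entry per fragment row; unmatched rows get null
   */
  static List<String> mergeColumn(List<Integer> fragmentKeys, Map<Integer, String> streamColumn) {
    List<String> merged = new ArrayList<>(fragmentKeys.size());
    for (Integer key : fragmentKeys) {
      // Left join: keep every fragment row, look up its value on the right side.
      merged.add(streamColumn.get(key));
    }
    return merged;
  }

  public static void main(String[] args) {
    List<Integer> fragmentKeys = List.of(1, 2, 3);
    // Key 4 has no matching fragment row, so a left join drops it.
    Map<Integer, String> streamColumn = Map.of(1, "a", 3, "c", 4, "d");
    System.out.println(mergeColumn(fragmentKeys, streamColumn)); // prints [a, null, c]
  }
}
```

The row count of the fragment is unchanged by the merge in this model; only a column is added, which is what distinguishes a left join from an inner join here.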
Usage
Use the Fragment class when you need to:
- Scan or operate on a specific subset of a dataset rather than the full dataset
- Delete specific rows within a fragment by their row indexes
- Merge new columns into a fragment using join semantics
- Update existing columns in a fragment using join semantics
- Create new fragments from Arrow data for later commit via transactions
Code Reference
Source Location
| Property | Value |
|---|---|
| File | java/src/main/java/org/lance/Fragment.java |
| Package | org.lance |
| Lines | 368 |
Signature
public class Fragment
Import
import org.lance.Fragment;
I/O Contract
Constructors
| Constructor | Input | Description |
|---|---|---|
| Fragment(Dataset, int) | dataset, fragmentId | Creates a fragment by looking up the fragment ID from the dataset |
| Fragment(Dataset, FragmentMetadata) | dataset, metadata | Creates a fragment from existing metadata |
Key Methods
| Method | Input | Output | Description |
|---|---|---|---|
| write() (static) | none | WriteFragmentBuilder | Returns a builder for creating new fragments |
| newScan() | none or ScanOptions or long | LanceScanner | Creates a scanner scoped to this fragment |
| deleteRows(List<Integer>) | rowIndexes | FragmentMetadata (nullable) | Deletes rows; returns null if all rows deleted |
| mergeColumns(ArrowArrayStream, String, String) | stream, leftOn, rightOn | FragmentMergeResult | Merges new columns via left-join |
| updateColumns(ArrowArrayStream, String, String) | stream, leftOn, rightOn | FragmentUpdateResult | Updates columns via left-outer-hash-join |
| countRows() | none | int | Returns the number of rows in this fragment |
| getId() | none | int | Returns the fragment ID |
| metadata() | none | FragmentMetadata | Returns the fragment metadata |
Usage Examples
Scanning a Specific Fragment
import org.lance.Dataset;
import org.lance.Fragment;
import org.lance.ipc.LanceScanner;
import org.lance.ipc.ScanOptions;
import java.util.Arrays;
import java.util.List;

Dataset dataset = Dataset.open().uri("/path/to/dataset.lance").build();
List<Fragment> fragments = dataset.getFragments();

// Scan the first fragment, reading only the "id" and "name" columns
Fragment fragment = fragments.get(0);
ScanOptions options = new ScanOptions.Builder()
    .columns(Arrays.asList("id", "name"))
    .batchSize(1024)
    .build();

try (LanceScanner scanner = fragment.newScan(options)) {
    // Process batches from the fragment
}
Creating New Fragments
import org.lance.Fragment;
import org.lance.FragmentMetadata;
import org.apache.arrow.vector.VectorSchemaRoot;
import java.util.List;

// allocator, vectorSchemaRoot, and storageOptions are assumed to be created elsewhere
List<FragmentMetadata> newFragments = Fragment.write()
    .datasetUri("s3://bucket/dataset.lance")
    .allocator(allocator)
    .data(vectorSchemaRoot)
    .storageOptions(storageOptions)
    .execute();
Merging Columns into a Fragment
import org.lance.fragment.FragmentMergeResult;
import org.apache.arrow.c.ArrowArrayStream;
// Merge new columns into the fragment via left-join on "id"
FragmentMergeResult result = fragment.mergeColumns(
    arrowStream, // stream with new column data
    "id",        // left join column (fragment)
    "id"         // right join column (stream)
);
Deleting Rows from a Fragment
import org.lance.FragmentMetadata;
import java.util.Arrays;
// Delete rows at indexes 0, 5, and 10
FragmentMetadata updated = fragment.deleteRows(Arrays.asList(0, 5, 10));
if (updated == null) {
    // All rows in the fragment were deleted
}
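The nullable return of deleteRows is easy to overlook, so the following plain-Java sketch models only that contract. It does not call Lance: DeleteRowsSketch and remainingRows are hypothetical names, the row count stands in for FragmentMetadata, and both the rejection of out-of-range indexes and the treatment of duplicate indexes as a single deletion are assumptions of this sketch rather than documented behavior.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Plain-Java model of the deleteRows return contract; hypothetical, not Lance code. */
final class DeleteRowsSketch {

  /**
   * Returns the number of rows left after deleting the given row indexes,
   * or null when every row was deleted (mirroring the nullable result above).
   */
  static Integer remainingRows(int rowCount, List<Integer> rowIndexes) {
    Set<Integer> distinct = new HashSet<>();
    for (int index : rowIndexes) {
      // Assumption of this sketch: indexes outside the fragment are an error.
      if (index < 0 || index >= rowCount) {
        throw new IndexOutOfBoundsException("row index " + index);
      }
      distinct.add(index); // duplicates count as one deletion
    }
    int remaining = rowCount - distinct.size();
    return remaining == 0 ? null : remaining;
  }

  public static void main(String[] args) {
    System.out.println(remainingRows(11, List.of(0, 5, 10))); // prints 8
    System.out.println(remainingRows(3, List.of(0, 1, 2)));   // prints null
  }
}
```

Callers that prefer not to branch on null could wrap the result with java.util.Optional.ofNullable at the call site.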
Related Pages
- Lance_format_Lance_Java_Dataset - Parent dataset class that contains fragments
- Lance_format_Lance_Java_ScanOptions - Scan configuration options used with fragment scans
- Lance_format_Lance_Java_WriteDatasetBuilder - Alternative for writing entire datasets
- Heuristic:Lance_format_Lance_Warning_Deprecated_Java_APIs