Implementation:Lance format Lance Java MergeOp
| Knowledge Sources | |
|---|---|
| Domains | Java_SDK, Dataset_Management |
| Last Updated | 2026-02-08 19:33 GMT |
Overview
Description
The Merge class is an immutable operation that combines new data fragments with a specified schema to enable schema evolution and column modifications. It extends SchemaOperation (which implements Operation), inheriting schema management capabilities including Arrow C Data Interface export for JNI communication with the Rust backend.
The operation carries both a list of FragmentMetadata objects (representing the new or updated data) and an Arrow Schema describing the target schema after the merge. This dual payload enables adding new columns to an existing dataset by providing fragments containing the new column data alongside the updated schema.
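The dual payload and the immutability described above can be illustrated with a small stand-alone sketch. It is not the Lance source: `FragmentStub`, `MergeOp`, and the list of field names are hypothetical stand-ins for `FragmentMetadata`, `Merge`, and the Arrow `Schema`, and the null checks and defensive copies in `build()` are assumptions about how such a builder might behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

/** Stand-alone sketch; none of these types are the real Lance or Arrow classes. */
public class MergeSketch {

  /** Hypothetical stand-in for FragmentMetadata: just a fragment id. */
  static final class FragmentStub {
    final int id;
    FragmentStub(int id) { this.id = id; }
  }

  /** Hypothetical stand-in for an operation that carries fragments plus a schema. */
  static final class MergeOp {
    private final List<FragmentStub> fragments;
    private final List<String> schemaFieldNames; // stand-in for an Arrow Schema

    private MergeOp(List<FragmentStub> fragments, List<String> schemaFieldNames) {
      // Defensive copies, so later changes to the caller's lists cannot leak in.
      this.fragments = Collections.unmodifiableList(new ArrayList<>(fragments));
      this.schemaFieldNames = Collections.unmodifiableList(new ArrayList<>(schemaFieldNames));
    }

    static Builder builder() { return new Builder(); }
    List<FragmentStub> fragments() { return fragments; }
    List<String> schema() { return schemaFieldNames; }
    String name() { return "Merge"; }

    static final class Builder {
      private List<FragmentStub> fragments;
      private List<String> schemaFieldNames;

      Builder fragments(List<FragmentStub> fragments) { this.fragments = fragments; return this; }
      Builder schema(List<String> names) { this.schemaFieldNames = names; return this; }

      MergeOp build() {
        // Assumption of this sketch: both parts of the payload are mandatory.
        Objects.requireNonNull(fragments, "fragments are required");
        Objects.requireNonNull(schemaFieldNames, "schema is required");
        return new MergeOp(fragments, schemaFieldNames);
      }
    }
  }

  public static void main(String[] args) {
    List<FragmentStub> frags = new ArrayList<>();
    frags.add(new FragmentStub(0));
    MergeOp op = MergeOp.builder()
        .fragments(frags)
        .schema(List.of("id", "text", "embedding"))
        .build();
    frags.add(new FragmentStub(1)); // does not affect the already-built operation
    System.out.println(op.name() + " fragments=" + op.fragments().size()
        + " fields=" + op.schema());
  }
}
```

The point of the sketch is that once `build()` returns, neither the fragment list nor the schema can be changed through the operation object, which is what makes it safe to hand the same instance to a commit path.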
Usage
Use Merge for schema evolution scenarios such as adding new columns to an existing dataset. The fragments contain the new column data, and the schema describes the complete post-merge schema. This is the primary mechanism for column-level modifications in the Lance Java SDK.
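Because the schema passed to the operation is the complete post-merge schema, the caller has to assemble it from the existing fields plus the new ones. A minimal sketch of that step follows, using plain field names instead of Arrow `Field` objects; the helper `postMergeFields` is hypothetical, and rejecting a new column whose name already exists is an assumption of this sketch rather than documented Lance behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Stand-alone sketch; field names stand in for Arrow Field objects. */
public class PostMergeSchemaSketch {

  /**
   * Returns the existing field names followed by the added ones, in order.
   * Throws if an added name collides with a name that is already present.
   */
  static List<String> postMergeFields(List<String> existing, List<String> added) {
    Set<String> seen = new LinkedHashSet<>(existing); // preserves insertion order
    for (String name : added) {
      if (!seen.add(name)) {
        throw new IllegalArgumentException("column already exists: " + name);
      }
    }
    return new ArrayList<>(seen);
  }

  public static void main(String[] args) {
    List<String> target = postMergeFields(List.of("id", "text"), List.of("embedding"));
    System.out.println(target);
  }
}
```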
Code Reference
Source Location
java/src/main/java/org/lance/operation/Merge.java
Signature
public class Merge extends SchemaOperation {
    public static Builder builder();
    public List<FragmentMetadata> fragments();
    public Schema schema();                              // inherited from SchemaOperation
    public long exportSchema(BufferAllocator allocator); // inherited from SchemaOperation
    public String name();                                // returns "Merge"
}
Import
import org.lance.operation.Merge;
I/O Contract
| Parameter | Type | Required | Description |
|---|---|---|---|
| fragments | List<FragmentMetadata> | Yes | Fragment metadata for the new or updated data |
| schema | org.apache.arrow.vector.types.pojo.Schema | Yes | The target Arrow schema after the merge |
| Return | Type | Description |
|---|---|---|
| fragments() | List<FragmentMetadata> | The fragment metadata for the merged data |
| schema() | Schema | The target schema |
| exportSchema(allocator) | long | Memory address of the exported Arrow C schema for JNI |
| name() | String | Returns "Merge" for JNI dispatch |
Usage Examples
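import java.util.List;
import org.apache.arrow.vector.types.pojo.ArrowType;
import org.apache.arrow.vector.types.pojo.Field;
import org.apache.arrow.vector.types.pojo.FieldType;
import org.apache.arrow.vector.types.pojo.Schema;
import org.lance.operation.Merge;

// Add a new "embedding" column to an existing dataset.
// existingField1, existingField2, and itemField are Arrow Fields defined elsewhere;
// itemField is the element field of the 128-wide fixed-size list.
Schema updatedSchema = new Schema(List.of(
    existingField1, existingField2,
    new Field("embedding", FieldType.nullable(new ArrowType.FixedSizeList(128)), List.of(itemField))
));
// writeEmbeddingColumn is a placeholder for code that writes the new column
// and returns the resulting fragment metadata.
List<FragmentMetadata> newColumnFragments = writeEmbeddingColumn(data);
Merge mergeOp = Merge.builder()
    .fragments(newColumnFragments)
    .schema(updatedSchema)
    .build();
String opName = mergeOp.name(); // "Merge"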
Related Pages
- Lance_format_Lance_Java_OverwriteOp -- Overwrites the entire dataset with new fragments and schema
- Lance_format_Lance_Java_ProjectOp -- Changes the schema without modifying data
- Lance_format_Lance_Java_AppendOp -- Appends new rows without schema changes