Implementation:Apache Paimon VectorType
| Knowledge Sources | |
|---|---|
| Domains | Type System, Vector Processing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
VectorType represents a fixed-size vector data type with densely stored elements, designed for machine learning and vector similarity search workloads.
Description
VectorType is a final class extending DataType that represents fixed-length dense vectors, introduced in Paimon version 2.0.0. Unlike ArrayType, which supports variable-length collections, VectorType enforces a fixed size specified when the type is created. This makes it suitable for ML embeddings and numerical vectors whose dimensionality is constant. Element types are restricted to primitive numeric and boolean types (BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE), which is checked by the static isValidElementType() method.
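The length and element-type rules can be modeled in a few lines. The sketch below is a hypothetical, dependency-free stand-in: VectorTypeRules, its Kind enum and validate() are invented for illustration and are not Paimon classes. It also assumes invalid arguments are rejected with IllegalArgumentException, which this page does not state for the real constructor.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model of the VECTOR type rules; not Paimon's code.
final class VectorTypeRules {

    // Stand-in for type roots; STRING and ARRAY are included only as invalid examples.
    enum Kind { BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE, STRING, ARRAY }

    // The element kinds this page lists as valid for VECTOR.
    static final Set<Kind> VALID_ELEMENT_KINDS = EnumSet.of(
            Kind.BOOLEAN, Kind.TINYINT, Kind.SMALLINT, Kind.INTEGER,
            Kind.BIGINT, Kind.FLOAT, Kind.DOUBLE);

    static final int MIN_LENGTH = 1;

    static boolean isValidElementType(Kind elementKind) {
        return VALID_ELEMENT_KINDS.contains(elementKind);
    }

    // Creation-time checks: length must be >= MIN_LENGTH and the element kind must be valid.
    static void validate(int length, Kind elementKind) {
        if (length < MIN_LENGTH) {
            throw new IllegalArgumentException(
                    "Vector length must be at least " + MIN_LENGTH + ", got " + length);
        }
        if (!isValidElementType(elementKind)) {
            throw new IllegalArgumentException("Invalid vector element type: " + elementKind);
        }
    }

    public static void main(String[] args) {
        validate(768, Kind.FLOAT); // accepted
        try {
            validate(0, Kind.FLOAT); // rejected: length below MIN_LENGTH
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(isValidElementType(Kind.STRING)); // false
    }
}
```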
The length must be at least MIN_LENGTH (1) and at most MAX_LENGTH (Integer.MAX_VALUE), although the usable length is bounded in practice by storage and memory. defaultSize() returns the element size multiplied by the vector length (for example, 768 FLOAT elements give 768 × 4 = 3072 bytes), so the size estimate scales linearly with dimensionality. Because every value of a given VectorType has the same dimensionality, the type fits vector databases and similarity search, where all vectors being compared must have the same number of elements.
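A rough sketch of the size rule follows. It assumes per-element sizes of 1 byte for BOOLEAN and TINYINT, 2 for SMALLINT, 4 for INTEGER and FLOAT, and 8 for BIGINT and DOUBLE; these are assumed values, although FLOAT = 4 matches the 768 × 4 = 3072 example on this page. VectorDefaultSize is a hypothetical class, not Paimon code.

```java
// Hypothetical sketch of "element size × length"; per-element sizes are assumptions.
final class VectorDefaultSize {

    enum Element {
        BOOLEAN(1), TINYINT(1), SMALLINT(2), INTEGER(4), BIGINT(8), FLOAT(4), DOUBLE(8);

        final int sizeInBytes;

        Element(int sizeInBytes) {
            this.sizeInBytes = sizeInBytes;
        }
    }

    // Computed in long so that lengths close to Integer.MAX_VALUE cannot overflow silently.
    static long defaultSize(Element element, int length) {
        return (long) element.sizeInBytes * length;
    }

    public static void main(String[] args) {
        System.out.println(defaultSize(Element.FLOAT, 768));   // 3072
        System.out.println(defaultSize(Element.DOUBLE, 512));  // 4096
        System.out.println(defaultSize(Element.TINYINT, 256)); // 256
    }
}
```

The real defaultSize() is declared to return int; the sketch widens to long only to make visible that element size × length can exceed Integer.MAX_VALUE for very long vectors.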
VectorType provides the standard DataType operations: nullable and non-nullable construction, deep copying, SQL string generation in the format "VECTOR<element_type, length>", JSON serialization with element type and length fields, and visitor pattern support. Equality checking is comprehensive and includes field ID comparisons. isPrunedFrom(), by contrast, compares only element types and ignores lengths, which leaves room for schema evolution. The class is annotated @Public.
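The SQL rendering and the difference between full equality and isPrunedFrom() can be illustrated with a simplified, hypothetical MiniVectorType. It is not a Paimon class: element types are plain strings, field IDs are omitted, and it assumes non-nullable types append " NOT NULL" as the Outputs table describes.

```java
import java.util.Objects;

// Hypothetical, simplified VECTOR type used only to illustrate the rules above.
final class MiniVectorType {
    static final String FORMAT = "VECTOR<%s, %d>";

    final boolean nullable;
    final int length;
    final String elementType;

    MiniVectorType(boolean nullable, int length, String elementType) {
        this.nullable = nullable;
        this.length = length;
        this.elementType = elementType;
    }

    // "VECTOR<FLOAT, 768>", with " NOT NULL" appended for non-nullable types.
    String asSQLString() {
        String base = String.format(FORMAT, elementType, length);
        return nullable ? base : base + " NOT NULL";
    }

    // Copy that differs only in nullability.
    MiniVectorType copy(boolean isNullable) {
        return new MiniVectorType(isNullable, length, elementType);
    }

    // Full equality: nullability, length and element type must all match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MiniVectorType)) return false;
        MiniVectorType other = (MiniVectorType) o;
        return nullable == other.nullable
                && length == other.length
                && elementType.equals(other.elementType);
    }

    @Override
    public int hashCode() {
        return Objects.hash(nullable, length, elementType);
    }

    // Pruning check as described on this page: element types only, lengths ignored.
    boolean isPrunedFrom(MiniVectorType other) {
        return elementType.equals(other.elementType);
    }

    public static void main(String[] args) {
        MiniVectorType v768 = new MiniVectorType(true, 768, "FLOAT");
        MiniVectorType v512 = new MiniVectorType(true, 512, "FLOAT");
        System.out.println(v768.asSQLString());             // VECTOR<FLOAT, 768>
        System.out.println(v768.copy(false).asSQLString()); // VECTOR<FLOAT, 768> NOT NULL
        System.out.println(v768.equals(v512));              // false
        System.out.println(v768.isPrunedFrom(v512));        // true
    }
}
```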
Usage
Use VectorType for ML embeddings, feature vectors, and other dense numerical arrays whose dimensionality is fixed, for example as the stored input to vector similarity search. It is a natural fit for vector database and AI/ML applications.
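To illustrate why fixed dimensionality matters downstream, here is a hypothetical cosine-similarity helper (not part of Paimon) that compares two embeddings, such as values read from VECTOR<FLOAT, n> columns, and rejects mismatched lengths. It is a brute-force sketch, not a search index.

```java
// Hypothetical helper, not part of Paimon: brute-force cosine similarity
// between two embeddings that must share the same dimensionality.
final class CosineSimilaritySketch {

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException(
                    "Dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for zero vectors");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        float[] query = {1f, 0f, 0f};
        float[] doc = {1f, 1f, 0f};
        System.out.println(cosine(query, doc)); // approximately 0.7071
    }
}
```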
Code Reference
Source Location
Signature
```java
@Public
public class VectorType extends DataType {

    public static final int MIN_LENGTH = 1;
    public static final int MAX_LENGTH = Integer.MAX_VALUE;
    public static final String FORMAT = "VECTOR<%s, %d>";

    public VectorType(boolean isNullable, int length, DataType elementType)
    public VectorType(int length, DataType elementType)

    public int getLength()
    public DataType getElementType()
    public static boolean isValidElementType(DataType elementType)

    @Override
    public int defaultSize()

    @Override
    public DataType copy(boolean isNullable)

    @Override
    public String asSQLString()

    @Override
    public <R> R accept(DataTypeVisitor<R> visitor)
}
```
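The accept(DataTypeVisitor<R>) method follows the classic visitor pattern. The miniature below is hypothetical (VisitorSketch, MiniType, MiniFloat, MiniVector, MiniVisitor and SqlRenderer are invented names, not Paimon classes) and only shows how double dispatch lets a visitor recurse into a vector's element type.

```java
// Hypothetical miniature of the visitor pattern; none of these are Paimon classes.
final class VisitorSketch {
    public static void main(String[] args) {
        MiniType embedding = new MiniVector(768, new MiniFloat());
        System.out.println(embedding.accept(new SqlRenderer())); // VECTOR<FLOAT, 768>
    }
}

interface MiniVisitor<R> {
    R visit(MiniFloat floatType);

    R visit(MiniVector vectorType);
}

abstract class MiniType {
    abstract <R> R accept(MiniVisitor<R> visitor);
}

final class MiniFloat extends MiniType {
    @Override
    <R> R accept(MiniVisitor<R> visitor) {
        return visitor.visit(this); // statically resolves to visit(MiniFloat)
    }
}

final class MiniVector extends MiniType {
    final int length;
    final MiniType elementType;

    MiniVector(int length, MiniType elementType) {
        this.length = length;
        this.elementType = elementType;
    }

    @Override
    <R> R accept(MiniVisitor<R> visitor) {
        return visitor.visit(this); // statically resolves to visit(MiniVector)
    }
}

// Example visitor: renders a SQL-like type string, recursing into the element type.
final class SqlRenderer implements MiniVisitor<String> {
    public String visit(MiniFloat floatType) {
        return "FLOAT";
    }

    public String visit(MiniVector vectorType) {
        return "VECTOR<" + vectorType.elementType.accept(this) + ", " + vectorType.length + ">";
    }
}
```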
Import
```java
import org.apache.paimon.types.VectorType;
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| length | int | Yes | Fixed size of the vector (minimum 1) |
| elementType | DataType | Yes | Element type (must be numeric or boolean) |
| isNullable | boolean | No (default: true) | Whether the vector itself can be null |
Outputs
| Name | Type | Description |
|---|---|---|
| VectorType | VectorType | Configured fixed-size vector type |
| Length | int | The fixed dimensionality of the vector |
| Element type | DataType | The type of elements in the vector |
| Default size | int | Storage size (element size × length) |
| SQL string | String | Format "VECTOR<element_type, length>" with optional "NOT NULL" |
Usage Examples
```java
import org.apache.paimon.types.DataType;
import org.apache.paimon.types.DataTypes;
import org.apache.paimon.types.VectorType;

// Standard ML embedding vector (768-dimensional FLOAT)
DataType embedding = new VectorType(768, DataTypes.FLOAT());

// Using the factory method
DataType vector = DataTypes.VECTOR(128, DataTypes.FLOAT());

// Non-nullable vector
DataType requiredVector = new VectorType(false, 512, DataTypes.DOUBLE());

// Different element types
DataType byteVector = DataTypes.VECTOR(256, DataTypes.TINYINT());
DataType intVector = DataTypes.VECTOR(100, DataTypes.INT());
DataType booleanVector = DataTypes.VECTOR(64, DataTypes.BOOLEAN());

// Check valid element types
boolean validFloat = VectorType.isValidElementType(DataTypes.FLOAT()); // true
boolean validString = VectorType.isValidElementType(DataTypes.STRING()); // false
boolean validArray = VectorType.isValidElementType(DataTypes.ARRAY(DataTypes.INT())); // false

// Access vector properties
VectorType vec = new VectorType(768, DataTypes.FLOAT());
int length = vec.getLength(); // 768
DataType elementType = vec.getElementType(); // FLOAT
int size = vec.defaultSize(); // 768 * 4 = 3072 bytes

// SQL representation
String sql = vec.asSQLString(); // "VECTOR<FLOAT, 768>"

// Row type for a document table with an embedding column
DataType documentTable = DataTypes.ROW(
        DataTypes.FIELD(0, "doc_id", DataTypes.BIGINT()),
        DataTypes.FIELD(1, "title", DataTypes.STRING()),
        DataTypes.FIELD(2, "content", DataTypes.STRING()),
        DataTypes.FIELD(3, "embedding", DataTypes.VECTOR(768, DataTypes.FLOAT())));

// Image feature vectors
DataType imageTable = DataTypes.ROW(
        DataTypes.FIELD(0, "image_id", DataTypes.STRING()),
        DataTypes.FIELD(1, "resnet_features", DataTypes.VECTOR(2048, DataTypes.FLOAT())),
        DataTypes.FIELD(2, "clip_embedding", DataTypes.VECTOR(512, DataTypes.FLOAT())));

// Multiple vector representations per row
DataType multiModalTable = DataTypes.ROW(
        DataTypes.FIELD(0, "id", DataTypes.BIGINT()),
        DataTypes.FIELD(1, "text_embedding", DataTypes.VECTOR(768, DataTypes.FLOAT())),
        DataTypes.FIELD(2, "image_embedding", DataTypes.VECTOR(512, DataTypes.FLOAT())),
        DataTypes.FIELD(3, "audio_embedding", DataTypes.VECTOR(256, DataTypes.FLOAT())));

// Validate the element type before creating a vector
DataType proposedElementType = DataTypes.STRING();
if (VectorType.isValidElementType(proposedElementType)) {
    // Named validatedVector: redeclaring `vec` here would not compile,
    // because `vec` is already declared in the enclosing scope above
    DataType validatedVector = DataTypes.VECTOR(100, proposedElementType);
} else {
    // Handle the invalid element type
    System.err.println("Invalid element type for vector");
}
```