Implementation:Apache Paimon VectorType

From Leeroopedia


Knowledge Sources
Domains: Type System, Vector Processing
Last Updated: 2026-02-08 00:00 GMT

Overview

VectorType represents a fixed-size vector data type with densely stored elements, designed for machine learning and vector similarity search workloads.

Description

VectorType is a final class extending DataType that represents fixed-length dense vectors; it was introduced in Paimon 2.0.0. Unlike ArrayType, which holds variable-length collections, VectorType fixes its size when the type is created, which suits ML embeddings and other numerical vectors whose dimensionality is constant. Element types are restricted to the primitive numeric and boolean types BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, and DOUBLE; the static isValidElementType() method performs this check.
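The element-type rule amounts to a membership test against a fixed set. The sketch below is not Paimon's implementation: TypeRoot and ElementTypeRule are hypothetical stand-ins, and only the seven valid names come from the list above.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical stand-in for a type-root enum; not Paimon's DataTypeRoot. */
enum TypeRoot {
    BOOLEAN, TINYINT, SMALLINT, INTEGER, BIGINT, FLOAT, DOUBLE,
    VARCHAR, ARRAY, ROW
}

/** Hypothetical helper mirroring the documented element-type restriction. */
final class ElementTypeRule {
    // The seven element types the description lists as valid for VECTOR.
    private static final Set<TypeRoot> VALID = EnumSet.of(
            TypeRoot.BOOLEAN, TypeRoot.TINYINT, TypeRoot.SMALLINT,
            TypeRoot.INTEGER, TypeRoot.BIGINT, TypeRoot.FLOAT, TypeRoot.DOUBLE);

    private ElementTypeRule() {}

    /** True only for the primitive numeric and boolean roots listed above. */
    static boolean isValidElementType(TypeRoot root) {
        return VALID.contains(root);
    }

    public static void main(String[] args) {
        // Print the verdict for every root in the hypothetical enum.
        for (TypeRoot root : TypeRoot.values()) {
            System.out.println(root + " -> " + isValidElementType(root));
        }
    }
}
```

An EnumSet keeps the check constant-time and makes the allowed set explicit in one place.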

The length must be at least MIN_LENGTH (1) and at most MAX_LENGTH (Integer.MAX_VALUE), although storage and memory constraints impose far lower limits in practice. defaultSize() estimates storage as the element size multiplied by the vector length; for example, a 768-element FLOAT vector is estimated at 768 × 4 = 3072 bytes. Because every value holds exactly length elements, the estimate reflects the element payload of each value rather than an assumed average. This predictability makes VectorType a good fit for vector databases and similarity search workloads, where all vectors share the same dimensionality.
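The length bound and size estimate can be sketched as follows. ElementKind and VectorSizeSketch are hypothetical names, and the per-element byte sizes other than FLOAT's 4 bytes (which matches the usage example below) are assumptions. Using Math.multiplyExact to reject int overflow is a choice made for this sketch only; the page does not say how Paimon handles very large lengths.

```java
/** Hypothetical element kinds with assumed per-element byte sizes. */
enum ElementKind {
    BOOLEAN(1), TINYINT(1), SMALLINT(2), INTEGER(4), BIGINT(8), FLOAT(4), DOUBLE(8);

    final int byteSize;

    ElementKind(int byteSize) {
        this.byteSize = byteSize;
    }
}

/** Minimal sketch of the length rule and the size estimate described above. */
final class VectorSizeSketch {
    static final int MIN_LENGTH = 1;
    static final int MAX_LENGTH = Integer.MAX_VALUE;

    final int length;
    final ElementKind element;

    VectorSizeSketch(int length, ElementKind element) {
        // An int can never exceed MAX_LENGTH, so only the lower bound needs a runtime check.
        if (length < MIN_LENGTH) {
            throw new IllegalArgumentException(
                    "Vector length must be at least " + MIN_LENGTH + ", got " + length);
        }
        this.length = length;
        this.element = element;
    }

    /** Element size times length; throws ArithmeticException if the product overflows int. */
    int defaultSize() {
        return Math.multiplyExact(element.byteSize, length);
    }

    public static void main(String[] args) {
        System.out.println(new VectorSizeSketch(768, ElementKind.FLOAT).defaultSize());
        System.out.println(new VectorSizeSketch(512, ElementKind.DOUBLE).defaultSize());
    }
}
```

With the assumed sizes, the two calls in main print 3072 and 4096.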

VectorType provides the standard DataType operations: construction with or without an explicit nullability flag, deep copying via copy(boolean isNullable), SQL string generation in the format "VECTOR<element_type, length>", JSON serialization with element-type and length fields, and visitor-pattern support through accept(). Its equality checks are thorough and include field ID comparisons; isPrunedFrom(), by contrast, compares only element types and ignores lengths, which gives schema evolution some flexibility. The class is annotated @Public.
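The difference between full equality and isPrunedFrom() is easiest to see side by side. VectorTypeModel is a hypothetical, simplified model: it holds the element type as its SQL name instead of a DataType, omits field IDs, and assumes that equality compares nullability, length, and element type. The FORMAT string is the one from the signature; appending " NOT NULL" for non-nullable types follows the Outputs table.

```java
import java.util.Objects;

/** Hypothetical, simplified model of a vector type (no field IDs, element type as a name). */
final class VectorTypeModel {
    static final String FORMAT = "VECTOR<%s, %d>";

    final boolean nullable;
    final int length;
    final String elementSql; // e.g. "FLOAT"; a real type would hold a DataType

    VectorTypeModel(boolean nullable, int length, String elementSql) {
        this.nullable = nullable;
        this.length = length;
        this.elementSql = elementSql;
    }

    /** Same format string as FORMAT, with " NOT NULL" appended when the type is required. */
    String asSQLString() {
        String sql = String.format(FORMAT, elementSql, length);
        return nullable ? sql : sql + " NOT NULL";
    }

    /** Copy with the requested nullability; length and element type are kept. */
    VectorTypeModel copy(boolean isNullable) {
        return new VectorTypeModel(isNullable, length, elementSql);
    }

    /** Full equality (assumed): nullability, length and element type must all match. */
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof VectorTypeModel)) {
            return false;
        }
        VectorTypeModel other = (VectorTypeModel) o;
        return nullable == other.nullable
                && length == other.length
                && elementSql.equals(other.elementSql);
    }

    @Override
    public int hashCode() {
        return Objects.hash(nullable, length, elementSql);
    }

    /** Mirrors the described isPrunedFrom() behaviour: element types only, length ignored. */
    boolean isPrunedFrom(VectorTypeModel other) {
        return elementSql.equals(other.elementSql);
    }
}
```

In this model, VECTOR<FLOAT, 768> and VECTOR<FLOAT, 512> are not equal, yet isPrunedFrom() accepts the pair because both hold FLOAT elements.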

Usage

Use VectorType for ML embeddings, feature vectors, and other dense numerical arrays whose dimensionality is fixed, for example columns whose values feed vector similarity search. It is a natural fit for vector database and AI/ML applications.
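The type itself only describes the column's shape; none of the methods in the signature computes similarity. As an illustration of why a fixed dimensionality matters downstream, the hypothetical helper below computes cosine similarity between two float embeddings and rejects arrays whose length differs from the expected dimension. EmbeddingMath and cosine are names invented for this sketch, not part of Paimon.

```java
/** Hypothetical helper: cosine similarity between two fixed-dimension float embeddings. */
final class EmbeddingMath {
    private EmbeddingMath() {}

    static double cosine(float[] a, float[] b, int dimension) {
        // A fixed-size column gives every stored vector the same length;
        // this check enforces the same invariant for in-memory arrays.
        if (a.length != dimension || b.length != dimension) {
            throw new IllegalArgumentException(
                    "Expected vectors of length " + dimension
                            + ", got " + a.length + " and " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < dimension; i++) {
            // Accumulate in double to limit rounding error on long vectors.
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for zero vectors");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```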

Code Reference

Source Location

Signature

@Public
public final class VectorType extends DataType {

    public static final int MIN_LENGTH = 1;
    public static final int MAX_LENGTH = Integer.MAX_VALUE;
    public static final String FORMAT = "VECTOR<%s, %d>";

    public VectorType(boolean isNullable, int length, DataType elementType)

    public VectorType(int length, DataType elementType)

    public int getLength()

    public DataType getElementType()

    public static boolean isValidElementType(DataType elementType)

    @Override
    public int defaultSize()

    @Override
    public DataType copy(boolean isNullable)

    @Override
    public String asSQLString()

    @Override
    public <R> R accept(DataTypeVisitor<R> visitor)
}

Import

import org.apache.paimon.types.VectorType;

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| length | int | Yes | Fixed size of the vector (minimum 1) |
| elementType | DataType | Yes | Element type (must be one of the numeric or boolean types listed above) |
| isNullable | boolean | No (default: true) | Whether the vector value itself may be null |

Outputs

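| Name | Type | Description |
|---|---|---|
| VectorType | VectorType | The configured fixed-size vector type |
| Length | int | The fixed dimensionality of the vector, returned by getLength() |
| Element type | DataType | The type of the vector's elements, returned by getElementType() |
| Default size | int | Estimated storage size in bytes (element size × length), returned by defaultSize() |
| SQL string | String | "VECTOR<element_type, length>", followed by "NOT NULL" for non-nullable types, returned by asSQLString() |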

Usage Examples

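import org.apache.paimon.types.DataType;
import org.apache.paimon.types.DataTypes;
import org.apache.paimon.types.VectorType;

// Standard ML embedding vector (768-dimensional float)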
DataType embedding = new VectorType(768, DataTypes.FLOAT());

// Using factory method
DataType vector = DataTypes.VECTOR(128, DataTypes.FLOAT());

// Non-nullable vector
DataType requiredVector = new VectorType(false, 512, DataTypes.DOUBLE());

// Different element types
DataType byteVector = DataTypes.VECTOR(256, DataTypes.TINYINT());
DataType intVector = DataTypes.VECTOR(100, DataTypes.INT());
DataType booleanVector = DataTypes.VECTOR(64, DataTypes.BOOLEAN());

// Check valid element types
boolean validFloat = VectorType.isValidElementType(DataTypes.FLOAT()); // true
boolean validString = VectorType.isValidElementType(DataTypes.STRING()); // false
boolean validArray = VectorType.isValidElementType(
    DataTypes.ARRAY(DataTypes.INT())
); // false

// Access vector properties
VectorType vec = new VectorType(768, DataTypes.FLOAT());
int length = vec.getLength(); // 768
DataType elementType = vec.getElementType(); // FLOAT
int size = vec.defaultSize(); // 768 * 4 = 3072 bytes

// SQL representation
String sql = vec.asSQLString(); // "VECTOR<FLOAT, 768>"

// Use in table schema for ML applications
DataType documentTable = DataTypes.ROW(
    DataTypes.FIELD(0, "doc_id", DataTypes.BIGINT()),
    DataTypes.FIELD(1, "title", DataTypes.STRING()),
    DataTypes.FIELD(2, "content", DataTypes.STRING()),
    DataTypes.FIELD(3, "embedding", DataTypes.VECTOR(768, DataTypes.FLOAT()))
);

// Image feature vectors
DataType imageTable = DataTypes.ROW(
    DataTypes.FIELD(0, "image_id", DataTypes.STRING()),
    DataTypes.FIELD(1, "resnet_features", DataTypes.VECTOR(2048, DataTypes.FLOAT())),
    DataTypes.FIELD(2, "clip_embedding", DataTypes.VECTOR(512, DataTypes.FLOAT()))
);

// Multiple vector representations
DataType multiModalTable = DataTypes.ROW(
    DataTypes.FIELD(0, "id", DataTypes.BIGINT()),
    DataTypes.FIELD(1, "text_embedding", DataTypes.VECTOR(768, DataTypes.FLOAT())),
    DataTypes.FIELD(2, "image_embedding", DataTypes.VECTOR(512, DataTypes.FLOAT())),
    DataTypes.FIELD(3, "audio_embedding", DataTypes.VECTOR(256, DataTypes.FLOAT()))
);

// Validate element type before creating vector
DataType proposedElementType = DataTypes.STRING();
if (VectorType.isValidElementType(proposedElementType)) {
    // Named `candidate` because `vec` is already declared in this scope
    DataType candidate = DataTypes.VECTOR(100, proposedElementType);
} else {
    // Handle invalid element type
    System.err.println("Invalid element type for vector");
}
