
Implementation:Apache Paimon BlobDescriptor Create and Serialize

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Blob_Storage
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete class for creating and serializing blob descriptors, which reference external binary objects (for example, video or image files in object storage).

Description

BlobDescriptor.__init__() creates a descriptor with URI, offset, length, and version. The serialize() method produces a compact binary representation with the following layout:

  • version (1 byte) -- protocol version number
  • uri_length (4 bytes, little-endian) -- length of the URI string
  • uri_bytes (variable length) -- UTF-8 encoded URI string
  • offset (8 bytes, little-endian) -- byte offset within the referenced file
  • length (8 bytes, little-endian) -- number of bytes to read
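The layout above can be reproduced with Python's standard struct module. The following is a minimal stand-alone sketch, not pypaimon's code: encode_blob_descriptor is a hypothetical name, and the sketch assumes offset and length are signed 64-bit integers (struct format q, like a Java long). The source only states their width and byte order.

```python
import struct


def encode_blob_descriptor(uri: str, offset: int, length: int, version: int = 1) -> bytes:
    """Hypothetical encoder following the documented layout (not a pypaimon API)."""
    uri_bytes = uri.encode("utf-8")
    return (
        struct.pack("<B", version)             # version: 1 byte
        + struct.pack("<I", len(uri_bytes))    # uri_length: 4 bytes, little-endian
        + uri_bytes                            # uri_bytes: UTF-8, variable length
        + struct.pack("<q", offset)            # offset: 8 bytes, little-endian (assumed signed)
        + struct.pack("<q", length)            # length: 8 bytes, little-endian (assumed signed)
    )


encoded = encode_blob_descriptor("s3://bucket/key", 0, 1024)
```

If the documented layout is accurate, the output should match BlobDescriptor(uri, offset, length).serialize() byte for byte, which makes this a convenient cross-check when pypaimon is installed.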

The serialized bytes are what is stored in the blob column of the Paimon table. When a descriptor references an entire file, use offset 0 and let FileIO.get_file_size() supply the length.
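A whole-file reference therefore needs only the file's size. The sketch below is hypothetical and local-only: whole_file_fields is an illustrative name, and os.path.getsize stands in for FileIO.get_file_size(), whose constructor and configuration are not covered on this page.

```python
import os
import tempfile
from typing import Tuple


def whole_file_fields(path: str) -> Tuple[str, int, int]:
    """Hypothetical helper: (uri, offset, length) covering an entire local file.

    os.path.getsize is a local-only stand-in for FileIO.get_file_size().
    """
    return path, 0, os.path.getsize(path)


# Demo with a throwaway 4 KiB local file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\x00" * 4096)
    local_path = f.name

uri, offset, length = whole_file_fields(local_path)
# With pypaimon installed, these would feed BlobDescriptor(uri=uri, offset=offset, length=length).
os.remove(local_path)
```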

The class also provides read-only properties (uri, offset, length) for accessing the descriptor attributes after construction.

Usage

Use this class to construct descriptors for each external blob before writing metadata to a Paimon table. Typically one descriptor is created per row in the blob-enabled table.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/table/row/blob.py:L27-65

Signature

class BlobDescriptor:
    CURRENT_VERSION = 1

    def __init__(self, uri: str, offset: int, length: int, version: int = CURRENT_VERSION): ...

    def serialize(self) -> bytes: ...

    @property
    def uri(self) -> str: ...

    @property
    def offset(self) -> int: ...

    @property
    def length(self) -> int: ...

Import

from pypaimon.table.row.blob import BlobDescriptor

I/O Contract

Inputs

  • uri (str, required) -- External file URI, e.g. oss://bucket/path/file.mov, s3://bucket/key, or a local path
  • offset (int, required) -- Byte offset within the file at which the blob starts (typically 0 for whole-file references)
  • length (int, required) -- Number of bytes to read starting at offset; equals the file size only for whole-file references
  • version (int, optional) -- Serialization protocol version; defaults to CURRENT_VERSION, which is 1
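Before constructing descriptors, it can help to check the inputs against these constraints. The function below is a hypothetical pre-flight check, not part of BlobDescriptor; this page does not say whether the class validates its arguments. The 8-byte upper bound assumes signed 64-bit fields.

```python
from typing import Optional

MAX_INT64 = 2**63 - 1  # assumes the 8-byte offset/length fields are signed


def check_descriptor_args(uri: str, offset: int, length: int,
                          file_size: Optional[int] = None) -> None:
    """Hypothetical validation helper; raises ValueError on bad input."""
    if not uri:
        raise ValueError("uri must be a non-empty string")
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if offset > MAX_INT64 or length > MAX_INT64:
        raise ValueError("offset and length must fit in 8 bytes")
    # When the file size is known, the range [offset, offset + length) must lie inside it.
    if file_size is not None and offset + length > file_size:
        raise ValueError("range extends past the end of the file")


check_descriptor_args("oss://bucket/images/photo1.jpg", 0, 204800, file_size=204800)  # passes
```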

Outputs

  • BlobDescriptor instance -- Descriptor object exposing the uri, offset, and length properties
  • serialize() -> bytes -- Compact binary representation of the descriptor, stored in the blob column of the Paimon table
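Because the serialized form has a fixed header and two fixed-width trailers, it can be decoded by walking the same layout in reverse. This is a hypothetical decoder for illustration (decode_blob_descriptor and DecodedDescriptor are not pypaimon names); it uses hand-built sample bytes so it runs without pypaimon, and assumes signed 64-bit offset and length fields.

```python
import struct
from typing import NamedTuple


class DecodedDescriptor(NamedTuple):
    version: int
    uri: str
    offset: int
    length: int


def decode_blob_descriptor(data: bytes) -> DecodedDescriptor:
    """Hypothetical decoder for the documented layout (not a pypaimon API)."""
    if len(data) < 5:
        raise ValueError("too short for the version + uri_length header")
    version = data[0]                                   # 1-byte version
    (uri_length,) = struct.unpack_from("<I", data, 1)   # 4-byte little-endian URI length
    uri_end = 5 + uri_length
    if len(data) != uri_end + 16:                       # two 8-byte fields must follow the URI
        raise ValueError("unexpected total length for this uri_length")
    uri = data[5:uri_end].decode("utf-8")
    offset, length = struct.unpack_from("<qq", data, uri_end)  # assumed signed
    return DecodedDescriptor(version, uri, offset, length)


# Hand-built sample bytes following the same layout.
uri_bytes = "oss://bucket/images/photo1.jpg".encode("utf-8")
sample = struct.pack("<BI", 1, len(uri_bytes)) + uri_bytes + struct.pack("<qq", 0, 204800)
print(decode_blob_descriptor(sample))
```

With pypaimon installed, passing descriptor.serialize() instead of sample should recover the same uri, offset, and length if the layout above holds.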

Usage Examples

Basic Usage

from pypaimon.table.row.blob import BlobDescriptor

# Create a descriptor that references an entire external file
descriptor = BlobDescriptor(
    uri='oss://my-bucket/videos/clip001.mov',
    offset=0,
    length=1048576,  # 1 MiB (1024 * 1024 bytes)
)

# Serialize for storage in Paimon table
serialized = descriptor.serialize()
print(f"Serialized: {len(serialized)} bytes")

# Access descriptor properties
print(f"URI: {descriptor.uri}")
print(f"Offset: {descriptor.offset}")
print(f"Length: {descriptor.length}")
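The size printed above follows directly from the layout: 21 fixed bytes plus the UTF-8 length of the URI. The helper below (a hypothetical name) does that arithmetic; assuming the implementation matches the documented layout, the example URI is 34 bytes, so the example should report 55 bytes.

```python
def expected_serialized_size(uri: str) -> int:
    # 1 (version) + 4 (uri_length) + UTF-8 byte length of the URI + 8 (offset) + 8 (length)
    return 1 + 4 + len(uri.encode("utf-8")) + 8 + 8


print(expected_serialized_size("oss://my-bucket/videos/clip001.mov"))  # 55
```

The URI term counts encoded bytes, not characters, so non-ASCII URIs serialize to more bytes than their character count suggests.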

Batch Descriptor Creation

from pypaimon.table.row.blob import BlobDescriptor

# Create multiple descriptors for a batch of files
files = [
    ('oss://bucket/images/photo1.jpg', 0, 204800),
    ('oss://bucket/images/photo2.jpg', 0, 512000),
    ('oss://bucket/images/photo3.jpg', 0, 307200),
]

descriptors = [
    BlobDescriptor(uri=uri, offset=offset, length=length)
    for uri, offset, length in files
]

# Serialize all descriptors for table insertion
serialized_list = [d.serialize() for d in descriptors]
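The batch above references each file whole, so every offset is 0. The offset field also allows several objects to live back to back in a single container file; in that case each object's offset is the running sum of the sizes before it. The sketch below assumes such a packed layout purely for illustration (segment_ranges and the packed file URI are hypothetical; this page does not prescribe packing). Its tuples can replace the files list in the comprehension above.

```python
from itertools import accumulate
from typing import List, Tuple


def segment_ranges(uri: str, sizes: List[int]) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: (uri, offset, length) for objects stored back to back in one file."""
    # accumulate(..., initial=0) yields 0, s0, s0+s1, ...; zip stops at len(sizes),
    # dropping the final running total (the end of the last object).
    offsets = accumulate(sizes, initial=0)
    return [(uri, offset, size) for offset, size in zip(offsets, sizes)]


ranges = segment_ranges("oss://bucket/packed/images.bin", [204800, 512000, 307200])
```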

Related Pages

Implements Principle

Requires Environment
