Implementation:Apache Paimon BlobDescriptor Create and Serialize
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Blob_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A concrete class for creating and serializing blob descriptors, which reference external binary objects.
Description
BlobDescriptor.__init__() creates a descriptor with URI, offset, length, and version. The serialize() method produces a compact binary representation with the following layout:
- version (1 byte) -- protocol version number
- uri_length (4 bytes, little-endian) -- length of the URI string
- uri_bytes (variable length) -- UTF-8 encoded URI string
- offset (8 bytes, little-endian) -- byte offset within the referenced file
- length (8 bytes, little-endian) -- number of bytes to read
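The layout above can be sketched with Python's `struct` module. This is an illustrative stand-in rather than the library's source: the function name `pack_descriptor` is hypothetical, and treating `offset` and `length` as signed 64-bit integers is an assumption, since the layout only specifies their width and byte order.

```python
import struct


def pack_descriptor(uri: str, offset: int, length: int, version: int = 1) -> bytes:
    """Pack fields in the documented order (hypothetical helper, not pypaimon code)."""
    uri_bytes = uri.encode("utf-8")
    return b"".join([
        struct.pack("<B", version),         # version: 1 byte
        struct.pack("<I", len(uri_bytes)),  # uri_length: 4 bytes, little-endian
        uri_bytes,                          # uri_bytes: UTF-8, variable length
        struct.pack("<q", offset),          # offset: 8 bytes, little-endian (signedness assumed)
        struct.pack("<q", length),          # length: 8 bytes, little-endian (signedness assumed)
    ])


packed = pack_descriptor("s3://bucket/key", offset=0, length=1024)
# Fixed overhead is 1 + 4 + 8 + 8 = 21 bytes; the remainder is the encoded URI.
fixed_overhead = len(packed) - len("s3://bucket/key".encode("utf-8"))
print(fixed_overhead)
```

Because the URI length is stored as a byte count, a URI with non-ASCII characters occupies more bytes than it has characters.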
The serialized bytes are stored in the blob column of the Paimon table. FileIO.get_file_size() can determine the file length for the descriptor when the entire file is being referenced.
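For a whole-file reference, the offset is 0 and the length equals the file size. The sketch below shows the idea for a local file, using `os.path.getsize` as a stand-in for `FileIO.get_file_size()`, whose exact signature is not given here. The helper name `whole_file_reference` is hypothetical.

```python
import os
import tempfile


def whole_file_reference(path: str) -> tuple:
    """Return (uri, offset, length) covering an entire local file (hypothetical helper)."""
    size = os.path.getsize(path)  # stand-in for FileIO.get_file_size()
    return (path, 0, size)


# Demonstrate with a throwaway 2048-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 2048)
    tmp_path = tmp.name

uri, offset, length = whole_file_reference(tmp_path)
os.remove(tmp_path)
print(offset, length)
```

The resulting values can then be passed to `BlobDescriptor(uri=..., offset=..., length=...)`.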
The class also provides read-only properties (uri, offset, length) for accessing the descriptor attributes after construction.
Usage
Use this class to construct descriptors for each external blob before writing metadata to a Paimon table. Typically one descriptor is created per row in the blob-enabled table.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/table/row/blob.py:L27-65
Signature
class BlobDescriptor:
    CURRENT_VERSION = 1
    def __init__(self, uri: str, offset: int, length: int, version: int = CURRENT_VERSION): ...
    def serialize(self) -> bytes: ...
    @property
    def uri(self) -> str: ...
    @property
    def offset(self) -> int: ...
    @property
    def length(self) -> int: ...
Import
from pypaimon.table.row.blob import BlobDescriptor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| uri | str | Yes | External file URI (e.g., oss://bucket/path/file.mov, s3://bucket/key, or a local path) |
| offset | int | Yes | Byte offset within the file (typically 0 for whole-file references) |
| length | int | Yes | Number of bytes to read starting at offset (equal to the file size for whole-file references) |
| version | int | No | Serialization protocol version (defaults to CURRENT_VERSION which is 1) |
Outputs
| Name | Type | Description |
|---|---|---|
| BlobDescriptor | BlobDescriptor | Descriptor object with uri, offset, and length properties |
| serialize() | bytes | Compact binary representation of the descriptor for storage in the Paimon table blob column |
Usage Examples
Basic Usage
from pypaimon.table.row.blob import BlobDescriptor
# Create descriptors for external files
descriptor = BlobDescriptor(
    uri='oss://my-bucket/videos/clip001.mov',
    offset=0,
    length=1048576,  # 1 MiB
)
# Serialize for storage in Paimon table
serialized = descriptor.serialize()
print(f"Serialized: {len(serialized)} bytes")
# Access descriptor properties
print(f"URI: {descriptor.uri}")
print(f"Offset: {descriptor.offset}")
print(f"Length: {descriptor.length}")
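To check what `serialize()` is expected to produce, the documented layout can be decoded field by field. The parser below is a hypothetical helper written against the layout in the Description, not a pypaimon API, and it assumes signed 64-bit `offset` and `length` fields. The sample bytes are built by hand so the block runs without pypaimon installed.

```python
import struct


def unpack_descriptor(data: bytes) -> dict:
    """Decode bytes laid out as version, uri_length, uri_bytes, offset, length."""
    # "<" selects little-endian with no padding, so "<BI" spans exactly 1 + 4 bytes.
    version, uri_length = struct.unpack_from("<BI", data, 0)
    uri_end = 5 + uri_length
    uri = data[5:uri_end].decode("utf-8")
    # Two consecutive 8-byte fields follow the URI (signedness assumed).
    offset, length = struct.unpack_from("<qq", data, uri_end)
    return {"version": version, "uri": uri, "offset": offset, "length": length}


# Hand-built sample using the same field values as the Basic Usage descriptor.
sample_uri = "oss://my-bucket/videos/clip001.mov".encode("utf-8")
sample = struct.pack("<BI", 1, len(sample_uri)) + sample_uri + struct.pack("<qq", 0, 1048576)

fields = unpack_descriptor(sample)
print(fields["uri"], fields["offset"], fields["length"])
```

If the layout assumptions hold, passing `descriptor.serialize()` from the example above to such a parser should yield the same field values.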
Batch Descriptor Creation
from pypaimon.table.row.blob import BlobDescriptor
# Create multiple descriptors for a batch of files
files = [
    ('oss://bucket/images/photo1.jpg', 0, 204800),
    ('oss://bucket/images/photo2.jpg', 0, 512000),
    ('oss://bucket/images/photo3.jpg', 0, 307200),
]
descriptors = [
    BlobDescriptor(uri=uri, offset=offset, length=length)
    for uri, offset, length in files
]
# Serialize all descriptors for table insertion
serialized_list = [d.serialize() for d in descriptors]