Implementation:Apache Paimon BlobDescriptor Deserialize

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Blob_Storage
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tool for deserializing stored blob descriptor bytes back into BlobDescriptor objects.

Description

BlobDescriptor.deserialize() reads the compact binary format produced by serialize() and reconstructs a BlobDescriptor object. The binary layout is parsed as follows:

version (1 byte) -- protocol version, validated against supported versions
uri_length (4 bytes, little-endian) -- length of the URI string in bytes
uri_bytes (variable length) -- UTF-8 encoded URI string
offset (8 bytes, little-endian) -- byte offset within the referenced file
length (8 bytes, little-endian) -- number of bytes to read
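
To make the layout concrete, the following Python sketch packs the five fields with the standard struct module. It is not the pypaimon serialize() implementation. The version value 1, the unsigned 4-byte URI length and the signed 64-bit offset and length are assumptions, and encode_blob_descriptor is a hypothetical name used only for illustration.

```python
import struct

ASSUMED_VERSION = 1  # hypothetical; pypaimon defines the real supported version(s)


def encode_blob_descriptor(uri: str, offset: int, length: int,
                           version: int = ASSUMED_VERSION) -> bytes:
    """Pack the fields in the layout above. '<' selects little-endian, no padding."""
    uri_bytes = uri.encode("utf-8")
    # B = 1-byte version, I = 4-byte uri_length (assumed unsigned),
    # {n}s = raw UTF-8 URI bytes, qq = 8-byte offset and length (assumed signed)
    fmt = f"<BI{len(uri_bytes)}sqq"
    return struct.pack(fmt, version, len(uri_bytes), uri_bytes, offset, length)


encoded = encode_blob_descriptor("file:///tmp/data.blob", offset=128, length=4096)
print(len(encoded))  # 42 = 1 + 4 + 21 + 8 + 8
```

Because the URI is length-prefixed, a reader can locate the offset and length fields without scanning for a terminator.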

The method performs the following validations:

Minimum data size -- ensures the input bytes contain at least the fixed-size header fields
Version compatibility -- checks that the version byte matches a supported version
Data integrity -- validates that the total byte count is consistent with the declared URI length
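
The three checks can be sketched as a standalone decoder for the same layout. SUPPORTED_VERSIONS, DecodedDescriptor and decode_blob_descriptor are hypothetical names, and the supported version set is an assumption. This sketch requires the byte count to match the declared URI length exactly; the real method may apply its checks differently.

```python
import struct
from typing import NamedTuple

SUPPORTED_VERSIONS = {1}  # assumption; pypaimon defines the real set
FIXED_SIZE = 1 + 4 + 8 + 8  # version + uri_length + offset + length


class DecodedDescriptor(NamedTuple):
    """Hypothetical stand-in for BlobDescriptor, used only in this sketch."""
    version: int
    uri: str
    offset: int
    length: int


def decode_blob_descriptor(data: bytes) -> DecodedDescriptor:
    # 1. Minimum data size: all fixed-size fields must fit.
    if len(data) < FIXED_SIZE:
        raise ValueError(f"descriptor too short: {len(data)} bytes")
    # 2. Version compatibility.
    version = data[0]
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported descriptor version: {version}")
    (uri_length,) = struct.unpack_from("<I", data, 1)
    # 3. Data integrity: total size must agree with the declared URI length.
    expected = FIXED_SIZE + uri_length
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
    uri = data[5:5 + uri_length].decode("utf-8")
    offset, length = struct.unpack_from("<qq", data, 5 + uri_length)
    return DecodedDescriptor(version, uri, offset, length)


sample = struct.pack("<BI9sqq", 1, 9, b"s3://b/k1", 0, 10)
print(decode_blob_descriptor(sample))
# DecodedDescriptor(version=1, uri='s3://b/k1', offset=0, length=10)
```

Checking the size before reading uri_length means a truncated input fails with a clear error instead of a struct.error deep inside parsing.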

The standard table read pipeline (to_arrow) returns the blob column as binary values that can be passed directly to deserialize(). FormatBlobReader handles the Lance/blob file format internally, including magic number validation and CRC32 checksum verification, before the serialized bytes reach the caller.

Usage

Use this method after reading a blob-enabled table to reconstruct BlobDescriptor objects. The deserialized descriptors provide uri, offset, and length properties needed for lazy blob loading.
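
Lazy loading works because a descriptor already says where the blob lives, so only the referenced byte range has to be fetched. The sketch below is a hypothetical helper, load_blob_range, that handles only plain local paths and file:// URIs with seek and read; object-store URIs (for example s3:// or hdfs://) would need a filesystem client, which this sketch does not attempt.

```python
import os
import tempfile
from urllib.parse import urlparse
from urllib.request import url2pathname


def load_blob_range(uri: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read `length` bytes starting at `offset` from a local file."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("", "file"):
        raise NotImplementedError(f"unsupported scheme: {parsed.scheme}")
    # file:// URIs are converted to a filesystem path; bare paths are used as-is
    path = url2pathname(parsed.path) if parsed.scheme == "file" else uri
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the blob, skipping preceding bytes
        data = f.read(length)
    if len(data) != length:
        raise EOFError(f"expected {length} bytes at offset {offset}, got {len(data)}")
    return data


# Usage: write a small file, then read only the middle byte range.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"headerPAYLOADtrailer")
payload = load_blob_range(tmp.name, offset=6, length=7)
os.remove(tmp.name)
print(payload)  # b'PAYLOAD'
```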

Code Reference

Source Location

Repository: Apache Paimon
File: paimon-python/pypaimon/table/row/blob.py:L67-105

Signature

class BlobDescriptor:
    @classmethod
    def deserialize(cls, data: bytes) -> 'BlobDescriptor':

Import

from pypaimon.table.row.blob import BlobDescriptor

I/O Contract

Inputs

Name	Type	Required	Description
data	bytes	Yes	Serialized blob descriptor bytes retrieved from the blob column of a Paimon table read

Outputs

Name	Type	Description
BlobDescriptor	BlobDescriptor	Reconstructed descriptor object with uri, offset, and length properties accessible for subsequent blob loading

Usage Examples

Basic Usage

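from pypaimon import CatalogFactory
from pypaimon.table.row.blob import BlobDescriptor

# Obtain the table (warehouse path and table identifier are placeholders)
catalog = CatalogFactory.create({'warehouse': 'file:///path/to/warehouse'})
table = catalog.get_table('my_db.my_blob_table')

# Read table data using the standard Paimon read pipeline
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)

# Deserialize blob descriptors from the blob column ('data' in this example)
for value in arrow_table.column('data').to_pylist():
    if value is None:  # skip rows whose blob value is null
        continue
    descriptor = BlobDescriptor.deserialize(value)
    print(f"URI: {descriptor.uri}, Offset: {descriptor.offset}, Length: {descriptor.length}")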

Batch Deserialization with Metadata

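from pypaimon.table.row.blob import BlobDescriptor

# Read the table (assumes `table` is a blob-enabled pypaimon table obtained from a catalog)
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)

# Convert each column to a Python list once; this yields plain Python values
# and avoids a per-row ChunkedArray lookup that returns pyarrow scalars
ids = arrow_table.column('id').to_pylist()
filenames = arrow_table.column('filename').to_pylist()
blob_values = arrow_table.column('data').to_pylist()

# Combine row metadata with the deserialized descriptor
for row_id, filename, blob_bytes in zip(ids, filenames, blob_values):
    if blob_bytes is None:  # skip rows whose blob value is null
        continue
    descriptor = BlobDescriptor.deserialize(blob_bytes)
    print(f"ID: {row_id}, File: {filename}, URI: {descriptor.uri}, Length: {descriptor.length}")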

Related Pages

Implements Principle

Principle:Apache_Paimon_Blob_Descriptor_Deserialization

Requires Environment

Environment:Apache_Paimon_Python_Core_Runtime
