Implementation:Apache Paimon BlobDescriptor Deserialize
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Blob_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for deserializing stored blob descriptor bytes back into BlobDescriptor objects.
Description
BlobDescriptor.deserialize() reads the compact binary format produced by serialize() and reconstructs a BlobDescriptor object. The binary layout is parsed as follows:
- version (1 byte) -- protocol version, validated against supported versions
- uri_length (4 bytes, little-endian) -- length of the URI string in bytes
- uri_bytes (variable length) -- UTF-8 encoded URI string
- offset (8 bytes, little-endian) -- byte offset within the referenced file
- length (8 bytes, little-endian) -- number of bytes to read
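The layout above maps naturally onto Python's `struct` module. The following is a minimal sketch, not the pypaimon source: `pack_descriptor` is a hypothetical helper, the version value `1` is an assumption, and the signed 64-bit format (`q`) for offset and length is also assumed, since the layout only states the field widths.

```python
import struct

def pack_descriptor(uri: str, offset: int, length: int, version: int = 1) -> bytes:
    """Hypothetical helper: encode the fields in the layout described above."""
    uri_bytes = uri.encode("utf-8")
    return (
        struct.pack("<B", version)            # version, 1 byte
        + struct.pack("<I", len(uri_bytes))   # uri_length, 4 bytes, little-endian
        + uri_bytes                           # uri_bytes, variable length
        + struct.pack("<q", offset)           # offset, 8 bytes, little-endian (assumed signed)
        + struct.pack("<q", length)           # length, 8 bytes, little-endian (assumed signed)
    )

example_uri = "file:///tmp/blobs/a.bin"
raw = pack_descriptor(example_uri, 128, 4096)

# The fixed-size fields take 1 + 4 + 8 + 8 = 21 bytes; the remainder is the URI.
assert len(raw) == 21 + len(example_uri.encode("utf-8"))
```

Because every multi-byte field is packed with an explicit `<` prefix, the encoding does not depend on the byte order or alignment rules of the host platform.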
The method performs the following validations:
- Minimum data size -- ensures the input bytes contain at least the fixed-size header fields
- Version compatibility -- checks that the version byte matches a supported version
- Data integrity -- validates that the total byte count is consistent with the declared URI length
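The three checks above could be sketched as follows. This is an illustration under stated assumptions rather than the actual implementation: `ParsedDescriptor` and `parse_descriptor` are hypothetical names, version `1` is assumed to be the only supported version, and the integrity check is written as a strict size equality, which may be stricter than what the library enforces.

```python
import struct
from typing import NamedTuple

class ParsedDescriptor(NamedTuple):
    """Hypothetical stand-in for BlobDescriptor."""
    version: int
    uri: str
    offset: int
    length: int

FIXED_SIZE = 1 + 4 + 8 + 8      # version + uri_length + offset + length
SUPPORTED_VERSIONS = {1}        # assumption: only version 1 is supported

def parse_descriptor(data: bytes) -> ParsedDescriptor:
    # Minimum data size: all fixed-size fields must be present.
    if len(data) < FIXED_SIZE:
        raise ValueError(f"descriptor too short: {len(data)} bytes")
    # Version compatibility: the first byte must be a known version.
    version = data[0]
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported descriptor version: {version}")
    # Data integrity: total size must agree with the declared URI length.
    (uri_length,) = struct.unpack_from("<I", data, 1)
    if len(data) != FIXED_SIZE + uri_length:
        raise ValueError("declared URI length does not match data size")
    uri = data[5:5 + uri_length].decode("utf-8")
    offset, length = struct.unpack_from("<qq", data, 5 + uri_length)
    return ParsedDescriptor(version, uri, offset, length)

# Round-trip a hand-built payload.
uri_bytes = b"s3://bucket/key"
payload = struct.pack("<BI", 1, len(uri_bytes)) + uri_bytes + struct.pack("<qq", 0, 10)
print(parse_descriptor(payload))
```

Checking the declared URI length before slicing means a truncated or corrupted payload fails with a clear error instead of yielding a silently shortened URI.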
The standard table read pipeline (to_arrow) returns the blob column as binary values that can be passed directly to deserialize(). FormatBlobReader handles the Lance/blob file format internally, including magic number validation and CRC32 checksum verification, before the serialized bytes reach the caller.
Usage
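Use this method after reading a blob-enabled table to reconstruct BlobDescriptor objects. Each deserialized descriptor exposes the uri, offset, and length properties needed for lazy blob loading, that is, for fetching the referenced byte range only when the blob content is actually required.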
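To illustrate what a descriptor's three properties make possible, here is a minimal sketch of a ranged read. `read_blob_range` is a hypothetical helper and not part of pypaimon; it assumes the URI is either a `file://` URI or a plain local path, whereas object-store URIs would need a filesystem client instead.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import url2pathname

def read_blob_range(uri: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read `length` bytes starting at `offset` from a local file."""
    parsed = urlparse(uri)
    # Accept file:// URIs as well as plain local paths.
    path = url2pathname(parsed.path) if parsed.scheme == "file" else uri
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)

# Demo: a file with a 6-byte header, a 13-byte blob, and a trailer.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADERpayload-bytesTRAILER")
demo_path = tmp.name

blob = read_blob_range(Path(demo_path).as_uri(), offset=6, length=13)
os.remove(demo_path)
print(blob)  # b'payload-bytes'
```

With a real descriptor, the call would take `descriptor.uri`, `descriptor.offset`, and `descriptor.length` as its arguments, so only the referenced slice is read rather than the whole file.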
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/table/row/blob.py:L67-105
Signature
class BlobDescriptor:
    @classmethod
    def deserialize(cls, data: bytes) -> 'BlobDescriptor':
Import
from pypaimon.table.row.blob import BlobDescriptor
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | bytes | Yes | Serialized blob descriptor bytes retrieved from the blob column of a Paimon table read |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | BlobDescriptor | Reconstructed descriptor object whose uri, offset, and length properties are accessible for subsequent blob loading |
Usage Examples
Basic Usage
from pypaimon.table.row.blob import BlobDescriptor
# `table` is assumed to be a blob-enabled Paimon table already loaded from a catalog
# Read table data using the standard Paimon read pipeline
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)
# Deserialize blob descriptors from the blob column (named 'data' in this example)
for row_bytes in arrow_table.column('data'):
    # as_py() converts the Arrow binary scalar into Python bytes
    descriptor = BlobDescriptor.deserialize(row_bytes.as_py())
    print(f"URI: {descriptor.uri}, Offset: {descriptor.offset}, Size: {descriptor.length}")
Batch Deserialization with Metadata
from pypaimon.table.row.blob import BlobDescriptor
# `table` is assumed to be a blob-enabled Paimon table with 'id', 'filename', and 'data' columns
# Read table
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)
# Process all rows, combining metadata with deserialized descriptors
ids = arrow_table.column('id')
filenames = arrow_table.column('filename')
blob_column = arrow_table.column('data')
for i in range(len(arrow_table)):
    # Each blob cell holds the serialized descriptor bytes for that row
    descriptor = BlobDescriptor.deserialize(blob_column[i].as_py())
    print(f"ID: {ids[i]}, File: {filenames[i]}, URI: {descriptor.uri}, Size: {descriptor.length}")