Implementation:Apache Paimon BlobDescriptor Deserialize

Knowledge Sources
Domains: Data_Lake, Blob_Storage
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool for deserializing stored blob descriptor bytes back into BlobDescriptor objects.

Description

BlobDescriptor.deserialize() reads the compact binary format produced by serialize() and reconstructs a BlobDescriptor object. The binary layout is parsed as follows:

  1. version (1 byte) -- protocol version, validated against supported versions
  2. uri_length (4 bytes, little-endian) -- length of the URI string in bytes
  3. uri_bytes (variable length) -- UTF-8 encoded URI string
  4. offset (8 bytes, little-endian) -- byte offset within the referenced file
  5. length (8 bytes, little-endian) -- number of bytes to read
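
The field order above can be illustrated with Python's struct module. This is a minimal sketch of the layout only, not the pypaimon implementation: the helper names pack_descriptor and unpack_descriptor are hypothetical, the version value 1 is an assumption, and treating offset and length as signed 64-bit integers is also an assumption.

```python
import struct


def pack_descriptor(uri: str, offset: int, length: int, version: int = 1) -> bytes:
    """Hypothetical helper: build bytes in the documented field order."""
    uri_bytes = uri.encode("utf-8")
    return (
        struct.pack("<B", version)           # 1 byte: version (value 1 is assumed)
        + struct.pack("<I", len(uri_bytes))  # 4 bytes, little-endian: uri_length
        + uri_bytes                          # variable: UTF-8 encoded URI
        + struct.pack("<q", offset)          # 8 bytes, little-endian (signedness assumed)
        + struct.pack("<q", length)          # 8 bytes, little-endian (signedness assumed)
    )


def unpack_descriptor(data: bytes):
    """Hypothetical helper: walk the same fields in order, without validation."""
    pos = 0
    (version,) = struct.unpack_from("<B", data, pos)
    pos += 1
    (uri_length,) = struct.unpack_from("<I", data, pos)
    pos += 4
    uri = data[pos:pos + uri_length].decode("utf-8")
    pos += uri_length
    offset, length = struct.unpack_from("<qq", data, pos)
    return version, uri, offset, length
```

The "<" prefix in each format string selects little-endian byte order with no padding, so the fixed-size fields occupy exactly 1 + 4 + 8 + 8 = 21 bytes around the variable-length URI.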

The method performs the following validations:

  • Minimum data size -- ensures the input bytes contain at least the fixed-size header fields
  • Version compatibility -- checks that the version byte matches a supported version
  • Data integrity -- validates that the total byte count is consistent with the declared URI length
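
The three checks can be sketched as a standalone function. This is an illustration under stated assumptions rather than the library's code: check_descriptor_bytes is a hypothetical name, the supported version set {1} is assumed, and "consistent" is interpreted here as an exact length match, which may be stricter than what pypaimon enforces.

```python
import struct

SUPPORTED_VERSIONS = {1}          # assumption: the real set may differ
FIXED_FIELDS_SIZE = 1 + 4 + 8 + 8  # version + uri_length + offset + length


def check_descriptor_bytes(data: bytes) -> None:
    """Hypothetical helper: raise ValueError if the bytes cannot be a descriptor."""
    # Minimum data size: all fixed-size fields must be present.
    if len(data) < FIXED_FIELDS_SIZE:
        raise ValueError(f"need at least {FIXED_FIELDS_SIZE} bytes, got {len(data)}")

    # Version compatibility: the first byte is the protocol version.
    version = data[0]
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported version: {version}")

    # Data integrity: total size must agree with the declared URI length
    # (exact equality is this sketch's interpretation).
    (uri_length,) = struct.unpack_from("<I", data, 1)
    expected = FIXED_FIELDS_SIZE + uri_length
    if len(data) != expected:
        raise ValueError(f"expected {expected} bytes, got {len(data)}")
```

Running the size check first keeps the later reads of the version byte and the URI length from going past the end of a truncated input.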

The standard table read pipeline (to_arrow) returns the blob column as binary values. Convert each value to Python bytes (for example with as_py()) and pass it to deserialize(). FormatBlobReader handles the Lance/blob file format internally, including magic number validation and CRC32 checksum verification, so those checks have already run before the serialized bytes reach the caller.

Usage

Use this method after reading a blob-enabled table to reconstruct BlobDescriptor objects. The deserialized descriptors provide uri, offset, and length properties needed for lazy blob loading.
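
To show how the three properties drive lazy loading, here is a minimal sketch that reads one byte range from a local file. It assumes the descriptor's uri resolves to a local filesystem path; read_blob_range is a hypothetical helper, and a real deployment would more likely go through Paimon's own file I/O layer for object stores or HDFS.

```python
def read_blob_range(path: str, offset: int, length: int) -> bytes:
    """Hypothetical helper: read `length` bytes starting at `offset` from a local file."""
    with open(path, "rb") as f:
        f.seek(offset)         # jump to the start of the blob inside the file
        data = f.read(length)  # read only the bytes the descriptor points at
    if len(data) != length:
        raise EOFError(f"expected {length} bytes at offset {offset}, got {len(data)}")
    return data
```

With a deserialized descriptor, the call would look like read_blob_range(local_path, descriptor.offset, descriptor.length), where local_path is whatever local path descriptor.uri maps to in your environment. Only the referenced range is read, so the rest of the file is never loaded into memory.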

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/table/row/blob.py:L67-105

Signature

class BlobDescriptor:
    @classmethod
    def deserialize(cls, data: bytes) -> 'BlobDescriptor':

Import

from pypaimon.table.row.blob import BlobDescriptor

I/O Contract

Inputs

Name | Type | Required | Description
data | bytes | Yes | Serialized blob descriptor bytes retrieved from the blob column of a Paimon table read

Outputs

Name | Type | Description
BlobDescriptor | BlobDescriptor | Reconstructed descriptor object with uri, offset, and length properties accessible for subsequent blob loading

Usage Examples

Basic Usage

from pypaimon.table.row.blob import BlobDescriptor

# Read table data using the standard Paimon read pipeline
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)

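# Deserialize blob descriptors from the blob column (named 'data' in this example)
for value in arrow_table.column('data'):
    raw = value.as_py()  # pyarrow scalar -> Python bytes, or None for a null cell
    if raw is None:
        continue  # skip rows without a blob
    descriptor = BlobDescriptor.deserialize(raw)
    print(f"URI: {descriptor.uri}, Offset: {descriptor.offset}, Size: {descriptor.length}")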

Batch Deserialization with Metadata

from pypaimon.table.row.blob import BlobDescriptor

# Read table
read_builder = table.new_read_builder()
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
arrow_table = reader.to_arrow(splits)

# Process all rows, combining metadata with deserialized descriptors
ids = arrow_table.column('id')
filenames = arrow_table.column('filename')
blob_column = arrow_table.column('data')

for i in range(arrow_table.num_rows):
    descriptor = BlobDescriptor.deserialize(blob_column[i].as_py())
    # as_py() converts pyarrow scalars to plain Python values for printing
    print(f"ID: {ids[i].as_py()}, File: {filenames[i].as_py()}, URI: {descriptor.uri}, Size: {descriptor.length}")
