

Implementation:Apache Paimon MemorySliceInput

From Leeroopedia
Revision as of 14:21, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Apache_Paimon_MemorySliceInput.md)


Knowledge Sources
Domains Binary I/O, Serialization
Last Updated 2026-02-08 00:00 GMT

Overview

MemorySliceInput provides efficient sequential and random-access reading of binary data from in-memory byte arrays.

Description

MemorySliceInput is a utility class for reading various data types from a byte array with position tracking. It acts as a cursor-based reader over byte data, maintaining an internal position that advances as data is read. The class supports both fixed-size and variable-length integer encoding, as well as raw byte slice extraction.

The reader supports multiple integer formats: single bytes, 4-byte big-endian integers, 8-byte big-endian longs, and variable-length integers using the VarInt encoding scheme. Each VarInt byte carries 7 bits of data, least-significant group first, and uses its high bit as a continuation flag. Small values therefore occupy a single byte, while the full 32-bit or 64-bit range remains representable.
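
The VarInt scheme described above can be sketched as a standalone function. This is a minimal illustration, not the pypaimon implementation: the name decode_var_int and its (value, bytes_consumed) return shape are assumptions made for the example.

```python
from typing import Tuple


def decode_var_int(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode one VarInt starting at `offset`.

    Hypothetical helper for illustration only; returns (value, bytes_consumed).
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        if pos >= len(data):
            raise IndexError("truncated variable-length integer")
        byte = data[pos]
        pos += 1
        # The low 7 bits carry data; earlier bytes hold the less significant bits.
        result |= (byte & 0x7F) << shift
        # A clear high bit marks the last byte of the integer.
        if (byte & 0x80) == 0:
            return result, pos - offset
        shift += 7


small = decode_var_int(b"\x05")       # one byte: value 5
two_byte = decode_var_int(b"\x82\x03")  # 2 + (3 << 7) = 386, two bytes
```

The sketch does not cap the number of bytes it will consume; a reader limited to 32-bit or 64-bit values would presumably reject overlong encodings.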

The reader tracks its position explicitly. The position() method returns the current read offset, set_position() jumps to an arbitrary offset (enabling random access), and available() returns the number of bytes remaining. The is_readable() method reports whether any bytes remain, which makes it a natural loop condition when iterating over a buffer.
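
The cursor behaviour can be illustrated with a small stand-in class. CursorSketch is hypothetical and written only for this example; it mirrors the method names listed on this page, but details such as how set_position() validates its argument are assumptions rather than confirmed pypaimon behaviour.

```python
class CursorSketch:
    """Hypothetical stand-in for a cursor-based reader (not the pypaimon class)."""

    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def position(self) -> int:
        return self._pos

    def set_position(self, position: int) -> None:
        # Assumption: seeking outside [0, len(data)] is rejected.
        if position < 0 or position > len(self._data):
            raise IndexError(f"position {position} out of range")
        self._pos = position

    def available(self) -> int:
        return len(self._data) - self._pos

    def is_readable(self) -> bool:
        return self.available() > 0

    def read_byte(self) -> int:
        if not self.is_readable():
            raise IndexError("no bytes left to read")
        value = self._data[self._pos]
        self._pos += 1  # every read advances the cursor
        return value


cursor = CursorSketch(b"\x0a\x0b\x0c")
first = cursor.read_byte()            # sequential read, cursor moves to 1
cursor.set_position(2)                # random access: jump to the last byte
last = cursor.read_byte()
remaining = cursor.available()        # nothing left after the last byte
```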

The class performs bounds checking on all operations and raises IndexError on any attempt to read beyond the available data. This prevents buffer overruns and makes the reader safe to use with untrusted input, provided callers handle the error.
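
An explicit check matters in Python because slicing past the end of a bytes object silently returns a shorter result instead of failing. The sketch below shows one way such a check and the matching error handling might look; read_checked and try_parse_length_prefixed are hypothetical helpers, and the one-byte length prefix is an assumed layout chosen for the example.

```python
from typing import Optional


def read_checked(data: bytes, position: int, length: int) -> bytes:
    """Return exactly `length` bytes from `position`, or raise IndexError."""
    if length < 0 or position < 0 or position + length > len(data):
        raise IndexError(
            f"cannot read {length} bytes at {position} from {len(data)} bytes"
        )
    return data[position:position + length]


def try_parse_length_prefixed(data: bytes) -> Optional[bytes]:
    """Parse a 1-byte length followed by a payload; None if malformed."""
    try:
        length = read_checked(data, 0, 1)[0]
        return read_checked(data, 1, length)
    except IndexError:
        # Untrusted input that lies about its length is rejected, not truncated.
        return None


ok = try_parse_length_prefixed(b"\x03abc")
bad = try_parse_length_prefixed(b"\x05ab")   # claims 5 bytes, only 2 present
```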

Usage

MemorySliceInput is used throughout the B-tree index implementation to parse binary structures like block entries, handles, and footers. It provides a clean abstraction over raw byte manipulation and is particularly useful when reading structured data with mixed fixed and variable-length fields.
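
For comparison, the fixed-width big-endian fields can be decoded with Python's standard struct module. The footer layout below (an 8-byte long followed by a 4-byte int) is hypothetical and not taken from the Paimon B-tree format. Treating both fields as signed is also an assumption.

```python
import struct
from typing import Tuple

# Hypothetical layout for this sketch only:
#   ">" = big-endian, "q" = signed 8-byte long, "i" = signed 4-byte int.
FOOTER_FORMAT = ">qi"


def parse_footer(data: bytes) -> Tuple[int, int]:
    """Decode (offset, size) from the start of `data`."""
    if len(data) < struct.calcsize(FOOTER_FORMAT):
        raise IndexError("footer is truncated")
    offset, size = struct.unpack_from(FOOTER_FORMAT, data, 0)
    return offset, size


footer_bytes = struct.pack(FOOTER_FORMAT, 1024, 64)
offset, size = parse_footer(footer_bytes)
```

Under these assumptions, the same two values would be obtained by calling read_long() and then read_int() on a reader positioned at the start of the buffer.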

Code Reference

Source Location

Signature

class MemorySliceInput:
    """Input for byte array."""

    def __init__(self, data: bytes):
        """Initialize with byte array."""
        ...

    def position(self) -> int:
        """Get current position."""
        ...

    def set_position(self, position: int) -> None:
        """Set current position."""
        ...

    def is_readable(self) -> bool:
        """Check if there are more bytes to read."""
        ...

    def available(self) -> int:
        """Get number of available bytes."""
        ...

    def read_byte(self) -> int:
        """Read a single byte."""
        ...

    def read_int(self) -> int:
        """Read a 4-byte integer (big-endian)."""
        ...

    def read_long(self) -> int:
        """Read an 8-byte long (big-endian)."""
        ...

    def read_var_len_int(self) -> int:
        """Read a variable-length integer."""
        ...

    def read_var_len_long(self) -> int:
        """Read a variable-length long."""
        ...

    def read_slice(self, length: int) -> bytes:
        """Read a slice of bytes."""
        ...

Import

from pypaimon.globalindex.btree.memory_slice_input import MemorySliceInput

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| data | bytes | Yes | Byte array to read from |
| position | int | No | Position to seek to (for set_position) |
| length | int | No | Number of bytes to read (for read_slice) |

Outputs

| Name | Type | Description |
|------|------|-------------|
| position | int | Current read position |
| byte_value | int | Single byte (0-255) |
| int_value | int | 4-byte integer |
| long_value | int | 8-byte long integer |
| var_int | int | Variable-length integer |
| slice | bytes | Byte slice of specified length |

Usage Examples

from pypaimon.globalindex.btree.memory_slice_input import MemorySliceInput

# Create input from byte data
data = b'\x01\x00\x00\x00\x82\x03hello'
input_stream = MemorySliceInput(data)

# Read fixed-size integer (big-endian)
value1 = input_stream.read_int()
print(f"Integer: {value1}")  # 16777216 (0x01000000)

# Read variable-length integer
# 0x82 0x03 = 10000010 00000011 = (0000010) (0000011) = 2 + (3 << 7) = 386
var_int = input_stream.read_var_len_int()
print(f"VarInt: {var_int}")  # 386

# Read byte slice
text_bytes = input_stream.read_slice(5)
print(f"Text: {text_bytes.decode()}")  # "hello"

# Check remaining data
print(f"Remaining: {input_stream.available()}")  # 0

# Random access - seek back
input_stream.set_position(0)
first_byte = input_stream.read_byte()
print(f"First byte: {first_byte}")  # 1

# Parse structured data: a length-prefixed key followed by a length-prefixed value.
# The reader is passed in so that its position carries over between entries.
def parse_entry(input_stream: MemorySliceInput):
    # Read key
    key_len = input_stream.read_var_len_int()
    key = input_stream.read_slice(key_len)

    # Read value
    value_len = input_stream.read_var_len_int()
    value = input_stream.read_slice(value_len)

    return key, value

# Iterate until end (illustrative block holding two entries: a -> xy, b -> z)
block_data = b'\x01a\x02xy\x01b\x01z'
input_stream = MemorySliceInput(block_data)
entries = []
while input_stream.is_readable():
    key, value = parse_entry(input_stream)
    entries.append((key, value))

Related Pages
