

Implementation:Apache Paimon FormatReadBuilder

From Leeroopedia


Knowledge Sources
Domains: Format Tables, Query Building
Last Updated: 2026-02-08 00:00 GMT

Overview

FormatReadBuilder is a builder class for constructing read operations on format tables with support for filtering, projection, and limits.

Description

The FormatReadBuilder class provides a fluent API for configuring read operations on Apache Paimon format tables. Before creating scan and read objects, users can specify a projection (which columns to read), filters (used for partition pruning), and a limit (the maximum number of rows to return).
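The fluent style can be illustrated with a small, self-contained sketch. The classes below are hypothetical (SketchReadBuilder and ReadConfig are invented names, not pypaimon classes): each with_* call records one setting and returns the builder itself, which is what makes chaining possible, and build() stands in for the point where new_scan() and new_read() would consume the accumulated state.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReadConfig:
    """Hypothetical snapshot of the settings a read builder accumulates."""
    projection: Optional[List[str]] = None  # None means "all columns"
    partition_spec: Optional[Dict[str, str]] = None
    limit: Optional[int] = None


class SketchReadBuilder:
    """Illustrative fluent builder; NOT pypaimon's FormatReadBuilder."""

    def __init__(self) -> None:
        self._config = ReadConfig()

    def with_projection(self, projection: List[str]) -> "SketchReadBuilder":
        self._config.projection = list(projection)
        return self  # returning self is what enables chaining

    def with_limit(self, limit: int) -> "SketchReadBuilder":
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self._config.limit = limit
        return self

    def with_partition_filter(
        self, partition_spec: Optional[Dict[str, str]]
    ) -> "SketchReadBuilder":
        # An empty or missing spec clears the partition filter.
        self._config.partition_spec = dict(partition_spec) if partition_spec else None
        return self

    def build(self) -> ReadConfig:
        # A real builder would hand this state to its scan/read objects;
        # here we return a copy so later builder calls cannot mutate it.
        c = self._config
        return ReadConfig(
            projection=None if c.projection is None else list(c.projection),
            partition_spec=None if c.partition_spec is None else dict(c.partition_spec),
            limit=c.limit,
        )


config = (SketchReadBuilder()
          .with_projection(["id", "name"])
          .with_partition_filter({"date": "2024-01-01"})
          .with_limit(100)
          .build())
print(config)
```

Keeping all settings in one mutable object means a chained call and a sequence of separate calls produce the same state.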

The builder supports partition filtering either through a predicate or through a partition specification dictionary that maps partition columns to values. When a predicate is provided, the builder attempts to extract a partition specification from it so that non-matching partitions can be pruned. The builder stores this configuration and applies it to the FormatTableScan and FormatTableRead instances it creates.
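Extracting a partition specification only works for predicates that pin partition columns to single values, such as an equality or a conjunction of equalities. The sketch below uses hypothetical Equal and And classes in place of pypaimon's Predicate type, and assumes a conservative rule: a spec is returned only when every conjunct is an equality on a partition column; anything else yields None (no spec extracted). The rules FormatReadBuilder actually applies may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Union


@dataclass(frozen=True)
class Equal:
    """Hypothetical leaf predicate: field == value."""
    field: str
    value: str


@dataclass(frozen=True)
class And:
    """Hypothetical conjunction of child predicates."""
    children: Sequence["Pred"]


Pred = Union[Equal, And]


def extract_partition_spec(pred: Pred, partition_keys: List[str]) -> Optional[Dict[str, str]]:
    """Return {partition_key: value} if `pred` is an equality, or an AND of
    equalities, on partition keys only; otherwise return None."""
    spec: Dict[str, str] = {}

    def visit(p: Pred) -> bool:
        if isinstance(p, And):
            return all(visit(c) for c in p.children)
        if isinstance(p, Equal) and p.field in partition_keys:
            # Two different values for one key cannot be expressed as a spec.
            if spec.get(p.field, p.value) != p.value:
                return False
            spec[p.field] = p.value
            return True
        return False  # non-partition column or unsupported predicate shape

    return spec if visit(pred) and spec else None


keys = ["date", "region"]
print(extract_partition_spec(And([Equal("date", "2024-01-01"), Equal("region", "us")]), keys))
# {'date': '2024-01-01', 'region': 'us'}
print(extract_partition_spec(Equal("name", "alice"), keys))  # None
```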

The class also provides access to predicate builders for constructing complex filter expressions and exposes the read type (list of projected fields) for query planning purposes.
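The read type is the table schema narrowed to the projected columns. The following is a hedged sketch using a hypothetical Field dataclass in place of pypaimon's DataField; it assumes that projected fields come back in projection order and that unknown column names are rejected, which the real class may handle differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for pypaimon's DataField."""
    name: str
    data_type: str


def projected_read_type(schema: List[Field], projection: Optional[List[str]]) -> List[Field]:
    """Return the fields a read would produce for `projection`.

    projection=None means no projection, so every schema field is read.
    Assumption: projected fields are returned in projection order.
    """
    if projection is None:
        return list(schema)
    by_name = {f.name: f for f in schema}
    missing = [name for name in projection if name not in by_name]
    if missing:
        raise ValueError(f"unknown columns in projection: {missing}")
    return [by_name[name] for name in projection]


schema = [Field("id", "BIGINT"), Field("name", "STRING"), Field("age", "INT")]
print([f.name for f in projected_read_type(schema, ["age", "id"])])  # ['age', 'id']
```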

Usage

Use FormatReadBuilder when reading from format tables (Parquet, ORC, CSV, JSON, Text) and you need only certain columns, certain partitions, or a bounded number of rows; column projections, partition filters, and row limits reduce the amount of data that is read and transferred.

Code Reference

Source Location

Signature

class FormatReadBuilder:
    """Builder for constructing format table read operations."""

    def __init__(self, table: FormatTable):
        """Initialize with a FormatTable instance."""

    def with_filter(self, predicate: Predicate) -> "FormatReadBuilder":
        """Set partition filter from predicate."""

    def with_projection(self, projection: List[str]) -> "FormatReadBuilder":
        """Set column projection."""

    def with_limit(self, limit: int) -> "FormatReadBuilder":
        """Set row limit."""

    def with_partition_filter(self, partition_spec: Optional[dict]) -> "FormatReadBuilder":
        """Set partition filter directly."""

    def new_scan(self) -> FormatTableScan:
        """Create a new scan with current configuration."""

    def new_read(self) -> FormatTableRead:
        """Create a new read with current configuration."""

    def new_predicate_builder(self) -> PredicateBuilder:
        """Create a new predicate builder."""

    def read_type(self) -> List[DataField]:
        """Get the list of fields that will be read."""

Import

from pypaimon.table.format.format_read_builder import FormatReadBuilder

I/O Contract

Inputs

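Name | Type | Required | Description
table | FormatTable | Yes | Format table to read from (constructor argument)
predicate | Predicate | No | Filter predicate used for partition pruning (with_filter)
projection | List[str] | No | Names of the columns to read (with_projection)
limit | int | No | Maximum number of rows to return (with_limit)
partition_spec | dict | No | Mapping of partition column to value, used to filter partitions directly (with_partition_filter)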
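A partition spec is a plain dict from partition column to value. Assuming the format table lays out partitions as Hive-style `key=value` directories (an assumption made for this sketch, not something this page states), applying the spec amounts to keeping only the directories whose values match every key in the dict. The helpers parse_partition_path and prune_partitions are hypothetical names.

```python
from typing import Dict, List


def parse_partition_path(path: str) -> Dict[str, str]:
    """Parse a Hive-style relative path such as 'date=2024-01-01/region=us'.
    Simplification: no unescaping of special characters is performed."""
    values: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if sep:  # ignore segments that are not key=value
            values[key] = value
    return values


def prune_partitions(paths: List[str], partition_spec: Dict[str, str]) -> List[str]:
    """Keep paths whose partition values match every entry of partition_spec.
    Partition keys absent from the spec are unconstrained."""
    kept = []
    for path in paths:
        values = parse_partition_path(path)
        if all(values.get(k) == v for k, v in partition_spec.items()):
            kept.append(path)
    return kept


paths = [
    "date=2024-01-01/region=us",
    "date=2024-01-01/region=eu",
    "date=2024-01-02/region=us",
]
print(prune_partitions(paths, {"date": "2024-01-01"}))
# ['date=2024-01-01/region=us', 'date=2024-01-01/region=eu']
```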

Outputs

Name | Type | Description
builder | FormatReadBuilder | Builder with updated configuration (returned by each with_* method)
scan | FormatTableScan | Configured scan object (new_scan)
read | FormatTableRead | Configured read object (new_read)
predicate_builder | PredicateBuilder | New predicate builder (new_predicate_builder)
fields | List[DataField] | Fields to be read after projection (read_type)

Usage Examples

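from pypaimon.table.format.format_read_builder import FormatReadBuilder

# `table` is assumed to be a FormatTable already obtained from a catalog;
# its new_read_builder() returns a FormatReadBuilder.
builder: FormatReadBuilder = table.new_read_builder()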

# Configure with projection and limit
builder = (builder
    .with_projection(["id", "name", "age"])
    .with_limit(1000))

# Create scan and read
scan = builder.new_scan()
read = builder.new_read()

# Execute query
splits = scan.plan().splits()
df = read.to_pandas(splits)

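# With partition filter using predicate: configure filters before calling
# new_scan()/new_read(); a fresh builder keeps this independent of the one above
builder = table.new_read_builder()
predicate_builder = builder.new_predicate_builder()
predicate = predicate_builder.equal("date", "2024-01-01")
builder = builder.with_filter(predicate)

# With direct partition filter (a dict of partition column -> value)
builder = table.new_read_builder().with_partition_filter({"date": "2024-01-01", "region": "us"})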

# Get read type
fields = builder.read_type()
print(f"Will read fields: {[f.name for f in fields]}")

# Full example
builder = (table.new_read_builder()
    .with_projection(["user_id", "event_type", "timestamp"])
    .with_partition_filter({"date": "2024-01-01"})
    .with_limit(10000))

scan = builder.new_scan()
read = builder.new_read()

# To Arrow
splits = scan.plan().splits()
arrow_table = read.to_arrow(splits)
print(f"Read {arrow_table.num_rows} rows")
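A row limit can be honoured without reading every split: the reader stops as soon as enough rows have been produced. The generator below sketches that idea over hypothetical in-memory batches (lists of rows standing in for splits); read_with_limit and make_batches are invented names, and whether FormatTableRead stops early in exactly this way is not stated on this page.

```python
from typing import Iterable, Iterator, List, Optional, TypeVar

Row = TypeVar("Row")


def read_with_limit(batches: Iterable[List[Row]], limit: Optional[int]) -> Iterator[Row]:
    """Yield rows from successive batches, stopping after `limit` rows.
    limit=None means no limit."""
    if limit is not None and limit <= 0:
        return
    remaining = limit
    for batch in batches:
        chunk = batch if remaining is None else batch[:remaining]
        yield from chunk
        if remaining is not None:
            remaining -= len(chunk)
            if remaining == 0:
                return  # stop before pulling the next batch


pulled: List[int] = []


def make_batches() -> Iterator[List[str]]:
    """Hypothetical splits of 4 rows each; records which ones were produced."""
    for i in range(3):
        pulled.append(i)
        yield [f"row-{i}-{j}" for j in range(4)]


rows = list(read_with_limit(make_batches(), limit=6))
print(len(rows), pulled)  # 6 [0, 1]
```

Because the batches are consumed lazily, the third batch is never produced once the limit of 6 rows is reached.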
