Implementation:Apache Paimon FormatReadBuilder

Knowledge Sources	Apache_Paimon
Domains	Format Tables, Query Building
Last Updated	2026-02-08 00:00 GMT

Overview

FormatReadBuilder is a builder class for constructing read operations on format tables with support for filtering, projection, and limits.

Description

The FormatReadBuilder class provides a fluent API for configuring read operations on Apache Paimon format tables. It lets users specify a projection (column selection), filters (used for partition pruning), and a limit (maximum number of rows) before creating scan and read objects.
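The fluent style can be illustrated with a small stand-alone sketch. ToyReadBuilder and ReadConfig are hypothetical names, not part of pypaimon. The sketch only assumes that each with_* method records one option and returns the builder so calls can be chained, and that creating a scan or read captures the configuration at that moment.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ReadConfig:
    """Hypothetical record of the options a read builder collects."""
    projection: Optional[List[str]] = None
    partition_spec: Optional[Dict[str, str]] = None
    limit: Optional[int] = None


class ToyReadBuilder:
    """Hypothetical stand-in for the fluent with_* pattern (not pypaimon code)."""

    def __init__(self) -> None:
        self._config = ReadConfig()

    def with_projection(self, projection: List[str]) -> "ToyReadBuilder":
        self._config.projection = list(projection)
        return self  # returning the builder is what makes chaining possible

    def with_partition_filter(self, partition_spec: Optional[Dict[str, str]]) -> "ToyReadBuilder":
        # In this sketch None clears the filter; a dict is copied so later caller edits do not leak in.
        self._config.partition_spec = dict(partition_spec) if partition_spec is not None else None
        return self

    def with_limit(self, limit: int) -> "ToyReadBuilder":
        self._config.limit = limit
        return self

    def snapshot(self) -> ReadConfig:
        """Copy the current options, as a new_scan()/new_read() factory might."""
        c = self._config
        return ReadConfig(
            projection=list(c.projection) if c.projection is not None else None,
            partition_spec=dict(c.partition_spec) if c.partition_spec is not None else None,
            limit=c.limit,
        )


# Chained configuration, mirroring builder.with_projection(...).with_limit(...)
config = ToyReadBuilder().with_projection(["id", "name", "age"]).with_limit(1000).snapshot()
print(config)
```

Because snapshot() copies the options, a configuration captured earlier is not affected by later with_* calls on the same builder.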

The builder supports partition filtering either through a predicate (with_filter) or through a partition specification dictionary passed directly to with_partition_filter. When a predicate is provided, the builder attempts to extract a partition specification from it so that non-matching partitions can be pruned. The builder stores this configuration and applies it to the FormatTableScan and FormatTableRead instances created by new_scan() and new_read().
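One plausible way to derive a partition specification from a predicate is to walk a conjunction and collect equality conditions on partition keys. The predicate classes Equal, And and Other and the function extract_partition_spec are hypothetical; pypaimon's Predicate type and its actual extraction rules may differ. The sketch falls back to None (no pruning) whenever it cannot tell which partitions match.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence, Union


@dataclass
class Equal:
    """Hypothetical leaf predicate: field == literal."""
    field: str
    literal: str


@dataclass
class And:
    """Hypothetical conjunction of child predicates."""
    children: Sequence["Pred"]


@dataclass
class Other:
    """Any predicate the extractor does not understand (range, OR, NOT, ...)."""
    description: str


Pred = Union[Equal, And, Other]


def extract_partition_spec(pred: Pred, partition_keys: List[str]) -> Optional[Dict[str, str]]:
    """Collect equality conditions on partition keys from a conjunction.

    Returns a (possibly partial) spec such as {"date": "2024-01-01"}, or None when
    no partition key is pinned or the conditions contradict each other.
    """
    spec: Dict[str, str] = {}

    def visit(p: Pred) -> bool:
        if isinstance(p, Equal):
            if p.field in partition_keys:
                if spec.get(p.field, p.literal) != p.literal:
                    return False  # date = 'a' AND date = 'b': give up rather than guess
                spec[p.field] = p.literal
            return True  # equality on a non-partition column cannot prune partitions
        if isinstance(p, And):
            return all(visit(child) for child in p.children)
        # Ignoring an opaque condition only widens the partition set, so pruning stays safe.
        return True

    if not visit(pred):
        return None
    return spec or None


pred = And([Equal("date", "2024-01-01"), Equal("user_id", "42"), Other("age > 30")])
print(extract_partition_spec(pred, ["date", "region"]))  # {'date': '2024-01-01'}
```

Returning a partial spec matches how with_partition_filter is used with only some partition keys; whether the real extraction accepts partial specs is an assumption.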

The class also provides access to predicate builders (new_predicate_builder()) for constructing complex filter expressions, and exposes the read type, the list of fields that remain after projection, through read_type() for query planning.
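The read type is the table schema restricted to the projected columns. The sketch below computes it with a hypothetical Field class standing in for DataField; preserving the projection order and raising KeyError for unknown columns are assumptions of the sketch, not documented behavior of read_type().

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for DataField: just a name and a type label."""
    name: str
    type: str


def project_fields(schema: List[Field], projection: Optional[List[str]]) -> List[Field]:
    """Return the fields a read would produce under an optional projection.

    Assumptions of this sketch: no projection means all fields in schema order;
    otherwise fields follow the projection order, and unknown names raise KeyError.
    """
    if projection is None:
        return list(schema)
    by_name = {f.name: f for f in schema}
    missing = [name for name in projection if name not in by_name]
    if missing:
        raise KeyError(f"unknown columns in projection: {missing}")
    return [by_name[name] for name in projection]


schema = [Field("id", "BIGINT"), Field("name", "STRING"), Field("age", "INT")]
print([f.name for f in project_fields(schema, ["age", "id"])])  # ['age', 'id']
```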

Usage

Use FormatReadBuilder when reading from format tables (Parquet, ORC, CSV, JSON, Text) with specific column projections, partition filters, or row limits to optimize query performance and reduce data transfer.
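A row limit saves work only if the reader stops consuming input once enough rows have been produced. The generator below sketches that early termination over hypothetical row batches (lists of dicts, one batch per split); whether FormatTableRead stops this early is not stated here.

```python
from typing import Iterable, Iterator, List, Optional

Row = dict  # hypothetical row representation for the sketch


def read_with_limit(batches: Iterable[List[Row]], limit: Optional[int]) -> Iterator[List[Row]]:
    """Yield batches until `limit` rows have been produced; None means no limit."""
    if limit is None:
        yield from batches
        return
    remaining = limit
    if remaining <= 0:
        return
    for batch in batches:
        chunk = batch[:remaining]  # truncate the batch that crosses the limit
        remaining -= len(chunk)
        if chunk:
            yield chunk
        if remaining == 0:
            return  # stop before pulling (and decoding) any further batches


def batches_from_splits():
    # Hypothetical source: three splits, each decoded into one batch of three rows.
    for split in range(3):
        print(f"decoding split {split}")
        yield [{"id": split * 3 + i} for i in range(3)]


rows = [row for batch in read_with_limit(batches_from_splits(), 4) for row in batch]
print(len(rows))
```

With a limit of 4, only splits 0 and 1 are decoded; split 2 is never touched.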

Code Reference

Source Location

Repository: Apache_Paimon
File: paimon-python/pypaimon/table/format/format_read_builder.py

Signature

class FormatReadBuilder:
    """Builder for constructing format table read operations."""

    def __init__(self, table: FormatTable):
        """Initialize with a FormatTable instance."""

    def with_filter(self, predicate: Predicate) -> "FormatReadBuilder":
        """Set partition filter from predicate."""

    def with_projection(self, projection: List[str]) -> "FormatReadBuilder":
        """Set column projection."""

    def with_limit(self, limit: int) -> "FormatReadBuilder":
        """Set row limit."""

    def with_partition_filter(self, partition_spec: Optional[dict]) -> "FormatReadBuilder":
        """Set partition filter directly."""

    def new_scan(self) -> FormatTableScan:
        """Create a new scan with current configuration."""

    def new_read(self) -> FormatTableRead:
        """Create a new read with current configuration."""

    def new_predicate_builder(self) -> PredicateBuilder:
        """Create a new predicate builder."""

    def read_type(self) -> List[DataField]:
        """Get the list of fields that will be read."""

Import

from pypaimon.table.format.format_read_builder import FormatReadBuilder

I/O Contract

Inputs

Name	Type	Required	Set via	Description
table	FormatTable	Yes	__init__	Format table to read from
predicate	Predicate	No	with_filter	Filter predicate from which a partition specification is extracted for pruning
projection	List[str]	No	with_projection	Column names to read
limit	int	No	with_limit	Maximum number of rows to return
partition_spec	Optional[dict]	No	with_partition_filter	Partition specification (partition key to value) applied directly as a filter

Outputs

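Name	Type	Returned by	Description
builder	FormatReadBuilder	with_filter, with_projection, with_limit, with_partition_filter	Builder with updated configuration
scan	FormatTableScan	new_scan	Scan configured with the current settings
read	FormatTableRead	new_read	Read configured with the current settings
predicate_builder	PredicateBuilder	new_predicate_builder	New predicate builder
fields	List[DataField]	read_type	Fields to be read after projection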

Usage Examples

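from pypaimon.table.format.format_read_builder import FormatReadBuilder

# `table` is a FormatTable obtained elsewhere (for example, from a catalog).
# Create a read builder from the table ...
builder = table.new_read_builder()
# ... or construct one directly: builder = FormatReadBuilder(table)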

# Configure with projection and limit
builder = (builder
    .with_projection(["id", "name", "age"])
    .with_limit(1000))

# Create scan and read
scan = builder.new_scan()
read = builder.new_read()

# Execute query
splits = scan.plan().splits()
df = read.to_pandas(splits)

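# With a partition filter built from a predicate. Configure filters before
# calling new_scan()/new_read(), which use the configuration current at call time.
filtered_builder = table.new_read_builder()
predicate_builder = filtered_builder.new_predicate_builder()
predicate = predicate_builder.equal("date", "2024-01-01")
filtered_builder = filtered_builder.with_filter(predicate)

# Alternatively, with a direct partition filter
partition_builder = table.new_read_builder().with_partition_filter(
    {"date": "2024-01-01", "region": "us"})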

# Get read type
fields = builder.read_type()
print(f"Will read fields: {[f.name for f in fields]}")

# Full example
builder = (table.new_read_builder()
    .with_projection(["user_id", "event_type", "timestamp"])
    .with_partition_filter({"date": "2024-01-01"})
    .with_limit(10000))

scan = builder.new_scan()
read = builder.new_read()

# To Arrow
splits = scan.plan().splits()
arrow_table = read.to_arrow(splits)
print(f"Read {arrow_table.num_rows} rows")
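Assuming the format table stores partitions in Hive-style directories such as date=2024-01-01/region=us, the sketch below shows how a partition specification like the one passed to with_partition_filter could prune a list of partition paths. parse_partition_path and prune_partitions are hypothetical helpers, and the sketch ignores escaping of special characters in partition values.

```python
from typing import Dict, List


def parse_partition_path(path: str) -> Dict[str, str]:
    """Parse a Hive-style relative path such as 'date=2024-01-01/region=us'."""
    values: Dict[str, str] = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError(f"not a key=value segment: {segment!r}")
        values[key] = value
    return values


def prune_partitions(paths: List[str], partition_spec: Dict[str, str]) -> List[str]:
    """Keep partition directories whose values match every key in the spec.

    Keys absent from the spec are unconstrained, so {"date": ...} keeps every region.
    """
    kept = []
    for path in paths:
        values = parse_partition_path(path)
        if all(values.get(k) == v for k, v in partition_spec.items()):
            kept.append(path)
    return kept


paths = [
    "date=2024-01-01/region=us",
    "date=2024-01-01/region=eu",
    "date=2024-01-02/region=us",
]
print(prune_partitions(paths, {"date": "2024-01-01"}))
print(prune_partitions(paths, {"date": "2024-01-01", "region": "us"}))
```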
