Implementation:Apache Paimon ReadBuilder Scan

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Table_Format
Last Updated 2026-02-07 00:00 GMT

Overview

Concrete tools for configuring and executing scan plans on Paimon tables with predicate pushdown and column projection.

Description

ReadBuilder configures a read through with_filter(), with_projection(), and with_limit(), and creates the scan and read objects through new_scan() and new_read(). It follows a fluent interface: each configuration method returns the builder itself, so calls can be chained. PredicateBuilder, obtained from new_predicate_builder(), creates typed predicates (equal, greater_than, less_than, between, is_in, etc.) that are pushed down to the scan layer for partition and file pruning. TableScan.plan() runs scan planning and returns a Plan whose splits (List[Split]) can be read in parallel.
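The fluent interface described above can be illustrated with a minimal, self-contained sketch. SketchReadBuilder is a hypothetical toy class written for this page, not pypaimon's ReadBuilder; it only shows why returning the builder from each configuration method makes chaining possible.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SketchReadBuilder:
    """Toy stand-in for a fluent read builder (hypothetical, not pypaimon code)."""
    predicate: Optional[Any] = None
    projection: Optional[List[str]] = None
    limit: Optional[int] = None

    def with_filter(self, predicate: Any) -> "SketchReadBuilder":
        self.predicate = predicate
        return self  # returning self is what enables chaining

    def with_projection(self, projection: List[str]) -> "SketchReadBuilder":
        self.projection = list(projection)
        return self

    def with_limit(self, limit: int) -> "SketchReadBuilder":
        self.limit = limit
        return self


builder = SketchReadBuilder()
# Each call mutates the same object and hands it back, so the calls compose.
chained = (
    builder
    .with_filter("value > 10")  # placeholder; a real predicate would come from a PredicateBuilder
    .with_projection(["id", "name"])
    .with_limit(100)
)
```

Because every method returns the same instance, `chained` and `builder` refer to one object, and reassigning the result of each call (as the usage example on this page does) is equivalent to chaining.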

Usage

Use these tools to configure and execute read operations on Paimon tables. Start by obtaining a ReadBuilder from the table, configure filters and projections as needed, then create a scan and call plan() to generate splits. The resulting splits are passed to a TableRead for data materialization.
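The workflow above can be sketched as a single helper. This is an illustration under stated assumptions rather than a reference implementation: read_filtered is a hypothetical function name, the 'value' column is borrowed from the usage example on this page, and the final to_arrow(splits) call on TableRead is an assumption, since this page does not document TableRead's methods.

```python
from typing import Any, List


def read_filtered(table: Any, columns: List[str], min_value: float) -> Any:
    """Read `columns` from `table`, keeping rows whose 'value' exceeds `min_value`."""
    # 1. Obtain a ReadBuilder from the table.
    read_builder = table.new_read_builder()

    # 2. Configure the filter and projection (each call returns the builder).
    pb = read_builder.new_predicate_builder()
    read_builder = (
        read_builder
        .with_filter(pb.greater_than("value", min_value))
        .with_projection(columns)
    )

    # 3. Create a scan and plan it to obtain the splits.
    splits = read_builder.new_scan().plan().splits()

    # 4. Pass the splits to a TableRead for materialization.
    table_read = read_builder.new_read()
    # ASSUMPTION: to_arrow(splits) is not listed on this page; check the
    # TableRead API of your pypaimon version before relying on it.
    return table_read.to_arrow(splits)
```

The helper is duck-typed on purpose: it only relies on the method names listed in the Signature section, plus the one labeled assumption.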

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/read/read_builder.py (Lines: L30-86)
  • File: paimon-python/pypaimon/read/table_scan.py (Lines: L33-125)
  • File: paimon-python/pypaimon/common/predicate_builder.py (Lines: L25-131)

Signature

class ReadBuilder:
    def with_filter(self, predicate: Predicate) -> 'ReadBuilder':
    def with_projection(self, projection: List[str]) -> 'ReadBuilder':
    def with_limit(self, limit: int) -> 'ReadBuilder':
    def new_scan(self) -> TableScan:
    def new_read(self) -> TableRead:
    def new_predicate_builder(self) -> PredicateBuilder:

class PredicateBuilder:
    def __init__(self, row_field: List[DataField]):
    def equal(self, field: str, literal: Any) -> Predicate:
    def not_equal(self, field: str, literal: Any) -> Predicate:
    def less_than(self, field: str, literal: Any) -> Predicate:
    def greater_than(self, field: str, literal: Any) -> Predicate:
    def between(self, field: str, included_lower_bound: Any,
                included_upper_bound: Any) -> Predicate:
    def is_in(self, field: str, literals: List[Any]) -> Predicate:
    @staticmethod
    def and_predicates(predicates: List[Predicate]) -> Optional[Predicate]:
    @staticmethod
    def or_predicates(predicates: List[Predicate]) -> Optional[Predicate]:

class TableScan:
    def plan(self) -> Plan:

Import

from pypaimon.read.read_builder import ReadBuilder
from pypaimon.read.table_scan import TableScan
from pypaimon.common.predicate_builder import PredicateBuilder

I/O Contract

Inputs

Name | Type | Required | Description
predicate | Predicate | No | Filter predicate constructed via PredicateBuilder methods
projection | List[str] | No | List of column names to include in the read
limit | int | No | Maximum number of rows to return
field | str | Yes (for predicate methods) | Column name to filter on
literal | Any | Yes (for predicate methods) | Value to compare against
literals | List[Any] | Yes (for is_in) | List of values for set membership test
included_lower_bound | Any | Yes (for between) | Lower bound (inclusive) for range filter
included_upper_bound | Any | Yes (for between) | Upper bound (inclusive) for range filter

Outputs

Name | Type | Description
with_filter return | ReadBuilder | The same ReadBuilder instance for method chaining
with_projection return | ReadBuilder | The same ReadBuilder instance for method chaining
with_limit return | ReadBuilder | The same ReadBuilder instance for method chaining
new_scan return | TableScan | A configured TableScan ready for planning
new_read return | TableRead | A configured TableRead ready for data retrieval
new_predicate_builder return | PredicateBuilder | A PredicateBuilder initialized with the table's field types
plan return | Plan | A Plan containing splits: List[Split] for parallel reading
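To make the idea of file pruning concrete, here is a toy model of how a pushed-down between predicate with inclusive bounds could let a scan skip files using per-file min/max statistics. FileStats and prune_for_between are hypothetical names invented for this sketch; it does not reproduce Paimon's actual planning logic or its manifest format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file min/max statistics for a single column."""
    path: str
    min_value: int
    max_value: int


def prune_for_between(files: List[FileStats], lower: int, upper: int) -> List[FileStats]:
    """Keep files whose [min, max] range overlaps the inclusive range [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


files = [
    FileStats("data-0", 0, 9),
    FileStats("data-1", 10, 19),
    FileStats("data-2", 20, 29),
]

# between(field, 10, 20): data-0 cannot contain a match and is skipped.
kept = prune_for_between(files, 10, 20)
print([f.path for f in kept])  # ['data-1', 'data-2']
```

Because both bounds are inclusive, data-2 is kept even though only its minimum value (20) falls inside the range. Only files that survive this kind of check would need to be turned into splits.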

Usage Examples

Basic Usage

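# Assumes `table` is a Paimon table object that has already been loaded.

# Build a read with a filter and a projection
read_builder = table.new_read_builder()
pb = read_builder.new_predicate_builder()

# and_predicates is a static method; calling it on the instance also works
predicate = pb.and_predicates([
    pb.equal('name', 'Alice'),
    pb.greater_than('value', 10.0),
])

# Each with_* call returns the same builder, so the calls could also be chained
read_builder = read_builder.with_filter(predicate)
read_builder = read_builder.with_projection(['id', 'name', 'value'])

# Plan the scan and collect the splits for a TableRead
scan = read_builder.new_scan()
plan = scan.plan()
splits = plan.splits()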
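The remaining predicate methods from the Signature section compose in the same way. The sketch below is illustrative only: build_compound_filter is a hypothetical helper name, and the column names and literal values are placeholders carried over from the basic example.

```python
from typing import Any


def build_compound_filter(read_builder: Any) -> Any:
    """Attach (value BETWEEN 10 AND 20 OR name IN (...)) AND id != 0, capped at 100 rows."""
    pb = read_builder.new_predicate_builder()

    in_range = pb.between("value", 10.0, 20.0)   # both bounds are inclusive
    named = pb.is_in("name", ["Alice", "Bob"])   # set membership test
    either = pb.or_predicates([in_range, named])

    not_zero = pb.not_equal("id", 0)
    predicate = pb.and_predicates([either, not_zero])

    # with_filter and with_limit both return the builder, so they chain.
    return read_builder.with_filter(predicate).with_limit(100)
```

Note that and_predicates and or_predicates are declared as returning Optional[Predicate], so callers that build the input list dynamically may want to check the result before passing it to with_filter.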

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
