Implementation:Apache Paimon ReadBuilder Scan
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Table_Format |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tools for configuring and executing scan plans on Paimon tables with predicate pushdown and column projection.
Description
ReadBuilder configures a read through with_filter(), with_projection(), and with_limit(), and creates the scan with new_scan(). PredicateBuilder creates typed predicates (equal, not_equal, greater_than, less_than, between, is_in, and others) that are pushed down to the scan layer for partition and file pruning. TableScan.plan() executes the scan and returns a Plan whose splits (List[Split]) can be read in parallel. ReadBuilder follows a fluent interface: each configuration method returns the builder itself, so calls can be chained.
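To make the pruning idea concrete, the sketch below models how a scan layer could skip whole files by comparing a predicate against per-file min/max statistics. FileStats, may_match_greater_than, may_match_between, and the two prune helpers are hypothetical names used only for illustration. They are not part of pypaimon, and Paimon's actual pruning logic may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileStats:
    """Hypothetical per-file statistics for one column (not a pypaimon class)."""
    name: str
    min_value: float
    max_value: float


def may_match_greater_than(stats: FileStats, literal: float) -> bool:
    # A file can hold a row with value > literal only if its max exceeds the literal.
    return stats.max_value > literal


def may_match_between(stats: FileStats, lower: float, upper: float) -> bool:
    # Both bounds are inclusive, so the file's [min, max] only has to overlap [lower, upper].
    return stats.max_value >= lower and stats.min_value <= upper


def prune_greater_than(files: List[FileStats], literal: float) -> List[FileStats]:
    # Keep only the files that might contain matching rows.
    return [f for f in files if may_match_greater_than(f, literal)]


def prune_between(files: List[FileStats], lower: float, upper: float) -> List[FileStats]:
    return [f for f in files if may_match_between(f, lower, upper)]
```

Pruning of this kind is conservative: a file that passes the check may still contain no matching rows, so the predicate has to be evaluated again on the rows that are actually read.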
Usage
Use these tools to configure and execute read operations on Paimon tables. Start by obtaining a ReadBuilder from the table, configure filters and projections as needed, then create a scan and call plan() to generate splits. The resulting splits are passed to a TableRead for data materialization.
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/read/read_builder.py
- Lines: L30-86
- File: paimon-python/pypaimon/read/table_scan.py
- Lines: L33-125
- File: paimon-python/pypaimon/common/predicate_builder.py
- Lines: L25-131
Signature
class ReadBuilder:
    def with_filter(self, predicate: Predicate) -> 'ReadBuilder':
    def with_projection(self, projection: List[str]) -> 'ReadBuilder':
    def with_limit(self, limit: int) -> 'ReadBuilder':
    def new_scan(self) -> TableScan:
    def new_read(self) -> TableRead:
    def new_predicate_builder(self) -> PredicateBuilder:

class PredicateBuilder:
    def __init__(self, row_field: List[DataField]):
    def equal(self, field: str, literal: Any) -> Predicate:
    def not_equal(self, field: str, literal: Any) -> Predicate:
    def less_than(self, field: str, literal: Any) -> Predicate:
    def greater_than(self, field: str, literal: Any) -> Predicate:
    def between(self, field: str, included_lower_bound: Any,
                included_upper_bound: Any) -> Predicate:
    def is_in(self, field: str, literals: List[Any]) -> Predicate:
    @staticmethod
    def and_predicates(predicates: List[Predicate]) -> Optional[Predicate]:
    @staticmethod
    def or_predicates(predicates: List[Predicate]) -> Optional[Predicate]:

class TableScan:
    def plan(self) -> Plan:
Import
from pypaimon.read.read_builder import ReadBuilder
from pypaimon.read.table_scan import TableScan
from pypaimon.common.predicate_builder import PredicateBuilder
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| predicate | Predicate | No | Filter predicate constructed via PredicateBuilder methods |
| projection | List[str] | No | List of column names to include in the read |
| limit | int | No | Maximum number of rows to return |
| field | str | Yes (for predicate methods) | Column name to filter on |
| literal | Any | Yes (for predicate methods) | Value to compare against |
| literals | List[Any] | Yes (for is_in) | List of values for set membership test |
| included_lower_bound | Any | Yes (for between) | Lower bound (inclusive) for range filter |
| included_upper_bound | Any | Yes (for between) | Upper bound (inclusive) for range filter |
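Because predicate, projection, and limit are all optional, a caller may want to apply only the ones that were supplied. The following is a minimal sketch of such a helper. configure_read is a hypothetical name, and the code relies only on the ReadBuilder methods listed in the Signature section. It also covers the case where and_predicates or or_predicates yields None, which their Optional[Predicate] return type allows.

```python
from typing import Any, List, Optional


def configure_read(read_builder: Any,
                   predicate: Optional[Any] = None,
                   projection: Optional[List[str]] = None,
                   limit: Optional[int] = None) -> Any:
    """Apply only the options the caller supplied (hypothetical helper)."""
    if predicate is not None:
        # Skip the filter entirely when no predicate was built.
        read_builder = read_builder.with_filter(predicate)
    if projection is not None:
        read_builder = read_builder.with_projection(projection)
    if limit is not None:
        read_builder = read_builder.with_limit(limit)
    return read_builder
```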
Outputs
| Name | Type | Description |
|---|---|---|
| with_filter return | ReadBuilder | The same ReadBuilder instance for method chaining |
| with_projection return | ReadBuilder | The same ReadBuilder instance for method chaining |
| with_limit return | ReadBuilder | The same ReadBuilder instance for method chaining |
| new_scan return | TableScan | A configured TableScan ready for planning |
| new_read return | TableRead | A configured TableRead ready for data retrieval |
| new_predicate_builder return | PredicateBuilder | A PredicateBuilder initialized with the table's field types |
| plan return | Plan | A Plan containing splits: List[Split] for parallel reading |
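Since a Plan exposes its work as a list of splits, one way to read them in parallel is to hand each split to a worker. The sketch below uses the standard library's ThreadPoolExecutor. read_splits_in_parallel and the read_one callable are hypothetical, and the source does not say whether a single TableRead may be shared across threads, so read_one would need to wrap whatever per-split read is safe in a given setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")
R = TypeVar("R")


def read_splits_in_parallel(splits: Sequence[S],
                            read_one: Callable[[S], R],
                            max_workers: int = 4) -> List[R]:
    """Run read_one on each split concurrently; results keep the order of splits."""
    if not splits:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return list(pool.map(read_one, splits))
```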
Usage Examples
Basic Usage
# Build read with filter and projection
read_builder = table.new_read_builder()
pb = read_builder.new_predicate_builder()
predicate = pb.and_predicates([
    pb.equal('name', 'Alice'),
    pb.greater_than('value', 10.0),
])
read_builder = read_builder.with_filter(predicate)
read_builder = read_builder.with_projection(['id', 'name', 'value'])
# Plan the scan
scan = read_builder.new_scan()
plan = scan.plan()
splits = plan.splits()
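The splits can then be handed to a TableRead for materialization. The sketch below assumes that TableRead offers a to_arrow(splits) method that returns the rows as an Arrow table. That method name does not appear in the signatures listed on this page, so treat it as an assumption and check it against the installed pypaimon version. read_to_arrow is a hypothetical helper name.

```python
from typing import Any, List


def read_to_arrow(read_builder: Any, splits: List[Any]) -> Any:
    """Materialize planned splits through a TableRead (hypothetical helper)."""
    # new_read() is listed in the ReadBuilder signature above.
    table_read = read_builder.new_read()
    # Assumption: TableRead exposes to_arrow(splits); verify for your pypaimon version.
    return table_read.to_arrow(splits)
```

Continuing the example above, the call would be read_to_arrow(read_builder, splits), using the same read_builder that produced the scan so that the filter and projection apply to the read as well.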