Implementation:Apache Paimon FormatLanceReader Predicate Pushdown
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Columnar_Storage |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for applying predicate pushdown during Lance file reading in Paimon.
Description
FormatLanceReader receives push_down_predicate from the read pipeline and converts it into a PyArrow dataset filter expression. It opens Lance files with lance.file.LanceFileReader and applies both column selection (read_fields) and row filtering (the filter expression). PredicateBuilder provides methods such as greater_than(), equal(), and between() for constructing the predicates that are later converted into Lance-compatible filters.
The conversion translates Paimon's internal Predicate representation into PyArrow compute expressions that Lance understands natively. The Lance file reader can therefore evaluate the filter as part of the read, so non-matching rows are dropped inside the reader instead of being returned to the caller and discarded afterwards.
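The conversion step can be modeled in plain Python. This is a minimal sketch, not pypaimon's code. Predicate here is a hypothetical dataclass holding an operator name, a column, and its literals, and to_filter() compiles it into a row callable that stands in for the PyArrow expression. The helper read_with_pushdown() (also hypothetical) applies that filter and the read_fields projection in a single pass, which is the effect pushdown aims for.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

Row = Dict[str, Any]


# Hypothetical stand-in for Paimon's Predicate: operator name, column, literal(s).
@dataclass(frozen=True)
class Predicate:
    method: str
    field: str
    literals: Tuple[Any, ...]


# Operator name -> comparison on (column value, literals).
# pypaimon targets PyArrow expressions; this sketch uses plain callables instead.
_OPS: Dict[str, Callable[[Any, Tuple[Any, ...]], bool]] = {
    "greater_than": lambda v, lits: v > lits[0],
    "less_than": lambda v, lits: v < lits[0],
    "equal": lambda v, lits: v == lits[0],
    "between": lambda v, lits: lits[0] <= v <= lits[1],  # inclusive bounds assumed
}


def to_filter(predicate: Predicate) -> Callable[[Row], bool]:
    """Compile a predicate into a row filter once, before any rows are read."""
    op = _OPS[predicate.method]  # raises KeyError for unsupported operators
    field, literals = predicate.field, predicate.literals

    def row_filter(row: Row) -> bool:
        value = row.get(field)
        # NULL never matches, as with SQL comparisons.
        return value is not None and op(value, literals)

    return row_filter


def read_with_pushdown(rows: List[Row], read_fields: List[str],
                       predicate: Optional[Predicate] = None) -> List[Row]:
    """Filter rows and project read_fields in a single pass."""
    keep = to_filter(predicate) if predicate is not None else (lambda row: True)
    return [{f: row[f] for f in read_fields} for row in rows if keep(row)]


rows = [{"id": 1, "value": 10.0}, {"id": 2, "value": 75.5}, {"id": 3, "value": None}]
result = read_with_pushdown(rows, ["id"], Predicate("greater_than", "value", (50.0,)))
# result == [{'id': 2}]
```

Compiling the predicate once, before touching any rows, mirrors building the filter expression up front and handing it to the reader together with the column list.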
Usage
Use this implementation when reading Lance-format tables with filter conditions. Predicates are built with a PredicateBuilder (obtained from ReadBuilder.new_predicate_builder()) and passed to ReadBuilder.with_filter(); the read pipeline then forwards them to FormatLanceReader as push_down_predicate.
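The path from with_filter() to the file reader can be modeled as follows. All class names (SketchReadBuilder, SketchTableRead, SketchFileReader) are hypothetical, predicates are plain row callables, and each "file" is a list of dict rows. The sketch only illustrates the plumbing: the builder stores the predicate without evaluating it, and the read step hands the same predicate to every per-file reader as push_down_predicate.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]
RowPredicate = Callable[[Row], bool]


class SketchFileReader:
    """Hypothetical per-file reader that receives the pushed-down predicate."""

    def __init__(self, rows: List[Row], read_fields: List[str],
                 push_down_predicate: Optional[RowPredicate] = None):
        self.rows = rows
        self.read_fields = read_fields
        self.push_down_predicate = push_down_predicate

    def read_all(self) -> List[Row]:
        keep = self.push_down_predicate or (lambda row: True)
        return [{f: r[f] for f in self.read_fields} for r in self.rows if keep(r)]


class SketchTableRead:
    def __init__(self, read_fields: List[str], predicate: Optional[RowPredicate]):
        self.read_fields = read_fields
        self.predicate = predicate

    def read(self, files: List[List[Row]]) -> List[Row]:
        out: List[Row] = []
        for rows in files:
            # One reader per data file; each gets the same predicate.
            reader = SketchFileReader(rows, self.read_fields, self.predicate)
            out.extend(reader.read_all())
        return out


class SketchReadBuilder:
    def __init__(self, read_fields: List[str]):
        self.read_fields = read_fields
        self.predicate: Optional[RowPredicate] = None

    def with_filter(self, predicate: RowPredicate) -> "SketchReadBuilder":
        self.predicate = predicate  # stored here, evaluated only at read time
        return self

    def new_read(self) -> SketchTableRead:
        return SketchTableRead(self.read_fields, self.predicate)


files = [[{"id": 1, "value": 10.0}], [{"id": 2, "value": 75.5}, {"id": 3, "value": 60.0}]]
matched = SketchReadBuilder(["id"]).with_filter(lambda r: r["value"] > 50.0).new_read().read(files)
```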
Code Reference
Source Location
- Repository: Apache Paimon
- File: paimon-python/pypaimon/read/reader/format_lance_reader.py:L29-73
- File: paimon-python/pypaimon/common/predicate_builder.py:L25-131
Signature
```python
class FormatLanceReader:
    def __init__(self, file_io, data_file_path: str, read_fields: List[str],
                 push_down_predicate: Optional[Predicate] = None): ...
    def read_arrow_batch(self) -> Optional[pyarrow.RecordBatch]: ...

class PredicateBuilder:
    def greater_than(self, field: str, literal: Any) -> Predicate: ...
    def less_than(self, field: str, literal: Any) -> Predicate: ...
    def equal(self, field: str, literal: Any) -> Predicate: ...
    def between(self, field: str, lower: Any, upper: Any) -> Predicate: ...
```
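To make the argument shapes of these signatures concrete, here is a hypothetical SketchPredicateBuilder that mirrors the four PredicateBuilder methods and returns a simple Predicate record. It is an illustration only. The field-name check and the record layout are assumptions, not pypaimon's internal Predicate.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Tuple


@dataclass(frozen=True)
class Predicate:
    """Hypothetical leaf predicate: operator, column name, literal values."""
    method: str
    field: str
    literals: Tuple[Any, ...]


class SketchPredicateBuilder:
    """Hypothetical builder mirroring the four PredicateBuilder signatures."""

    def __init__(self, field_names: Iterable[str]):
        self.field_names = list(field_names)

    def _leaf(self, method: str, field: str, *literals: Any) -> Predicate:
        # Reject unknown columns early instead of failing at read time.
        if field not in self.field_names:
            raise ValueError(f"unknown field: {field!r}")
        return Predicate(method, field, literals)

    def greater_than(self, field: str, literal: Any) -> Predicate:
        return self._leaf("greater_than", field, literal)

    def less_than(self, field: str, literal: Any) -> Predicate:
        return self._leaf("less_than", field, literal)

    def equal(self, field: str, literal: Any) -> Predicate:
        return self._leaf("equal", field, literal)

    def between(self, field: str, lower: Any, upper: Any) -> Predicate:
        # between() takes two bounds instead of a single literal.
        return self._leaf("between", field, lower, upper)


pb = SketchPredicateBuilder(["id", "value"])
p = pb.between("value", 10, 20)  # Predicate(method='between', field='value', literals=(10, 20))
```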
Import
```python
from pypaimon.common.predicate_builder import PredicateBuilder
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| field | str | Yes | Column name the predicate applies to |
| literal | Any | For greater_than(), less_than(), equal() | Comparison value (e.g., a numeric threshold or a string to match) |
| lower | Any | For between() only | Lower bound of the range |
| upper | Any | For between() only | Upper bound of the range |
Outputs
| Name | Type | Description |
|---|---|---|
| result | Optional[pyarrow.RecordBatch] | One batch per read_arrow_batch() call, containing the requested columns and only the rows that match the predicate; None once the file is exhausted |
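Because read_arrow_batch() returns Optional[pyarrow.RecordBatch], a caller drains the reader until it receives None. The sketch below uses a hypothetical SketchBatchReader whose batches are lists of dict rows rather than Arrow batches. In this sketch a zero-row batch is passed through as ordinary data, and only None ends the loop.

```python
from typing import Any, Dict, Iterator, List, Optional

Batch = List[Dict[str, Any]]  # stand-in for pyarrow.RecordBatch


class SketchBatchReader:
    """Hypothetical reader with the same Optional return contract."""

    def __init__(self, batches: List[Batch]):
        self._batches = iter(batches)

    def read_arrow_batch(self) -> Optional[Batch]:
        # Next batch, or None when nothing is left.
        return next(self._batches, None)


def iter_batches(reader: SketchBatchReader) -> Iterator[Batch]:
    """Yield batches until read_arrow_batch() signals end-of-data with None."""
    while True:
        batch = reader.read_arrow_batch()
        if batch is None:  # an empty batch is still data; only None stops the loop
            return
        yield batch


reader = SketchBatchReader([[{"value": 60.0}], [], [{"value": 75.5}, {"value": 90.0}]])
total_rows = sum(len(batch) for batch in iter_batches(reader))  # 3
```

Testing with "is None" rather than truthiness matters here, since an empty batch is falsy but does not mean the file is finished.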
Usage Examples
Basic Usage
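```python
# Assumes `table` is an existing Paimon table whose data files are stored in Lance format
read_builder = table.new_read_builder()
pb = read_builder.new_predicate_builder()

# Create a predicate to push down to the Lance reader: value > 50.0
predicate = pb.greater_than('value', 50.0)
read_builder = read_builder.with_filter(predicate)

# Plan the splits to read
scan = read_builder.new_scan()
splits = scan.plan().splits()

# Read the splits; FormatLanceReader applies the predicate to each Lance file
reader = read_builder.new_read()
df = reader.to_pandas(splits)  # only rows where value > 50.0
```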