Implementation:Apache Paimon FormatLanceReader Predicate Pushdown


Knowledge Sources
Domains: Data_Lake, Columnar_Storage
Last Updated: 2026-02-07 00:00 GMT

Overview

Concrete tool for applying predicate pushdown during Lance file reading in Paimon.

Description

FormatLanceReader receives push_down_predicate from the read pipeline and converts it to a PyArrow dataset filter expression. It opens Lance files with lance.file.LanceFileReader and applies both column selection (read_fields) and row filtering (the filter expression). PredicateBuilder provides methods such as greater_than(), less_than(), equal(), and between() to construct the predicates, which are then converted to Lance-compatible filters.

The predicate conversion process translates Paimon's internal Predicate representation into PyArrow compute expressions that Lance understands natively. This allows the Lance file reader to evaluate the filter during the file read operation, avoiding the overhead of reading and then discarding non-matching rows.

Usage

Use this implementation when reading from Lance-format tables with filter conditions. The predicates are constructed via PredicateBuilder and passed through ReadBuilder.with_filter() to reach the FormatLanceReader.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/read/reader/format_lance_reader.py:L29-73
  • File: paimon-python/pypaimon/common/predicate_builder.py:L25-131

Signature

class FormatLanceReader:
    def __init__(self, file_io, data_file_path: str, read_fields: List[str],
                 push_down_predicate: Optional[Predicate] = None): ...
    def read_arrow_batch(self) -> Optional[pyarrow.RecordBatch]: ...

class PredicateBuilder:
    def greater_than(self, field: str, literal: Any) -> Predicate: ...
    def less_than(self, field: str, literal: Any) -> Predicate: ...
    def equal(self, field: str, literal: Any) -> Predicate: ...
    def between(self, field: str, lower: Any, upper: Any) -> Predicate: ...

Import

from pypaimon.common.predicate_builder import PredicateBuilder

I/O Contract

Inputs

Name | Type | Required | Description
field | str | Yes | Column name to apply the predicate on
literal | Any | Yes | Comparison value for the predicate (e.g., numeric threshold, string match)
lower | Any | No | Lower bound for between() predicates
upper | Any | No | Upper bound for between() predicates

Outputs

Name | Type | Description
result | pyarrow.RecordBatch | Filtered data as Arrow RecordBatch containing only rows matching the predicate

Usage Examples

Basic Usage

# `table` is an existing Paimon table handle for a Lance-format table
read_builder = table.new_read_builder()
pb = read_builder.new_predicate_builder()

# Create the predicate that will be pushed down to the Lance reader
predicate = pb.greater_than('value', 50.0)
read_builder = read_builder.with_filter(predicate)

# Plan the scan, then read the resulting splits
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
df = reader.to_pandas(splits)  # Only rows where value > 50.0
