Implementation:Apache Paimon ReadBuilder With Projection

From Leeroopedia


Knowledge Sources
Domains Data_Lake, Columnar_Storage
Last Updated 2026-02-07 00:00 GMT

Overview

A concrete API for configuring column projection when reading Lance-format Paimon tables.

Description

ReadBuilder.with_projection() accepts a list of column names to select. The read_type() method resolves the projection against the table schema to produce the list of DataField objects for the selected columns. For Lance tables, this propagates to FormatLanceReader which reads only the specified columns from Lance files.
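The name-to-field resolution described above can be illustrated with a small stand-in. This is a minimal sketch, not the pypaimon source: `SketchReadBuilder` and the two-attribute `DataField` are hypothetical. Returning all fields when no projection is set and skipping names that are not in the schema are both assumptions that should be checked against read_builder.py.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DataField:
    """Hypothetical stand-in for pypaimon's DataField (name and type only)."""
    name: str
    type: str


class SketchReadBuilder:
    """Illustrative stand-in, not the pypaimon ReadBuilder."""

    def __init__(self, table_fields: List[DataField]):
        self._table_fields = table_fields
        self._projection: Optional[List[str]] = None

    def with_projection(self, projection: List[str]) -> "SketchReadBuilder":
        # Store the requested column names and return self so calls can be chained.
        self._projection = projection
        return self

    def read_type(self) -> List[DataField]:
        # Assumption: no projection means "all columns".
        if not self._projection:
            return list(self._table_fields)
        by_name = {field.name: field for field in self._table_fields}
        # Assumption: names missing from the schema are skipped in this sketch;
        # the real implementation may handle them differently.
        return [by_name[name] for name in self._projection if name in by_name]


schema = [
    DataField("id", "BIGINT"),
    DataField("name", "STRING"),
    DataField("value", "DOUBLE"),
    DataField("payload", "BYTES"),
]
builder = SketchReadBuilder(schema).with_projection(["value", "id"])
projected_names = [field.name for field in builder.read_type()]
print(projected_names)
```

In this sketch the resolved fields follow the order of the projection list rather than the schema order; whether pypaimon preserves that order is not stated on this page.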

The projection is applied at the lowest level of the read pipeline, so the Lance file reader reads only the projected columns from disk. Skipping the remaining columns reduces both I/O and memory consumption.
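To make the effect of reader-level projection concrete, here is a toy model of a columnar file. `ToyColumnarFile` is hypothetical and has nothing to do with the actual Lance reader or FormatLanceReader; it only shows that when columns are stored separately, a projected read never touches the columns that were not requested.

```python
from typing import Dict, List


class ToyColumnarFile:
    """Hypothetical in-memory columnar file; each column is stored separately."""

    def __init__(self, columns: Dict[str, List[object]]):
        self._columns = columns
        self.columns_read: List[str] = []  # records which columns were touched

    def read(self, projection: List[str]) -> Dict[str, List[object]]:
        result = {}
        for name in projection:
            # Only the projected columns are looked up and copied out.
            self.columns_read.append(name)
            result[name] = list(self._columns[name])
        return result


toy_file = ToyColumnarFile({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "value": [0.1, 0.2, 0.3],
    "payload": ["x" * 1000, "y" * 1000, "z" * 1000],  # wide column we want to avoid
})
batch = toy_file.read(["id", "value"])
print(sorted(batch))          # names of the columns that were returned
print(toy_file.columns_read)  # names of the columns that were accessed
```

The wide `payload` column is never accessed, which is the saving that projection pushdown aims for in a real columnar format.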

Usage

Use this implementation when reading from Lance-format tables and only a subset of columns is needed. The projection is configured on the ReadBuilder before creating the scan and reader.

Code Reference

Source Location

  • Repository: Apache Paimon
  • File: paimon-python/pypaimon/read/read_builder.py:L46-85

Signature

class ReadBuilder:
    def with_projection(self, projection: List[str]) -> 'ReadBuilder': ...
    def read_type(self) -> List[DataField]: ...

Import

from pypaimon.read.read_builder import ReadBuilder

I/O Contract

Inputs

Name | Type | Required | Description
projection | List[str] | Yes | List of column names to select from the table schema

Outputs

Name | Type | Description
(with_projection) | ReadBuilder | Configured ReadBuilder instance with the column projection applied
(read_type) | List[DataField] | List of DataField objects for the projected columns, resolved against the table schema

Usage Examples

Basic Usage

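# `table` is an existing Paimon table object (for example, loaded from a catalog).
read_builder = table.new_read_builder()
# Select only the columns that are needed.
read_builder = read_builder.with_projection(['id', 'name', 'value'])

# Plan the scan, then read the resulting splits.
scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
df = reader.to_pandas(splits)  # DataFrame with only the 3 projected columns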

Related Pages

Implements Principle

Requires Environment
