Implementation:Apache Paimon ReadBuilder With Projection

Knowledge Sources	Apache Paimon
Domains	Data_Lake, Columnar_Storage
Last Updated	2026-02-07 00:00 GMT

Overview

Concrete tool for configuring column projection when reading Lance-format Paimon tables through the pypaimon ReadBuilder.

Description

ReadBuilder.with_projection() accepts a list of column names to select and returns the builder so that calls can be chained. The read_type() method resolves the projection against the table schema and returns the DataField objects for the selected columns. For Lance tables, the projection propagates to FormatLanceReader, which reads only the specified columns from the Lance files.
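The name-to-field resolution described above can be illustrated with a small standalone sketch. It does not use pypaimon: the `Field` class and the `resolve_projection` function are hypothetical stand-ins, and both the "empty projection means all columns" rule and the error raised for unknown names are assumptions of this sketch rather than confirmed library behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a schema field (not pypaimon's DataField)."""
    id: int
    name: str
    type: str


def resolve_projection(schema: List[Field],
                       projection: Optional[List[str]]) -> List[Field]:
    """Map projected column names to schema fields, in the requested order."""
    if not projection:
        # Assumption: no projection means "read every column".
        return list(schema)
    by_name = {f.name: f for f in schema}
    missing = [name for name in projection if name not in by_name]
    if missing:
        # Assumption: this sketch fails loudly; the real library may differ.
        raise ValueError(f"Unknown columns: {missing}")
    return [by_name[name] for name in projection]


schema = [
    Field(0, "id", "BIGINT"),
    Field(1, "name", "STRING"),
    Field(2, "value", "DOUBLE"),
    Field(3, "payload", "BYTES"),
]
selected = resolve_projection(schema, ["value", "id"])
print([f.name for f in selected])
```

In this sketch the result follows the order of the requested names rather than the schema order, which is one reasonable design choice for a projection API.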

The projection is pushed down to the lowest level of the read pipeline, so the Lance file reader loads only the projected columns from disk. Columns outside the projection are never read, which reduces both I/O bandwidth and memory consumption.
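To make the effect of pushing the projection down to the file reader concrete, here is a toy model of a columnar file. The `ToyColumnarFile` class is purely hypothetical and unrelated to Lance or FormatLanceReader; it only shows that, when each column is stored as its own chunk, a projected read never touches the chunks of unselected columns.

```python
from typing import Dict, List


class ToyColumnarFile:
    """Hypothetical columnar file: each column is stored as its own chunk."""

    def __init__(self, columns: Dict[str, List[object]]):
        self._columns = columns
        self.columns_read: List[str] = []  # records which chunks were touched

    def read(self, projection: List[str]) -> Dict[str, List[object]]:
        """Return only the projected columns, touching only their chunks."""
        out: Dict[str, List[object]] = {}
        for name in projection:
            self.columns_read.append(name)
            out[name] = list(self._columns[name])
        return out


toy = ToyColumnarFile({
    "id": [1, 2],
    "name": ["a", "b"],
    "value": [0.5, 1.5],
    "payload": [b"x" * 1024, b"y" * 1024],  # large column we want to skip
})
batch = toy.read(["id", "value"])
print(sorted(batch), toy.columns_read)
```

The large `payload` column is never accessed here, which is the saving the projection is meant to deliver.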

Usage

Use this implementation when reading from Lance-format tables and only a subset of columns is needed. The projection is configured on the ReadBuilder before creating the scan and reader.

Code Reference

Source Location

Repository: Apache Paimon
File: paimon-python/pypaimon/read/read_builder.py:L46-85

Signature

class ReadBuilder:
    def with_projection(self, projection: List[str]) -> 'ReadBuilder': ...
    def read_type(self) -> List[DataField]: ...

Import

from pypaimon.read.read_builder import ReadBuilder

I/O Contract

Inputs

Name	Type	Required	Description
projection	List[str]	Yes	List of column names to select from the table schema

Outputs

Name	Type	Description
(with_projection)	ReadBuilder	Configured ReadBuilder instance with the column projection applied
(read_type)	List[DataField]	List of DataField objects for the projected columns, resolved against the table schema

Usage Examples

Basic Usage

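# `table` is assumed to be a Paimon table handle already loaded from a catalog.
read_builder = table.new_read_builder()
# Configure the projection before creating the scan and the reader.
read_builder = read_builder.with_projection(['id', 'name', 'value'])

scan = read_builder.new_scan()
splits = scan.plan().splits()
reader = read_builder.new_read()
df = reader.to_pandas(splits)  # DataFrame holds only the 3 projected columns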

Related Pages

Implements Principle

Principle:Apache_Paimon_Lance_Column_Projection

Requires Environment

Environment:Apache_Paimon_Python_Core_Runtime
