Implementation:Pola_rs_Polars_LazyFrame_Expression_Chaining
Overview
This implementation covers the concrete APIs for building expression pipelines on a LazyFrame. These methods let users chain transformations (selecting columns, adding computed columns, filtering rows, grouping and aggregating, sorting, and joining) into a lazy query plan that is optimized before execution.
APIs
- LazyFrame.select(*exprs) -> LazyFrame: Project specific columns/expressions (replaces schema)
- LazyFrame.with_columns(*exprs) -> LazyFrame: Add or overwrite columns (preserves existing schema)
- LazyFrame.filter(predicate) -> LazyFrame: Filter rows by a boolean expression
- LazyFrame.group_by(*by).agg(*exprs) -> LazyFrame: Group by columns and aggregate
- LazyFrame.sort(by, descending) -> LazyFrame: Sort rows by one or more columns
- LazyFrame.join(other, on, how) -> LazyFrame: Join with another LazyFrame
Source Reference
- File: docs/source/src/python/user-guide/concepts/expressions.py (Lines 1-106)
- Repository: Pola_rs_Polars
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | LazyFrame | A LazyFrame with an existing query plan |
| Input | Expr (one or more) | Polars expressions describing transformations |
| Output | LazyFrame | A new LazyFrame with the chained operation appended to the query plan |
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| pl.col(name) | Expr | Reference a column by name |
| pl.lit(value) | Expr | Create a literal value expression |
| Expr.alias(name) | Expr | Rename the output column of an expression |
| pl.all() | Expr | Select all columns |
| pl.len() | Expr | Return the number of rows as an expression |
| on | str or list[str] | Column name(s) to join on |
| how | str | Join strategy: "inner", "left", "right", "full" (named "outer" in older releases), "cross", "semi", "anti" |
| descending | bool | Sort in descending order (default False) |
Example Code
Select with Computed Columns
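    import polars as pl

    q = (
        # try_parse_dates=True so "birthdate" is read as a date, which .dt.year() requires
        pl.scan_csv("data.csv", try_parse_dates=True)
        .select(
            pl.col("name"),
            pl.col("birthdate").dt.year().alias("birth_year"),
            (pl.col("weight") / (pl.col("height") ** 2)).alias("bmi"),
        )
        # birth_year and bmi are available here because select() defined them
        .filter(pl.col("birth_year") < 1990)
        .sort("bmi", descending=True)
    )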
With Columns (Adding Derived Columns)
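    import polars as pl

    q = (
        pl.scan_csv("data.csv")
        .with_columns(
            # New column; all existing columns are kept
            (pl.col("price") * pl.col("quantity")).alias("total"),
            # Same name as an existing column, so it is overwritten in place
            pl.col("category").cast(pl.Categorical).alias("category"),
        )
    )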
Group By and Aggregation
    import polars as pl

    q = (
        pl.scan_csv("data.csv")
        .group_by("category")
        .agg(
            pl.col("value").mean().alias("avg_value"),
            pl.col("value").sum().alias("total_value"),
            pl.len().alias("count"),
        )
    )
Join Two LazyFrames
    import polars as pl

    orders = pl.scan_csv("orders.csv")
    customers = pl.scan_csv("customers.csv")

    # Left join: keep every order; customer columns are null where no match exists
    q = orders.join(customers, on="customer_id", how="left")
Import
import polars as pl
Behavior Notes
- All methods return a new LazyFrame: The original LazyFrame is not mutated. Each chained call appends a new node to the query plan DAG.
- Expression context matters: .select() replaces the schema with only the specified columns, while .with_columns() preserves all existing columns and adds or overwrites the specified ones.
- Parallel expression evaluation: Multiple independent expressions within a single .select() or .with_columns() call can be evaluated in parallel across CPU cores.
- Lazy semantics: No computation occurs during chaining. All operations are deferred until .collect() is called.
- Alias controls the output name: A computed expression (e.g., arithmetic) keeps the name of its leftmost input column by default, so use .alias() to give it a distinct name. Without it, .with_columns() overwrites that input column, and .select() raises an error if two expressions produce the same name.
Related Pages
- Principle:Pola_rs_Polars_Expression_Pipeline_Building
- Implementation:Pola_rs_Polars_Scan_LazyFrame_Creation
- Implementation:Pola_rs_Polars_LazyFrame_Explain_Show_Graph
- Implementation:Pola_rs_Polars_LazyFrame_Collect
- Environment:Pola_rs_Polars_Python_Runtime_Environment
Metadata
| Field | Value |
|---|---|
| Source Repository | Pola_rs_Polars |
| Source File | docs/source/src/python/user-guide/concepts/expressions.py:L1-106 |
| Domain | Data Engineering, Functional Composition, Relational Algebra |
| Last Updated | 2026-02-09 10:00 GMT |