Implementation:Pola rs Polars LazyFrame Expression Chaining

From Leeroopedia


Overview

This implementation covers the concrete APIs for building expression pipelines on a LazyFrame. These methods allow users to chain transformations — selecting columns, adding computed columns, filtering rows, grouping and aggregating, sorting, and joining — into a lazy query plan that is optimized before execution.

APIs

  • LazyFrame.select(*exprs) -> LazyFrame — Project specific columns/expressions (replaces schema)
  • LazyFrame.with_columns(*exprs) -> LazyFrame — Add or overwrite columns (preserves existing schema)
  • LazyFrame.filter(predicate) -> LazyFrame — Filter rows by a boolean expression
  • LazyFrame.group_by(*by).agg(*exprs) -> LazyFrame — Group by columns and aggregate
  • LazyFrame.sort(by, descending) -> LazyFrame — Sort rows by one or more columns
  • LazyFrame.join(other, on, how) -> LazyFrame — Join with another LazyFrame

Source Reference

  • File: docs/source/src/python/user-guide/concepts/expressions.py (Lines 1-106)
  • Repository: Pola_rs_Polars

I/O Contract

Direction | Type | Description
Input | LazyFrame | A LazyFrame with an existing query plan
Input | Expr (one or more) | Polars expressions describing transformations
Output | LazyFrame | A new LazyFrame with the chained operation appended to the query plan

Key Parameters

Parameter | Type | Description
pl.col(name) | Expr | Reference a column by name
pl.lit(value) | Expr | Create a literal value expression
Expr.alias(name) | Expr | Rename the output column of an expression
pl.all() | Expr | Select all columns
pl.len() | Expr | Return the number of rows (per group inside an aggregation) as an expression
on | str or list[str] | Column name(s) to join on
how | str | Join strategy: "inner", "left", "right", "full", "cross", "semi", "anti" ("full" was named "outer" in older releases)
descending | bool | Sort in descending order (default False)

Example Code

Select with Computed Columns

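import polars as pl

q = (
    # try_parse_dates=True asks Polars to infer "birthdate" as a date column
    # (assuming the file stores ISO-formatted dates). With the default,
    # the column is read as a string and .dt.year() fails.
    pl.scan_csv("data.csv", try_parse_dates=True)
    .select(
        pl.col("name"),
        pl.col("birthdate").dt.year().alias("birth_year"),
        (pl.col("weight") / (pl.col("height") ** 2)).alias("bmi"),
    )
    # Columns created in select() are available to later steps.
    .filter(pl.col("birth_year") < 1990)
    .sort("bmi", descending=True)
)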

With Columns (Adding Derived Columns)

import polars as pl

q = (
    pl.scan_csv("data.csv")
    .with_columns(
        (pl.col("price") * pl.col("quantity")).alias("total"),
        pl.col("category").cast(pl.Categorical).alias("category"),
    )
)

Group By and Aggregation

import polars as pl

q = (
    pl.scan_csv("data.csv")
    .group_by("category")
    .agg(
        pl.col("value").mean().alias("avg_value"),
        pl.col("value").sum().alias("total_value"),
        pl.len().alias("count"),
    )
)

Join Two LazyFrames

import polars as pl

orders = pl.scan_csv("orders.csv")
customers = pl.scan_csv("customers.csv")

q = orders.join(customers, on="customer_id", how="left")

Import

import polars as pl

Behavior Notes

  • All methods return a new LazyFrame: The original LazyFrame is not mutated. Each chained call appends a new node to the query plan DAG.
  • Expression context matters: .select() replaces the schema with only the specified columns, while .with_columns() preserves all existing columns and adds/overwrites the specified ones.
  • Parallel expression evaluation: Multiple independent expressions within a single .select() or .with_columns() call can be evaluated in parallel across CPU cores.
  • Lazy semantics: No computation occurs during chaining. All operations are deferred until .collect() is called.
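  • Alias controls output names: An expression without .alias() takes the name of its leftmost input column (e.g., pl.col("weight") / pl.col("height") is named "weight"). Use .alias() to give computed columns a descriptive name, to avoid silently overwriting a column in .with_columns(), and to prevent duplicate-name errors when several expressions derive from the same column.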

Metadata

Field | Value
Source Repository | Pola_rs_Polars
Source File | docs/source/src/python/user-guide/concepts/expressions.py:L1-106
Domain | Data Engineering, Functional Composition, Relational Algebra
Last Updated | 2026-02-09 10:00 GMT