Implementation:Pola rs Polars GroupBy Agg Expressions

From Leeroopedia


Knowledge Sources
Domains: Data Engineering, DataFrame
Last Updated: 2026-02-09 10:00 GMT

Overview

Concrete APIs for building and executing aggregation expressions on grouped DataFrames, including standard reductions, conditional aggregation, filtered aggregation, and sorted aggregation.

Description

The .agg() method on GroupBy and LazyGroupBy accepts one or more Polars expressions that define how each group's data is reduced. Each expression is evaluated on the column values within a single group. A reducing expression such as pl.col("x").sum() yields one value per group; an expression that does not reduce, such as a bare pl.col("x"), yields a list of the group's values. The result is a DataFrame (or a LazyFrame for LazyGroupBy) with one row per group, containing one column per grouping key plus one column per aggregation expression.

Built-in aggregate functions include pl.len() for the row count of each group, Expr.sum(), Expr.mean(), Expr.min(), Expr.max(), Expr.first(), Expr.last(), Expr.std(), Expr.var(), Expr.median(), Expr.n_unique(), and Expr.count(), which counts the non-null values of a column. Expressions can be composed with .filter(), .sort(), .sort_by(), and pl.when().then().otherwise() for conditional and filtered aggregation.

Usage

Use .agg() whenever you need to:

  • Compute summary statistics per group.
  • Count rows matching a condition per group.
  • Extract the first or last value after sorting within each group.
  • Combine multiple aggregation metrics in a single operation.

Code Reference

Source Location

  • Repository: Polars
  • File: docs/source/src/python/user-guide/expressions/aggregation.py (lines 23-178)

Signature

# GroupBy aggregation
GroupBy.agg(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> DataFrame

# LazyGroupBy aggregation
LazyGroupBy.agg(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> LazyFrame

# Row count
pl.len() -> Expr

# Standard aggregate methods on Expr
Expr.sum() -> Expr
Expr.mean() -> Expr
Expr.min() -> Expr
Expr.max() -> Expr
Expr.first() -> Expr
Expr.last() -> Expr
Expr.std() -> Expr
Expr.var() -> Expr
Expr.median() -> Expr
Expr.n_unique() -> Expr
Expr.count() -> Expr

# Conditional expression
pl.when(
    predicate: Expr,
) -> When

When.then(
    value: IntoExpr,
) -> Then

Then.otherwise(
    value: IntoExpr,
) -> Expr

# Filtered aggregation
Expr.filter(
    predicate: Expr,
) -> Expr

Import

import polars as pl

I/O Contract

Inputs

Name | Type | Required | Description
*exprs | IntoExpr | Yes | One or more aggregation expressions to evaluate per group
predicate (filter) | Expr | No | Boolean expression to filter rows within each group before aggregation
predicate (when) | Expr | No | Boolean condition for conditional aggregation via when/then/otherwise

Outputs

Name | Type | Description
result | DataFrame / LazyFrame | Aggregated DataFrame with one row per group; columns are grouping keys plus aggregation results

Usage Examples

Basic Aggregation

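import polars as pl

# `dataset` is assumed to be a pl.DataFrame loaded beforehand;
# it is not defined on this page.
result = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.len(),               # rows per group; the output column is named "len"
        pl.col("gender"),       # not reduced: collected into a list per group
        pl.first("last_name"),  # first last_name value in each group
    )
    .sort("len", descending=True)
    .limit(5)
    .collect()
)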

Conditional Aggregation

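import polars as pl

# `dataset` is assumed to be a pl.DataFrame loaded beforehand.
result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        # Summing a boolean mask counts the rows where the condition is True.
        (pl.col("party") == "Anti-Administration").sum().alias("anti"),
        (pl.col("party") == "Pro-Administration").sum().alias("pro"),
    )
    .sort("pro", descending=True)
    .limit(5)
    .collect()
)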

Filtered Aggregation with Helper Function

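import polars as pl
from datetime import date

# `dataset` is assumed to be a pl.DataFrame loaded beforehand,
# with "birthday" stored as a Date (or Datetime) column.

def compute_age() -> pl.Expr:
    # Approximate age: difference in calendar years only.
    return date.today().year - pl.col("birthday").dt.year()

def avg_age(gender: str) -> pl.Expr:
    return (
        compute_age()
        .filter(pl.col("gender") == gender)  # keep only this gender's rows in each group
        .mean()
        .alias(f"avg {gender} age")
    )

result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        avg_age("M"),
        avg_age("F"),
    )
    .sort("state")
    .limit(5)
    .collect()
)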

Sorted Aggregation

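import polars as pl

# `dataset` is assumed to be a pl.DataFrame loaded beforehand.
result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        # Order each group's party values by birthday, then take either end.
        pl.col("party").sort_by("birthday").first().alias("earliest_party"),
        pl.col("party").sort_by("birthday").last().alias("latest_party"),
    )
    .collect()
)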

Related Pages

Implements Principle
