Implementation: pola-rs Polars GroupBy Agg Expressions
| Knowledge Sources | |
|---|---|
| Domains | Data Engineering, DataFrame |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Concrete APIs for building and executing aggregation expressions on grouped DataFrames, including standard reductions, conditional aggregation, filtered aggregation, and sorted aggregation.
Description
The .agg() method on GroupBy and LazyGroupBy accepts one or more Polars expressions that define how each group's data should be reduced. Each expression is evaluated on the column values within a single group. A reducing expression such as Expr.sum() yields one value per group; an expression that does not reduce, such as a bare pl.col("gender"), collects the group's values into a list. The result has one row per group, with one column per grouping key plus one column per aggregation expression.
Polars provides a rich set of built-in aggregate functions: pl.len() for the row count, Expr.sum(), Expr.mean(), Expr.min(), Expr.max(), Expr.first(), Expr.last(), Expr.std(), Expr.var(), Expr.median(), Expr.n_unique(), and Expr.count() for the number of non-null values. Expressions can be composed with .filter(), .sort(), .sort_by(), and pl.when().then().otherwise() for conditional and filtered aggregation.
Usage
Use .agg() whenever you need to:
- Compute summary statistics per group.
- Count rows matching a condition per group.
- Extract the first or last value after sorting within each group.
- Combine multiple aggregation metrics in a single operation.
Code Reference
Source Location
- Repository: Polars
- File: docs/source/src/python/user-guide/expressions/aggregation.py (lines 23-178)
Signature
# GroupBy aggregation
GroupBy.agg(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> DataFrame

# LazyGroupBy aggregation
LazyGroupBy.agg(
    *exprs: IntoExpr | Iterable[IntoExpr],
) -> LazyFrame

# Row count
pl.len() -> Expr

# Standard aggregate methods on Expr
Expr.sum() -> Expr
Expr.mean() -> Expr
Expr.min() -> Expr
Expr.max() -> Expr
Expr.first() -> Expr
Expr.last() -> Expr
Expr.std() -> Expr
Expr.var() -> Expr
Expr.median() -> Expr
Expr.n_unique() -> Expr
Expr.count() -> Expr

# Conditional expression
pl.when(
    predicate: Expr,
) -> When
When.then(
    value: IntoExpr,
) -> Then
Then.otherwise(
    value: IntoExpr,
) -> Expr

# Filtered aggregation
Expr.filter(
    predicate: Expr,
) -> Expr
Import
import polars as pl
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| *exprs | IntoExpr | Yes | One or more aggregation expressions to evaluate per group |
| predicate (filter) | Expr | No | Boolean expression to filter rows within each group before aggregation |
| predicate (when) | Expr | No | Boolean condition for conditional aggregation via when/then/otherwise |
Outputs
| Name | Type | Description |
|---|---|---|
| result | DataFrame / LazyFrame | Aggregated DataFrame with one row per group; columns are grouping keys plus aggregation results |
Usage Examples
Basic Aggregation
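import polars as pl

# `dataset` is assumed to be a DataFrame that has the columns referenced below.
result = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.len(),  # number of rows in each group
        pl.col("gender"),  # no reduction: values are collected into a list
        pl.first("last_name"),  # short for pl.col("last_name").first()
    )
    .sort("len", descending=True)
    .limit(5)
    .collect()
)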
Conditional Aggregation
import polars as pl

result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        (pl.col("party") == "Anti-Administration").sum().alias("anti"),
        (pl.col("party") == "Pro-Administration").sum().alias("pro"),
    )
    .sort("pro", descending=True)
    .limit(5)
    .collect()
)
Filtered Aggregation with Helper Function
import polars as pl
from datetime import date

def compute_age() -> pl.Expr:
    # Approximate age in years: current year minus birth year.
    return date.today().year - pl.col("birthday").dt.year()

def avg_age(gender: str) -> pl.Expr:
    # Keep only rows of the given gender within each group, then average.
    return (
        compute_age()
        .filter(pl.col("gender") == gender)
        .mean()
        .alias(f"avg {gender} age")
    )

result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        avg_age("M"),
        avg_age("F"),
    )
    .sort("state")
    .limit(5)
    .collect()
)
Sorted Aggregation
import polars as pl

result = (
    dataset.lazy()
    .group_by("state")
    .agg(
        pl.col("party").sort_by("birthday").first().alias("earliest_party"),
        pl.col("party").sort_by("birthday").last().alias("latest_party"),
    )
    .collect()
)