Implementation:Pola rs Polars DataFrame Group By
| Knowledge Sources | |
|---|---|
| Domains | Data Engineering, DataFrame |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Concrete APIs for partitioning DataFrames and LazyFrames into groups using column names or computed expressions as grouping keys.
Description
The group_by() method is available on both DataFrame (eager) and LazyFrame (lazy). It accepts one or more grouping keys, each either a column name string or a Polars expression. It returns a GroupBy (eager) or LazyGroupBy (lazy) object. This intermediate object produces no results until an aggregation is applied, typically with .agg(); shorthand aggregation methods such as .sum() also exist.
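When an expression is used as a grouping key, its column in the output takes the expression's output name. For a derived expression this is usually the name of the column it was computed from (for example, "birthday" for a decade computed from pl.col("birthday")). Alias such keys with .alias() to give the group key a meaningful name in the output DataFrame. Grouping by multiple keys creates one group per distinct combination of values across all key columns.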
The maintain_order parameter (default False) controls whether groups appear in the output in the order they first occur in the input. With the default, the order of groups in the output is not guaranteed and can differ between runs. Setting it to True makes the order deterministic, at a performance cost from the extra ordering step.
Usage
Use group_by() whenever you need to:
- Partition rows into groups for subsequent aggregation.
- Group by derived expressions (e.g., decade from birth year).
- Create multi-level groupings across several columns.
Code Reference
Source Location
- Repository: Polars
- File: docs/source/src/python/user-guide/expressions/aggregation.py (lines 23-31)
Signature
```python
# Eager DataFrame grouping
DataFrame.group_by(
    *by: str | Expr,
    maintain_order: bool = False,
) -> GroupBy

# Lazy DataFrame grouping
LazyFrame.group_by(
    *by: str | Expr,
    maintain_order: bool = False,
) -> LazyGroupBy

# Expression aliasing for computed keys
Expr.alias(
    name: str,
) -> Expr
```
Import
```python
import polars as pl
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| *by | str or Expr | Yes | One or more column names or expressions to group by |
| maintain_order | bool | No | Preserve input order of groups (default False); setting to True has a performance cost |
Outputs
| Name | Type | Description |
|---|---|---|
| result | GroupBy / LazyGroupBy | Intermediate grouped object; apply .agg() (or a shorthand aggregation method) to produce a DataFrame (eager) or LazyFrame (lazy) |
Usage Examples
Simple Column Grouping
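```python
import polars as pl

# Assumes `dataset` is an existing DataFrame with a "state" column
# Group by a single column and count the rows in each group
result = dataset.group_by("state").agg(pl.len())
```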
Expression-Based Grouping
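```python
import polars as pl

# Group by a computed expression: the decade of each birth year
# (assumes "birthday" is a Date column); maintain_order=True keeps
# decades in the order they first appear in `dataset`
result = dataset.group_by(
    (pl.col("birthday").dt.year() // 10 * 10).alias("decade"),
    maintain_order=True,
).agg(pl.len())
```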
Multi-Column Grouping
```python
import polars as pl

# Group by multiple columns: one group per distinct (state, party) pair
result = dataset.group_by("state", "party").agg(
    pl.len().alias("count"),
)
```
Lazy Grouping
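```python
import polars as pl

# Group on a LazyFrame for deferred execution: nothing runs until .collect()
result = (
    dataset.lazy()
    .group_by("first_name")
    .agg(
        pl.len(),  # rows per group, in a column named "len"
        pl.col("gender"),  # no aggregation: values gathered into a list per group
        pl.first("last_name"),  # first "last_name" in each group
    )
    .sort("len", descending=True)
    .limit(5)  # keep the five most common first names
    .collect()
)
```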