Implementation:Pola rs Polars DataFrame Group By

From Leeroopedia


Knowledge Sources
Domains Data Engineering, DataFrame
Last Updated 2026-02-09 10:00 GMT

Overview

Concrete APIs for partitioning DataFrames and LazyFrames into groups using column names or computed expressions as grouping keys.

Description

The group_by() method is available on both DataFrame (eager) and LazyFrame (lazy). It accepts one or more grouping keys, each either a column name string or a Polars expression. It returns an intermediate GroupBy (eager) or LazyGroupBy (lazy) object rather than a finished frame; results are produced by calling an aggregation on that object, most commonly .agg().

When an expression is used as a grouping key, alias it with .alias() so that the key column in the output DataFrame has a meaningful name; an unaliased expression keeps the name of the column it was derived from. Multi-column grouping forms one group per distinct combination of values across all specified key columns.

The maintain_order parameter (default False) controls whether groups appear in the output in the order in which they first occur in the input data. With the default, the order of groups is not guaranteed and can differ between runs. Setting it to True is slower than a default group-by because of the additional ordering work.

Usage

Use group_by() whenever you need to:

  • Partition rows into groups for subsequent aggregation.
  • Group by derived expressions (e.g., decade from birth year).
  • Create multi-level groupings across several columns.

Code Reference

Source Location

  • Repository: Polars
  • File: docs/source/src/python/user-guide/expressions/aggregation.py (lines 23-31)

Signature

# Eager DataFrame grouping
DataFrame.group_by(
    *by: str | Expr,
    maintain_order: bool = False,
) -> GroupBy

# LazyFrame grouping
LazyFrame.group_by(
    *by: str | Expr,
    maintain_order: bool = False,
) -> LazyGroupBy

# Expression aliasing for computed keys
Expr.alias(
    name: str,
) -> Expr

Import

import polars as pl

I/O Contract

Inputs

Name Type Required Description
*by str | Expr Yes One or more column names or expressions to group by
maintain_order bool No Preserve input order of groups (default False); setting to True has a performance cost

Outputs

Name Type Description
result GroupBy / LazyGroupBy An intermediate object representing the grouped data; call an aggregation on it, typically .agg(), to produce a DataFrame (eager) or LazyFrame (lazy)

Usage Examples

Simple Column Grouping

import polars as pl

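# Group by a single column and count the rows in each group.
# `dataset` is assumed to be an existing DataFrame with a "state" column.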
result = dataset.group_by("state").agg(pl.len())

Expression-Based Grouping

import polars as pl

# Group by a computed expression (decade from birth year)
result = dataset.group_by(
    (pl.col("birthday").dt.year() // 10 * 10).alias("decade"),
    maintain_order=True,
).agg(pl.len())

Multi-Column Grouping

import polars as pl

# Group by multiple columns
result = dataset.group_by("state", "party").agg(
    pl.len().alias("count"),
)

Lazy Grouping

import polars as pl

# Group on a LazyFrame for deferred execution
result = (
    dataset.lazy()
    .group_by("first_name")
    .agg(pl.len(), pl.col("gender"), pl.first("last_name"))
    .sort("len", descending=True)
    .limit(5)
    .collect()
)

Related Pages

Implements Principle
