Implementation:Eventual Inc Daft DataFrame Groupby



Knowledge Sources
Domains Data_Engineering, Data_Analysis
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for performing grouped aggregations on a DataFrame.

Description

The groupby method on a Daft DataFrame creates a GroupedDataFrame by partitioning data by one or more key columns or expressions. The resulting GroupedDataFrame supports aggregation methods such as agg(), sum(), mean(), count(), min(), max(), list(), and any_value(). Group-by columns can be specified as strings (column names), Expression objects, or combinations thereof. Wildcard inputs are expanded before grouping.
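To make the semantics concrete, here is a conceptual sketch in plain Python of what "partition by key, then aggregate per group" means. It is not Daft's implementation and uses no Daft APIs; the helper names group_rows and aggregate are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Toy rows standing in for a DataFrame (illustrative data only).
rows = [
    {"pet": "cat", "age": 1},
    {"pet": "dog", "age": 2},
    {"pet": "dog", "age": 3},
    {"pet": "cat", "age": 4},
]


def group_rows(rows, key):
    """Partition rows into buckets keyed by the value of `key` (hypothetical helper)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def aggregate(groups, column, fn):
    """Apply `fn` to the values of `column` within each group (hypothetical helper)."""
    return {k: fn([r[column] for r in members]) for k, members in groups.items()}


groups = group_rows(rows, "pet")
max_age = aggregate(groups, "age", max)
mean_age = aggregate(groups, "age", lambda xs: sum(xs) / len(xs))
```

In Daft the two steps are expressed as df.groupby(...) followed by an aggregation call, and the engine decides how the work is actually executed.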

Usage

Use this method on a DataFrame when you need to compute grouped summaries. Chain it with .agg() or convenience aggregation methods to specify the desired aggregate computations.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/dataframe/dataframe.py
  • Lines: L3807-3847

Signature

def groupby(self, *group_by: ManyColumnsInputType) -> "GroupedDataFrame"

Import

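# groupby is a method on DataFrame, so it needs no import of its own;
# col is imported only to build the aggregation expression
from daft import col

df.groupby("col").agg(col("x").sum())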

I/O Contract

Inputs

Name | Type | Required | Description
*group_by | ManyColumnsInputType | Yes | One or more columns to group by; can be column name strings, Expression objects, or iterables of these

Outputs

Name | Type | Description
return | GroupedDataFrame | A grouped DataFrame supporting aggregation methods (sum, mean, count, min, max, agg, list, any_value, etc.)

Usage Examples

Basic Usage

import daft
from daft import col

df = daft.from_pydict({
    "pet": ["cat", "dog", "dog", "cat"],
    "age": [1, 2, 3, 4],
    "name": ["Alex", "Jordan", "Sam", "Riley"],
})

# Group by "pet" and compute multiple aggregations
grouped_df = df.groupby("pet").agg(
    df["age"].min().alias("min_age"),
    df["age"].max().alias("max_age"),
    df["pet"].count().alias("count"),
    df["name"].any_value(),
)
grouped_df = grouped_df.sort("pet")
grouped_df.show()

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
