Principle:Pola-rs Polars Horizontal Fold Operations
| Knowledge Sources | |
|---|---|
| Domains | Data Engineering, DataFrame |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Horizontal fold operations apply a reduction across columns (horizontally) rather than across rows (vertically), combining multiple column values into a single result per row.
Description
Most DataFrame operations work vertically: they reduce, transform, or filter values within a single column across multiple rows. Horizontal operations are the dual: they combine values across multiple columns within a single row.
Polars provides three primary mechanisms for horizontal operations:
- pl.fold(acc, function, exprs) -- A general-purpose horizontal reduction that applies a binary function cumulatively across a set of columns, starting from an accumulator value. This is the most flexible approach and can implement any associative binary operation.
- pl.sum_horizontal(*exprs) -- A specialized shorthand for horizontal summation. More performant than fold for the common case of adding column values.
- pl.concat_str(exprs, separator) -- Concatenates string representations of multiple columns into a single string column, with an optional separator.
Horizontal folds are particularly useful for:
- Row-wise sums or products across multiple numeric columns.
- Row-wise boolean conjunction/disjunction for filtering rows where all (or any) of several conditions hold.
- String assembly combining values from multiple columns into a single formatted string.
The fold function works by iterating over the specified columns left to right, applying the binary function to the running accumulator and each column in turn:
result = acc
for each column c in exprs:
    result = function(result, c)
Usage
Use this pattern whenever you need to:
- Sum, multiply, or otherwise combine values across multiple columns for each row.
- Filter rows where all columns satisfy a condition (horizontal AND).
- Concatenate multiple column values into a single string.
- Apply a custom binary function across an arbitrary set of columns.
Theoretical Basis
Horizontal fold operations correspond to the fold (or reduce) operation from functional programming, applied across the column axis rather than the row axis:
Vertical aggregation (standard):
    fold_vertical(f, acc, column) = f(...f(f(acc, row_0), row_1)..., row_n)
    Example: column.sum() = fold_vertical(+, 0, column)
Horizontal aggregation (fold):
    fold_horizontal(f, acc, row) = f(...f(f(acc, col_0), col_1)..., col_m)
    Example: pl.fold(pl.lit(0), operator.add, [col_a, col_b]) = 0 + row[a] + row[b]
The two operations mirror each other: the same reduction pattern is applied along a different axis of the table.
| Property | Vertical (column) | Horizontal (fold) |
|---|---|---|
| Axis | Rows within a column | Columns within a row |
| Input shape | N rows, 1 column | N rows, M columns |
| Output shape | 1 scalar (or 1 row per group) | N values (one per row) |
| Parallelism | Vectorized over the rows of one column | Each fold step is a vectorized operation on whole columns |
| Polars API | Expr.sum(), Expr.mean(), etc. | pl.fold(), pl.sum_horizontal() |
Conditional horizontal filtering is a powerful pattern that combines pl.fold with boolean logic. By using acc=pl.lit(True) and function=lambda acc, x: acc & x, one can create a row-level filter that requires all specified columns to satisfy a condition:
filter(
    pl.fold(acc=pl.lit(True), function=lambda acc, x: acc & x, exprs=[col_1 > threshold, col_2 > threshold, ...])
)
This is equivalent to WHERE col_1 > threshold AND col_2 > threshold AND ... in SQL, but expressed generically over an arbitrary set of columns selected by expression.
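The pl.sum_horizontal() function covers the same use case as pl.fold(pl.lit(0), operator.add, exprs) but is implemented natively, so it avoids calling back into a Python function for each column. The two are not exact equivalents when nulls are present: by default sum_horizontal skips null values, whereas a fold with operator.add propagates them.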