Principle:Eventual Inc Daft Row Wise UDF
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, User_Defined_Functions |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Technique for applying custom Python functions to individual rows of a distributed DataFrame.
Description
Row-wise UDFs process one row at a time: the function receives that row's values as individual Python objects and returns a single value. Synchronous, asynchronous, and generator variants are supported. The return type can be inferred from the function's type hints or specified explicitly via the return_dtype parameter. Decorating a Python function with @daft.func turns it into a Daft-aware function that accepts both regular Python values and Daft Expressions. When any argument is an Expression, the call returns an Expression suitable for use in DataFrame operations such as select, with_column, and filter.
Usage
Use row-wise UDFs when you need to apply custom per-row logic that cannot be expressed with built-in expressions. This includes calling external APIs per row, performing complex conditional logic, or running Python-native computations that do not benefit from vectorization.
Theoretical Basis
Row-wise UDFs implement a scalar map operation: a function f(row) -> value is applied to every row in a partition. Each row is processed independently of the others, which makes the pattern embarrassingly parallel across partitions. The function is applied lazily as part of Daft's query plan and is executed only at materialization time.
    for each partition P in DataFrame:
        for each row R in P:
            output[R] = f(R.col1, R.col2, ...)
Async variants enable concurrent I/O-bound operations within a partition, while generator variants allow one-to-many row expansion.
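The three variants can be modelled in plain Python to make their semantics concrete. The sketch below is not Daft's implementation: the list-of-lists partitions structure and the helpers map_rows, map_rows_async, and expand_rows are hypothetical stand-ins that only illustrate one-to-one mapping, concurrent awaiting within a partition, and one-to-many expansion.

```python
import asyncio

# Hypothetical stand-in for a partitioned DataFrame:
# a list of partitions, each a list of (col1, col2) row tuples.
partitions = [[(1, 10), (2, 20)], [(3, 30)]]

def map_rows(f, parts):
    """Scalar map: exactly one output value per input row."""
    return [[f(*row) for row in part] for part in parts]

async def map_rows_async(f, parts):
    """Async variant: all rows of a partition are awaited concurrently."""
    results = []
    for part in parts:
        # gather preserves input order, so outputs stay aligned with rows.
        values = await asyncio.gather(*(f(*row) for row in part))
        results.append(list(values))
    return results

def expand_rows(g, parts):
    """Generator variant: each input row may yield zero or more outputs."""
    return [[value for row in part for value in g(*row)] for part in parts]

def add(a, b):
    return a + b

async def slow_add(a, b):
    await asyncio.sleep(0.01)  # stands in for an I/O-bound call
    return a + b

def repeat(count, value):
    for _ in range(count):
        yield value

sync_out = map_rows(add, partitions)
async_out = asyncio.run(map_rows_async(slow_add, partitions))
expanded = expand_rows(repeat, partitions)
```

In this model the sync and async variants produce the same values; the async one differs only in overlapping the waits within a partition, while the generator variant changes the number of output rows.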