Implementation:Eventual Inc Daft Daft Func
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, User_Defined_Functions |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for decorating Python functions as row-wise user-defined functions.
Description
The @daft.func decorator converts a Python function into a Daft user-defined function that operates row by row. A decorated function accepts both its original argument types and Daft Expressions. When any argument is an Expression, the call returns a Daft Expression that can be used in DataFrame operations. When no argument is an Expression, the function executes immediately and returns its ordinary Python result. The decorator supports three variants, selected by the kind of function it wraps: row-wise (default), async row-wise (for async functions), and generator (for generator functions that produce multiple output rows per input row).
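The eager-versus-lazy dispatch described above can be illustrated with a small, Daft-free sketch. Everything in it is hypothetical: `Expr` is a stand-in for a lazy column expression, `toy_func` is a toy decorator, and the use of `inspect` to pick a variant is an assumption about how such a decorator could classify functions, not a description of Daft's internals.

```python
import functools
import inspect
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Expr:
    """Hypothetical stand-in for a lazy column expression (not daft.Expression)."""
    description: str


def toy_func(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Toy decorator: defer when any argument is an Expr, otherwise run now."""
    # Assumption: classify the wrapped function into one of the three variants.
    if inspect.iscoroutinefunction(fn):
        variant = "async row-wise"
    elif inspect.isgeneratorfunction(fn):
        variant = "generator"
    else:
        variant = "row-wise"

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        values = list(args) + list(kwargs.values())
        if any(isinstance(v, Expr) for v in values):
            # Lazy path: build a new expression instead of calling fn.
            rendered = ", ".join(
                v.description if isinstance(v, Expr) else repr(v) for v in values
            )
            return Expr(f"{fn.__name__}({rendered})")
        # Eager path: only plain Python values, so call the function directly.
        return fn(*args, **kwargs)

    wrapper.variant = variant
    return wrapper


@toy_func
def my_sum(a: int, b: int) -> int:
    return a + b


eager = my_sum(1, 2)         # plain ints: executes immediately
lazy = my_sum(Expr("x"), 2)  # one Expr argument: returns a deferred expression
```

In this sketch the caller never chooses between the two modes explicitly; the argument types decide. That is the behavior the description attributes to functions decorated with @daft.func.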
Usage
Import via import daft and apply the @daft.func decorator to any Python function. Use when you need per-row custom logic in DataFrame pipelines.
Code Reference
Source Location
- Repository: Daft
- File: daft/udf/__init__.py
- Lines: L21-227
Signature
@daft.func(
    *,
    return_dtype: DataTypeLike | None = None,
    unnest: bool = False,
    use_process: bool | None = None,
    max_retries: int | None = None,
    on_error: Literal["raise", "log", "ignore"] | None = None,
)
Import
import daft

@daft.func
def my_fn(a: int, b: int) -> int:
    return a + b
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| return_dtype | DataTypeLike \| None | No | The data type the function returns. If not specified, it is inferred from the function's type hints. |
| unnest | bool | No | Whether to unnest/flatten struct return type fields into columns. Defaults to False. |
| use_process | bool \| None | No | Whether to run each instance in a separate process. Daft auto-selects if unset. |
| max_retries | int \| None | No | Maximum number of retries on failure. |
| on_error | Literal["raise", "log", "ignore"] \| None | No | Error handling strategy: one of "raise", "log", or "ignore". |
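How a retry budget and an error policy might interact for a single row can be sketched in plain Python. This is not Daft's implementation. The helper `call_with_policy` is hypothetical, and two behaviors are assumptions made only for illustration: that `max_retries` counts retries after the first attempt, and that "log" and "ignore" both yield a null (None) for a row that keeps failing.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("toy_udf")


def call_with_policy(
    fn: Callable[..., Any],
    args: tuple,
    max_retries: int = 0,
    on_error: str = "raise",
) -> Optional[Any]:
    """Hypothetical helper: apply a retry budget and an error policy to one row."""
    attempts = max_retries + 1  # assumption: first call plus max_retries retries
    for attempt in range(1, attempts + 1):
        try:
            return fn(*args)
        except Exception as exc:
            if attempt < attempts:
                continue  # retry budget not exhausted yet
            if on_error == "raise":
                raise
            if on_error == "log":
                logger.warning(
                    "row %r failed after %d attempts: %s", args, attempt, exc
                )
            # Assumption: "log" and "ignore" both produce a null for the row.
            return None
    return None


def safe_div(a: float, b: float) -> float:
    return a / b


rows = [(6, 3), (1, 0)]
results = [
    call_with_policy(safe_div, row, max_retries=1, on_error="ignore")
    for row in rows
]
```

Under these assumptions the second row fails on both attempts and contributes None, while the first row succeeds on its first attempt. Consult the Daft documentation for the behavior the library actually guarantees.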
Outputs
| Name | Type | Description |
|---|---|---|
| return | Func wrapper | A Func wrapper around the decorated function. Calling it with Expression arguments yields an Expression that can be used in DataFrame operations such as select, with_column, and filter. |
Usage Examples
Basic Usage
import daft

@daft.func
def my_sum(a: int, b: int) -> int:
    return a + b

df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6]})
# Passing column Expressions makes my_sum return an Expression for select().
df.select(my_sum(df["x"], df["y"])).collect()
Async Usage
import asyncio

import daft

@daft.func
async def my_async_sum(a: int, b: int) -> int:
    # Simulate asynchronous work (for example, an I/O call) for each row.
    await asyncio.sleep(0.1)
    return a + b

df = daft.from_pydict({"x": [1], "y": [2]})
df.select(my_async_sum(df["x"], df["y"])).collect()
Generator Usage
from typing import Iterator

import daft

@daft.func
def repeat(value: str, n: int) -> Iterator[str]:
    # Each yielded value becomes a separate output row for this input row.
    for _ in range(n):
        yield value

df = daft.from_pydict({"value": ["hello"], "n": [3]})
df.select(repeat(df["value"], df["n"])).collect()