Implementation:Eventual Inc Daft Daft Func Batch
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, User_Defined_Functions |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the Daft library, for decorating Python functions as batch-oriented user-defined functions (UDFs).
Description
The @daft.func.batch decorator converts a Python function into a Daft user-defined batch function. Batch functions receive daft.Series arguments and return a daft.Series, list, numpy.ndarray, or pyarrow.Array. Scalar arguments are passed through without modification. Unlike row-wise UDFs, batch UDFs require the return_dtype parameter. An optional batch_size parameter sets the maximum number of rows in each input batch.
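As a rough mental model of the batching behavior described above, the following pure-Python sketch splits a column into chunks of at most batch_size rows and hands the scalar argument to every call unchanged. It does not use Daft and is not Daft's implementation: apply_in_batches and add_scalar are hypothetical names, plain lists stand in for daft.Series, and the one-output-per-input-row check is an assumption of this sketch.

```python
from typing import Callable, List, Sequence


def apply_in_batches(
    fn: Callable[[List[int], int], List[int]],
    column: Sequence[int],
    scalar: int,
    batch_size: int,
) -> List[int]:
    """Hypothetical helper: call fn once per chunk of at most batch_size rows."""
    out: List[int] = []
    for start in range(0, len(column), batch_size):
        chunk = list(column[start:start + batch_size])
        # The scalar is handed to every call unchanged.
        batch_result = fn(chunk, scalar)
        # Assumption of this sketch: one output value per input row.
        assert len(batch_result) == len(chunk)
        out.extend(batch_result)
    return out


def add_scalar(values: List[int], offset: int) -> List[int]:
    # Works on the whole chunk at once rather than one row at a time.
    return [v + offset for v in values]


# Five rows with batch_size=2 gives chunks of 2, 2, and 1 rows.
result = apply_in_batches(add_scalar, [1, 2, 3, 4, 5], 10, batch_size=2)
```

The point of the sketch is only that the function body sees many rows per call, which is what makes vectorized libraries such as PyArrow or NumPy worthwhile inside a batch UDF.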
Usage
Run import daft, then apply the @daft.func.batch(return_dtype=...) decorator to the function. Use it when you need vectorized, columnar processing with libraries such as PyArrow or NumPy.
Code Reference
Source Location
- Repository: Daft
- File: daft/udf/__init__.py
- Lines: L229-312
Signature
@daft.func.batch(
    *,
    return_dtype: DataTypeLike,
    unnest: bool = False,
    use_process: bool | None = None,
    batch_size: int | None = None,
    max_retries: int | None = None,
    on_error: Literal["raise", "log", "ignore"] | None = None,
)
Import
import daft
from daft import DataType, Series
@daft.func.batch(return_dtype=DataType.int64())
def my_batch_fn(a: Series, b: Series) -> Series:
    ...
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| return_dtype | DataTypeLike | Yes | The data type that the function returns. Required for batch UDFs. |
| unnest | bool | No | Whether to unnest/flatten struct return type fields into columns. Defaults to False. |
| use_process | bool or None | No | Whether to run each instance in a separate process. Daft auto-selects if unset. |
| batch_size | int or None | No | The maximum number of rows in each input batch. |
| max_retries | int or None | No | Maximum number of retries on failure. |
| on_error | Literal["raise", "log", "ignore"] or None | No | Error handling strategy: "raise", "log", or "ignore". |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Func wrapper | A Func wrapper for the batch-oriented UDF that can be used as an Expression in DataFrame operations. |
Usage Examples
Basic Usage
import daft
from daft import DataType, Series

@daft.func.batch(return_dtype=DataType.int64())
def my_sum(a: Series, b: Series) -> Series:
    import pyarrow.compute as pc

    # Convert both Daft Series to PyArrow arrays and add them element-wise.
    a = a.to_arrow()
    b = b.to_arrow()
    result = pc.add(a, b)
    return result  # a pyarrow.Array is an accepted return type

df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6]})
df.select(my_sum(df["x"], df["y"])).collect()
Mixed Series and Scalar Arguments
import daft
from daft import DataType, Series

@daft.func.batch(return_dtype=DataType.int64())
def add_scalar(a: Series, b: int) -> Series:
    import pyarrow.compute as pc

    # a arrives as a Series batch; b is the plain Python int passed at the call site.
    return pc.add(a.to_arrow(), b)

df = daft.from_pydict({"x": [1, 2, 3]})
df.select(add_scalar(df["x"], 10)).collect()