Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Eventual Inc Daft Daft Func Batch

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, User_Defined_Functions
Last Updated 2026-02-08 00:00 GMT

Overview

Concrete tool for decorating Python functions as batch-oriented user-defined functions provided by the Daft library.

Description

The @daft.func.batch decorator converts a Python function into a Daft user-defined batch function. Batch functions receive daft.Series arguments and return a daft.Series, list, numpy.ndarray, or pyarrow.Array. Scalar arguments are passed through without modification. Unlike row-wise UDFs, the return_dtype parameter is required. An optional batch_size parameter controls the maximum number of rows in each input batch.

Usage

Import via import daft and apply the @daft.func.batch(return_dtype=...) decorator. Use when you need vectorized columnar processing with libraries like PyArrow or NumPy.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/udf/__init__.py
  • Lines: L229-312

Signature

@daft.func.batch(
    *,
    return_dtype: DataTypeLike,
    unnest: bool = False,
    use_process: bool | None = None,
    batch_size: int | None = None,
    max_retries: int | None = None,
    on_error: Literal["raise", "log", "ignore"] | None = None,
)

Import

import daft
from daft import DataType, Series

@daft.func.batch(return_dtype=DataType.int64())
def my_batch_fn(a: Series, b: Series) -> Series:
    ...

I/O Contract

Inputs

Name Type Required Description
return_dtype DataTypeLike Yes The data type that the function returns. Required for batch UDFs.
unnest bool No Whether to unnest/flatten struct return type fields into columns. Defaults to False.
use_process None No Whether to run each instance in a separate process. Daft auto-selects if unset.
batch_size None No The maximum number of rows in each input batch.
max_retries None No Maximum number of retries on failure.
on_error None No Error handling strategy.

Outputs

Name Type Description
return Func wrapper A Func wrapper for batch-oriented UDF that can be used as an Expression in DataFrame operations.

Usage Examples

Basic Usage

import daft
from daft import DataType, Series

@daft.func.batch(return_dtype=DataType.int64())
def my_sum(a: Series, b: Series) -> Series:
    import pyarrow.compute as pc
    a = a.to_arrow()
    b = b.to_arrow()
    result = pc.add(a, b)
    return result

df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6]})
df.select(my_sum(df["x"], df["y"])).collect()

Mixed Series and Scalar Arguments

import daft
from daft import DataType, Series

@daft.func.batch(return_dtype=DataType.int64())
def add_scalar(a: Series, b: int) -> Series:
    import pyarrow.compute as pc
    return pc.add(a.to_arrow(), b)

df = daft.from_pydict({"x": [1, 2, 3]})
df.select(add_scalar(df["x"], 10)).collect()

Related Pages

Implements Principle

Requires Environment

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment