Principle:Eventual Inc Daft Batch UDF

Knowledge Sources	Daft Daft Docs
Domains	Data_Engineering, User_Defined_Functions
Last Updated	2026-02-08 00:00 GMT

Overview

Technique for applying custom Python functions to batches of rows using columnar Series input.

Description

Batch UDFs receive daft.Series objects (columnar batches) rather than individual values, enabling efficient vectorized operations using libraries like NumPy, PyArrow, or pandas. The return type must be explicitly specified via the return_dtype parameter. Batch functions can return a daft.Series, list, numpy.ndarray, or pyarrow.Array. Scalar arguments are passed through without modification, allowing mixed Series and scalar inputs.

Usage

Use batch UDFs when you need vectorized batch processing for performance-critical custom operations. Batch processing amortizes per-row overhead and leverages columnar computation libraries for significantly higher throughput compared to row-wise UDFs.

Theoretical Basis

Batch UDFs implement a vectorized map operation applying f(Series) -> Series across partitioned columnar data. By operating on columnar batches rather than individual rows, the per-row function call overhead is amortized, and the underlying computation can exploit SIMD instructions, cache locality, and vectorized library kernels.

for each partition P in DataFrame:
    for each batch B in P (sized by batch_size):
        output_batch = f(B.col1_series, B.col2_series, ...)
        append output_batch to output

Related Pages

Implemented By

Implementation:Eventual_Inc_Daft_Daft_Func_Batch

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment