Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:Eventual Inc Daft Batch UDF

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, User_Defined_Functions
Last Updated 2026-02-08 00:00 GMT

Overview

Technique for applying custom Python functions to batches of rows using columnar Series input.

Description

Batch UDFs receive daft.Series objects (columnar batches) rather than individual values, enabling efficient vectorized operations using libraries like NumPy, PyArrow, or pandas. The return type must be explicitly specified via the return_dtype parameter. Batch functions can return a daft.Series, list, numpy.ndarray, or pyarrow.Array. Scalar arguments are passed through without modification, allowing mixed Series and scalar inputs.

Usage

Use batch UDFs when you need vectorized batch processing for performance-critical custom operations. Batch processing amortizes per-row overhead and leverages columnar computation libraries for significantly higher throughput compared to row-wise UDFs.

Theoretical Basis

Batch UDFs implement a vectorized map operation applying f(Series) -> Series across partitioned columnar data. By operating on columnar batches rather than individual rows, the per-row function call overhead is amortized, and the underlying computation can exploit SIMD instructions, cache locality, and vectorized library kernels.

for each partition P in DataFrame:
    for each batch B in P (sized by batch_size):
        output_batch = f(B.col1_series, B.col2_series, ...)
        append output_batch to output

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment