Implementation:ArroyoSystems Arroyo Validate UDF

From Leeroopedia


Summary

This page documents the implementation of UDF validation in the Arroyo streaming engine, covering the ParsedUdfFile::try_parse method for source code parsing and the validate_udf HTTP endpoint for API-driven validation.

Code Reference

Component            File                                                  Lines
UDF file parsing     crates/arroyo-udf/arroyo-udf-host/src/lib.rs          L60-L93
UDF source parser    crates/arroyo-udf/arroyo-udf-common/src/parse.rs      L257-L349
Validation endpoint  crates/arroyo-api/src/udfs.rs                         L274-L289

ParsedUdfFile::try_parse

Signature

impl ParsedUdfFile {
    pub fn try_parse(def: &str) -> anyhow::Result<ParsedUdfFile>
}

I/O

  • Input: def: &str -- the full UDF source file as a string
  • Output: ParsedUdfFile on success, or an anyhow::Error with a descriptive message on failure

Output Structure

pub struct ParsedUdfFile {
    pub udf: ParsedUdf,
    pub definition: String,
    pub dependencies: toml::Table,
}

The ParsedUdf struct contains:

  • Function name -- the identifier of the UDF
  • Parameter types -- a list of Arrow DataTypes corresponding to each function parameter
  • Return type -- the Arrow DataType of the return value
  • Async marker -- whether the function is declared as async
  • Nullability -- whether the return type is wrapped in Option
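To make the five pieces of metadata above concrete, here is a minimal sketch of a struct that could hold them. The field names, the `ParsedUdfSketch` type, and the local `DataType` enum (a stand-in for Arrow's `DataType`) are all assumptions for illustration; the actual definition lives in the arroyo-udf crates and may differ.

```rust
// Hypothetical stand-in for Arrow's DataType; only a few variants are shown.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int64,
    Float64,
    Utf8,
    Boolean,
}

// Illustrative shape only: these field names are assumptions,
// not the actual Arroyo definition of ParsedUdf.
#[derive(Debug, Clone, PartialEq)]
pub struct ParsedUdfSketch {
    pub name: String,             // function name
    pub arg_types: Vec<DataType>, // one entry per parameter
    pub return_type: DataType,    // type of the returned value
    pub is_async: bool,           // declared with `async fn`
    pub return_nullable: bool,    // return type was wrapped in Option
}

impl ParsedUdfSketch {
    /// Renders a compact signature string, e.g. for display in a UI.
    pub fn signature(&self) -> String {
        let args: Vec<String> = self
            .arg_types
            .iter()
            .map(|t| format!("{:?}", t))
            .collect();
        let null = if self.return_nullable { " (nullable)" } else { "" };
        format!(
            "{}({}) -> {:?}{}",
            self.name,
            args.join(", "),
            self.return_type,
            null
        )
    }
}
```

Keeping nullability as a separate flag, rather than folding it into the type, mirrors the list above, where the return type and its Option wrapper are tracked independently.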

Behavior

The try_parse method performs the following steps:

  1. Extract dependency block: Scans leading comment lines for a TOML block declaring crate dependencies. Parses this into a toml::Table.
  2. Parse function item: Uses the syn crate to parse the source into an ItemFn, extracting the function signature.
  3. Map parameter types: Iterates over each parameter, converting its Rust type to the corresponding Arrow DataType. Unsupported types produce an error.
  4. Map return type: Converts the return type to an Arrow DataType, detecting Option<T> for nullability.
  5. Detect async: Checks the asyncness field of the function signature.
  6. Construct result: Assembles the ParsedUdfFile with all extracted metadata.
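Steps 3 and 4 above can be sketched as follows. This is a simplified, string-based illustration that assumes the Rust type is available as text. The real parser works on syn's type nodes and covers more types than shown here. `map_type`, `strip_option`, and the local `DataType` enum are hypothetical names introduced for this example.

```rust
// Hypothetical stand-in for Arrow's DataType.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Float32,
    Float64,
    Boolean,
    Utf8,
}

/// Strips a single `Option<...>` wrapper, returning the inner type text
/// and whether the wrapper was present.
fn strip_option(ty: &str) -> (&str, bool) {
    let ty = ty.trim();
    match ty
        .strip_prefix("Option<")
        .and_then(|rest| rest.strip_suffix('>'))
    {
        Some(inner) => (inner.trim(), true),
        None => (ty, false),
    }
}

/// Maps a Rust type name to a (DataType, nullable) pair.
/// Types outside the small table below are rejected with an error.
pub fn map_type(ty: &str) -> Result<(DataType, bool), String> {
    let (inner, nullable) = strip_option(ty);
    let data_type = match inner {
        "i32" => DataType::Int32,
        "i64" => DataType::Int64,
        "f32" => DataType::Float32,
        "f64" => DataType::Float64,
        "bool" => DataType::Boolean,
        "String" | "&str" => DataType::Utf8,
        other => return Err(format!("unsupported type: {}", other)),
    };
    Ok((data_type, nullable))
}
```

Under these assumptions, `map_type("Option<String>")` yields a `Utf8` type flagged as nullable, while a custom struct name produces the "unsupported type" error described in step 3.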

validate_udf Endpoint

Signature

pub async fn validate_udf(
    Json(req): Json<ValidateUdfPost>,
) -> Result<Json<UdfValidationResult>, ErrorResp>

I/O

  • Input: ValidateUdfPost containing the UDF source definition and language
  • Output: Json<UdfValidationResult> containing either validation success with parsed metadata or a list of error messages

Behavior

The validate_udf endpoint orchestrates the full validation pipeline:

  1. Parse the UDF source using ParsedUdfFile::try_parse
  2. Build (check-only) by calling build_udf() with save=false, which triggers a cargo check without producing compiled artifacts
  3. Return results: On success, returns the parsed UDF metadata (name, parameter types, return type). On failure, returns the compilation errors as a structured error response.

The save=false flag is the key distinction between validation and compilation: validation performs all the same steps except it does not persist the compiled dynamic library or register the UDF in the database.
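The role of the save flag can be illustrated with a small sketch. `build_udf_sketch` and `BuildOutcome` are hypothetical names, and the "check" here is a trivial placeholder standing in for the real cargo check. The only point is that the same code path runs in both modes and the flag alone decides whether an artifact is persisted.

```rust
// Hypothetical result type: records what the build step did.
#[derive(Debug, PartialEq)]
pub struct BuildOutcome {
    pub checked: bool,        // the source passed the check step
    pub artifact_saved: bool, // a compiled artifact was persisted
}

/// Illustrative stand-in for a build function with a `save` flag.
/// This is NOT the signature of Arroyo's build_udf.
pub fn build_udf_sketch(source: &str, save: bool) -> Result<BuildOutcome, Vec<String>> {
    // Placeholder for the compiler check: a real implementation would
    // invoke the toolchain and collect its diagnostics.
    if !source.contains("fn ") {
        return Err(vec!["no function definition found".to_string()]);
    }
    // Validation (save = false) stops here; compilation (save = true)
    // would additionally persist the library and register the UDF.
    Ok(BuildOutcome {
        checked: true,
        artifact_saved: save,
    })
}
```

Sharing one function between validation and compilation means the validation endpoint cannot drift from what the real build accepts.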

Error Handling

Errors can originate from multiple stages:

Stage           Error Type                  Example
TOML parsing    Malformed dependency block  Invalid TOML syntax in comment header
Syntax parsing  Invalid Rust syntax         Missing semicolon, unbalanced braces
Type mapping    Unsupported type            Using a custom struct as a parameter type
Cargo check     Semantic error              Borrow checker violation, missing import

All errors are collected and returned in the UdfValidationResult response, enabling the web UI to display them inline.
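One way to picture how stage-specific failures end up in a single response is sketched below. The `Stage` enum mirrors the stages listed above. `ValidationResultSketch` and `to_result` are hypothetical and do not reflect the actual fields of UdfValidationResult.

```rust
// The stages at which validation can fail, as listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    TomlParsing,
    SyntaxParsing,
    TypeMapping,
    CargoCheck,
}

// Hypothetical response shape: either a UDF name or a list of errors.
#[derive(Debug, PartialEq)]
pub struct ValidationResultSketch {
    pub udf_name: Option<String>,
    pub errors: Vec<String>,
}

/// Folds a pipeline outcome into a response that a UI could render inline.
/// A failure is tagged with the stage that produced it.
pub fn to_result(outcome: Result<String, (Stage, String)>) -> ValidationResultSketch {
    match outcome {
        Ok(name) => ValidationResultSketch {
            udf_name: Some(name),
            errors: Vec::new(),
        },
        Err((stage, message)) => ValidationResultSketch {
            udf_name: None,
            errors: vec![format!("{:?}: {}", stage, message)],
        },
    }
}
```

Returning errors as data inside a successful response, rather than as an HTTP failure, is what lets a client show them next to the offending source.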

Implements

Principle:ArroyoSystems_Arroyo_UDF_Validation
Environment:ArroyoSystems_Arroyo_Python_UDF_Runtime
