Implementation:ArroyoSystems Arroyo Validate UDF
Summary
This page documents the implementation of UDF validation in the Arroyo streaming engine, covering the ParsedUdfFile::try_parse method for source code parsing and the validate_udf HTTP endpoint for API-driven validation.
Code Reference
| Component | File | Lines |
|---|---|---|
| UDF file parsing | crates/arroyo-udf/arroyo-udf-host/src/lib.rs | L60-L93 |
| UDF source parser | crates/arroyo-udf/arroyo-udf-common/src/parse.rs | L257-L349 |
| Validation endpoint | crates/arroyo-api/src/udfs.rs | L274-L289 |
ParsedUdfFile::try_parse
Signature
impl ParsedUdfFile {
pub fn try_parse(def: &str) -> anyhow::Result<ParsedUdfFile>
}
I/O
- Input: def: &str -- the full UDF source file as a string
- Output: ParsedUdfFile on success, or an anyhow::Error with a descriptive message on failure
Output Structure
pub struct ParsedUdfFile {
pub udf: ParsedUdf,
pub definition: String,
pub dependencies: toml::Table,
}
The ParsedUdf struct contains:
- Function name -- the identifier of the UDF
- Parameter types -- a list of Arrow DataTypes corresponding to each function parameter
- Return type -- the Arrow DataType of the return value
- Async marker -- whether the function is declared as async
- Nullability -- whether the return type is wrapped in Option
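To make the parameter type, return type, and nullability fields concrete, here is a minimal sketch of how a Rust type could be turned into a data type plus a nullable flag. It works on type names as plain text and uses a hypothetical SketchType enum in place of Arrow's DataType; the set of supported types shown is an assumption for illustration, not the list Arroyo actually accepts.

```rust
/// Hypothetical stand-in for the Arrow `DataType` variants a UDF may use.
#[derive(Debug, PartialEq)]
enum SketchType {
    Int32,
    Int64,
    Float64,
    Utf8,
    Boolean,
}

/// Maps a Rust type written as text to (data type, nullable).
/// Assumption: only the handful of primitive types below are supported.
fn map_type(rust_ty: &str) -> Result<(SketchType, bool), String> {
    let ty = rust_ty.trim();
    // `Option<T>` marks the value as nullable; unwrap it and map `T`.
    let (inner, nullable) = match ty
        .strip_prefix("Option<")
        .and_then(|t| t.strip_suffix('>'))
    {
        Some(inner) => (inner.trim(), true),
        None => (ty, false),
    };
    let mapped = match inner {
        "i32" => SketchType::Int32,
        "i64" => SketchType::Int64,
        "f64" => SketchType::Float64,
        "String" | "&str" => SketchType::Utf8,
        "bool" => SketchType::Boolean,
        // Anything else (for example a custom struct) is rejected.
        other => return Err(format!("unsupported type: {other}")),
    };
    Ok((mapped, nullable))
}
```

Under these assumptions, a return type of Option<i64> maps to a nullable Int64, while a custom struct name produces an error.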
Behavior
The try_parse method performs the following steps:
- Extract dependency block: Scans leading comment lines for a TOML block declaring crate dependencies. Parses this into a toml::Table.
- Parse function item: Uses the syn crate to parse the source into an ItemFn, extracting the function signature.
- Map parameter types: Iterates over each parameter, converting its Rust type to the corresponding Arrow DataType. Unsupported types produce an error.
- Map return type: Converts the return type to an Arrow DataType, detecting Option<T> for nullability.
- Detect async: Checks the asyncness field of the function signature.
- Construct result: Assembles the ParsedUdfFile with all extracted metadata.
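The first step, extracting the dependency block, can be sketched with the standard library alone. The function below is a hypothetical helper, not the code in parse.rs. It assumes the block is a leading /* ... */ comment whose body begins with a [dependencies] header, and it returns the raw TOML text rather than a parsed toml::Table.

```rust
/// Splits a UDF source into (dependency TOML text, remaining Rust code).
/// Assumption: the dependency block is a leading `/* ... */` comment that
/// starts with a `[dependencies]` header; the real parser may accept other forms.
fn split_dependencies(def: &str) -> Result<(Option<String>, &str), String> {
    let trimmed = def.trim_start();
    // No leading block comment: nothing to extract.
    let Some(after_open) = trimmed.strip_prefix("/*") else {
        return Ok((None, trimmed));
    };
    let Some(end) = after_open.find("*/") else {
        return Err("unterminated dependency comment".to_string());
    };
    let body = after_open[..end].trim();
    let rest = after_open[end + 2..].trim_start();
    if body.starts_with("[dependencies]") {
        Ok((Some(body.to_string()), rest))
    } else {
        // An ordinary comment, not a dependency block; leave the source untouched.
        Ok((None, trimmed))
    }
}
```

In a fuller version, the returned TOML text would be handed to a TOML parser and the remaining source to syn; both are left out here to keep the sketch dependency-free.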
validate_udf Endpoint
Signature
pub async fn validate_udf(
Json(req): Json<ValidateUdfPost>,
) -> Result<Json<UdfValidationResult>, ErrorResp>
I/O
- Input: ValidateUdfPost containing the UDF source definition and language
- Output: Json<UdfValidationResult> containing either validation success with parsed metadata or a list of error messages
Behavior
The validate_udf endpoint orchestrates the full validation pipeline:
- Parse the UDF source using ParsedUdfFile::try_parse
- Build (check-only) by calling build_udf() with save=false, which triggers a cargo check without producing compiled artifacts
- Return results: On success, returns the parsed UDF metadata (name, parameter types, return type). On failure, returns the compilation errors as a structured error response.
The save=false flag is the key distinction between validation and compilation: validation runs the same parse and check steps, but it neither persists a compiled dynamic library nor registers the UDF in the database.
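The control flow of the pipeline above can be sketched as follows. This is an illustration, not the code in udfs.rs: ValidationSketch is a hypothetical simplification of UdfValidationResult, and the parse and check closures stand in for ParsedUdfFile::try_parse and a check-only build_udf call, whose real signatures are not reproduced here.

```rust
/// Hypothetical result shape: parsed UDF name on success, error messages otherwise.
#[derive(Debug, PartialEq)]
struct ValidationSketch {
    udf_name: Option<String>,
    errors: Vec<String>,
}

/// Runs parse, then check. `parse` and `check` are stand-ins for
/// `ParsedUdfFile::try_parse` and a check-only `build_udf` call.
fn validate_sketch(
    def: &str,
    parse: impl Fn(&str) -> Result<String, String>,
    check: impl Fn(&str, bool) -> Result<(), Vec<String>>,
) -> ValidationSketch {
    // Stage 1: a parse failure short-circuits before any build is attempted.
    let name = match parse(def) {
        Ok(name) => name,
        Err(e) => {
            return ValidationSketch { udf_name: None, errors: vec![e] };
        }
    };
    // Stage 2: save = false means check only; nothing is persisted or registered.
    match check(def, false) {
        Ok(()) => ValidationSketch { udf_name: Some(name), errors: vec![] },
        Err(errors) => ValidationSketch { udf_name: None, errors },
    }
}
```

Passing the two stages in as closures keeps the sketch runnable without a Rust toolchain invocation; in the real endpoint the second stage is the cargo check described above.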
Error Handling
Errors can originate from multiple stages:
| Stage | Error Type | Example |
|---|---|---|
| TOML parsing | Malformed dependency block | Invalid TOML syntax in comment header |
| Syntax parsing | Invalid Rust syntax | Missing semicolon, unbalanced braces |
| Type mapping | Unsupported type | Using a custom struct as a parameter type |
| Cargo check | Semantic error | Borrow checker violation, missing import |
All errors are collected and returned in the UdfValidationResult response, enabling the web UI to display them inline.
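One way to picture how errors from different stages end up in a single list is to tag each message with its stage and then flatten the result. The Stage enum and to_messages helper below are hypothetical names introduced for this sketch; the source does not state how UdfValidationResult labels its errors.

```rust
use std::fmt;

/// Hypothetical tags for the four stages in the table above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    TomlParsing,
    SyntaxParsing,
    TypeMapping,
    CargoCheck,
}

impl fmt::Display for Stage {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let label = match self {
            Stage::TomlParsing => "TOML parsing",
            Stage::SyntaxParsing => "Syntax parsing",
            Stage::TypeMapping => "Type mapping",
            Stage::CargoCheck => "Cargo check",
        };
        f.write_str(label)
    }
}

/// Flattens stage-tagged errors into the plain message list a response could carry.
fn to_messages(errors: &[(Stage, String)]) -> Vec<String> {
    errors
        .iter()
        .map(|(stage, msg)| format!("{stage}: {msg}"))
        .collect()
}
```

A UI could then render each message inline next to the UDF editor, with the stage prefix hinting at where the fix belongs.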
Implements
- Principle:ArroyoSystems_Arroyo_UDF_Validation
- Environment:ArroyoSystems_Arroyo_Python_UDF_Runtime