Implementation:ArroyoSystems Arroyo Compile UDF
Summary
This page documents the implementation of UDF compilation in the Arroyo streaming engine, covering the CompileService::build_udf gRPC method for compiling UDF source into dynamic libraries and the create_udf HTTP endpoint for persisting UDF metadata.
Code Reference
| Component | File | Lines |
|---|---|---|
| Compilation service | crates/arroyo-compiler-service/src/lib.rs | L252-L377 |
| UDF creation endpoint | crates/arroyo-api/src/udfs.rs | L60-L117 |
CompileService::build_udf
Signature
impl CompileService {
async fn build_udf(
&self,
request: Request<BuildUdfReq>,
) -> Result<Response<BuildUdfResp>, Status>
}
I/O
- Input: BuildUdfReq containing:
  - udf_crate: UdfCrate -- the UDF crate specification with:
    - name -- the function name
    - definition -- the full source code
    - dependencies -- extracted TOML dependency table
  - save: bool -- whether to persist the compiled artifact (false for validation-only)
- Output: BuildUdfResp containing:
  - errors: Vec<String> -- compilation errors, if any
  - udf_path: Option<String> -- the storage path of the compiled dynamic library (present only when save=true and compilation succeeds)
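The rule for when udf_path is populated can be illustrated with a minimal sketch. BuildUdfRespSketch and build_response are hypothetical names introduced here for illustration; they are not the generated gRPC types or any function in the Arroyo codebase.

```rust
/// Hypothetical stand-in for the response message described above.
#[derive(Debug, PartialEq)]
struct BuildUdfRespSketch {
    errors: Vec<String>,
    udf_path: Option<String>,
}

/// Assumed rule, taken from the I/O description: the path is present only
/// when `save` is true and compilation produced no errors.
fn build_response(errors: Vec<String>, save: bool, stored_path: &str) -> BuildUdfRespSketch {
    let udf_path = if save && errors.is_empty() {
        Some(stored_path.to_string())
    } else {
        None
    };
    BuildUdfRespSketch { errors, udf_path }
}
```

A validation-only call (save=false) therefore never yields a path, even when the build is clean.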
Behavior
The build_udf method executes the following steps:
1. Acquire compilation mutex: Ensures only one compilation runs at a time to prevent resource contention.
2. Generate crate structure: Creates a temporary directory containing:
   - Cargo.toml with the UDF name, arroyo-udf-macros dependency, and any user-declared dependencies
   - src/lib.rs with the UDF source code and #[udf] macro annotation
   - Crate type set to cdylib for dynamic library output
3. Run cargo build: Executes cargo build --release in the temporary crate directory.
4. Check for errors: Parses the cargo output for compilation errors. If errors are found, returns them in the errors field.
5. Upload artifact (if save=true): Computes a content-addressed path from the definition hash and uploads the .so/.dylib to object storage.
6. Return response: Returns the BuildUdfResp with either errors or the artifact path.
When save=false, the upload step (step 5) is skipped and the response carries no artifact path. The validation endpoint uses this mode to perform a check-only compilation.
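The crate-generation step can be sketched as a function that renders the manifest text. This is a rough illustration under stated assumptions: render_cargo_toml is a hypothetical helper, and the package version, edition, and the "*" version for arroyo-udf-macros are placeholders rather than values taken from the Arroyo source.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: renders a Cargo.toml for a temporary UDF crate.
/// Version strings and manifest layout are illustrative assumptions.
fn render_cargo_toml(udf_name: &str, user_deps: &BTreeMap<String, String>) -> String {
    let mut manifest = String::new();
    manifest.push_str("[package]\n");
    manifest.push_str(&format!("name = \"{udf_name}\"\n"));
    manifest.push_str("version = \"0.1.0\"\nedition = \"2021\"\n\n");
    // cdylib makes cargo emit a dynamic library (.so / .dylib).
    manifest.push_str("[lib]\ncrate-type = [\"cdylib\"]\n\n");
    manifest.push_str("[dependencies]\n");
    // Placeholder version; the real service pins its own.
    manifest.push_str("arroyo-udf-macros = \"*\"\n");
    // User-declared dependencies, in sorted order for stable output.
    for (name, version) in user_deps {
        manifest.push_str(&format!("{name} = \"{version}\"\n"));
    }
    manifest
}
```

Writing this string to Cargo.toml next to a src/lib.rs holding the UDF source would give cargo build --release a crate to compile. A BTreeMap keeps the dependency order deterministic, which is convenient when the manifest is compared or hashed.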
create_udf Endpoint
Signature
pub async fn create_udf(
State(state): State<AppState>,
bearer_auth: BearerAuth,
Json(req): Json<UdfPost>,
) -> Result<Json<GlobalUdf>, ErrorResp>
I/O
- Input: UdfPost containing:
  - definition -- the full UDF source code
  - language -- the UDF language (Rust or Python)
  - prefix -- optional SQL function name prefix
  - description -- optional human-readable description
- Output: Json<GlobalUdf> containing:
  - id -- unique identifier for the UDF record
  - name -- the function name (extracted from the definition)
  - definition -- the full source code
  - dylib_url -- the object storage URL of the compiled dynamic library
  - language -- the UDF language
Behavior
The create_udf endpoint orchestrates the full UDF creation workflow:
1. Parse the UDF source using ParsedUdfFile::try_parse
2. Compile the UDF by calling build_udf() with save=true
3. Persist metadata to the database, creating a GlobalUdf record that links the UDF name to its compiled artifact
4. Return the GlobalUdf record to the caller
If parsing or compilation fails, the endpoint returns an ErrorResp with descriptive error messages.
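The parse, compile, persist ordering above can be sketched with stand-in types. Everything named here (GlobalUdfSketch, CreateError, parse_name, create_udf_sketch) is hypothetical: the naive name extraction is not ParsedUdfFile::try_parse, the closure stands in for the build_udf call, and a Vec stands in for the database.

```rust
/// Hypothetical, trimmed-down stand-in for the GlobalUdf record.
#[derive(Debug, Clone, PartialEq)]
struct GlobalUdfSketch {
    name: String,
    definition: String,
    dylib_url: String,
}

#[derive(Debug, PartialEq)]
enum CreateError {
    Parse(String),
    Compile(Vec<String>),
}

/// Naive name extraction: takes the identifier after the first "fn ".
/// The real parser is far more thorough.
fn parse_name(definition: &str) -> Result<String, CreateError> {
    let rest = definition
        .split("fn ")
        .nth(1)
        .ok_or_else(|| CreateError::Parse("no function found".to_string()))?;
    let name: String = rest
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .collect();
    if name.is_empty() {
        Err(CreateError::Parse("missing function name".to_string()))
    } else {
        Ok(name)
    }
}

/// Parse, then compile, then persist. A failure at any stage returns early,
/// so nothing is stored for a UDF that did not compile.
fn create_udf_sketch<C>(
    definition: &str,
    compile: C,
    db: &mut Vec<GlobalUdfSketch>,
) -> Result<GlobalUdfSketch, CreateError>
where
    C: Fn(&str, &str) -> Result<String, Vec<String>>,
{
    let name = parse_name(definition)?;
    let dylib_url = compile(&name, definition).map_err(CreateError::Compile)?;
    let record = GlobalUdfSketch {
        name,
        definition: definition.to_string(),
        dylib_url,
    };
    db.push(record.clone());
    Ok(record)
}
```

Passing the compile step as a closure keeps the sketch testable without a compiler service; in the endpoint described above that role is played by build_udf() with save=true.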
Content-Addressed Caching
The compilation service uses content-addressed storage paths to enable caching:
udfs/{hash_of_definition}/{udf_name}.so
If an artifact already exists at the computed path, the compilation step is skipped and the existing artifact URL is returned. This significantly reduces latency for repeated submissions of the same UDF definition.
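A minimal sketch of the cache check follows, assuming a 64-bit FNV-1a hash as a deterministic stand-in (the source does not say which hash the service uses) and a HashSet as a stand-in for object storage. fnv1a64, artifact_path, and compile_or_reuse are hypothetical names.

```rust
use std::collections::HashSet;

/// 64-bit FNV-1a, used only as a deterministic stand-in hash.
fn fnv1a64(data: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in data {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Builds a path of the form udfs/{hash_of_definition}/{udf_name}.so.
fn artifact_path(definition: &str, udf_name: &str) -> String {
    format!("udfs/{:016x}/{}.so", fnv1a64(definition.as_bytes()), udf_name)
}

/// Returns the artifact path and whether it was a cache hit.
/// `store` stands in for object storage; a real service would run the
/// build and upload the library on a miss.
fn compile_or_reuse(store: &mut HashSet<String>, definition: &str, udf_name: &str) -> (String, bool) {
    let path = artifact_path(definition, udf_name);
    let cache_hit = store.contains(&path);
    if !cache_hit {
        store.insert(path.clone());
    }
    (path, cache_hit)
}
```

Because the path depends only on the definition and the name, any edit to the source yields a new path and forces a fresh build, while resubmitting identical source resolves to the existing artifact.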
Implements
Principle:ArroyoSystems_Arroyo_UDF_Compilation
Environment:ArroyoSystems_Arroyo_Rust_Runtime