Implementation:ArroyoSystems Arroyo Compile UDF

From Leeroopedia


Summary

This page documents the implementation of UDF compilation in the Arroyo streaming engine, covering the CompileService::build_udf gRPC method for compiling UDF source into dynamic libraries and the create_udf HTTP endpoint for persisting UDF metadata.

Code Reference

  • Compilation service: crates/arroyo-compiler-service/src/lib.rs, lines 252-377
  • UDF creation endpoint: crates/arroyo-api/src/udfs.rs, lines 60-117

CompileService::build_udf

Signature

impl CompileService {
    async fn build_udf(
        &self,
        request: Request<BuildUdfReq>,
    ) -> Result<Response<BuildUdfResp>, Status>
}

I/O

  • Input: BuildUdfReq containing:
    • udf_crate: UdfCrate -- the UDF crate specification with:
      • name -- the function name
      • definition -- the full source code
      • dependencies -- extracted TOML dependency table
    • save: bool -- whether to persist the compiled artifact (false for validation-only)
  • Output: BuildUdfResp containing:
    • errors: Vec<String> -- compilation errors, if any
    • udf_path: Option<String> -- the storage path of the compiled dynamic library (present only when save=true and compilation succeeds)
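For illustration, the request and response can be mirrored as plain Rust structs. This is a sketch, not Arroyo's code. The real message types are generated from its protobuf definitions. Here `dependencies` is assumed to be raw TOML text. `is_consistent` is a hypothetical helper that encodes one reading of the rule above: `udf_path` is set exactly when `save` is true and the build produced no errors.

```rust
/// Plain-Rust stand-ins for the gRPC messages (the real ones are protobuf-generated).
#[derive(Debug, Clone)]
pub struct UdfCrate {
    pub name: String,
    pub definition: String,
    /// Assumption: the raw TOML text of the user's dependency table.
    pub dependencies: String,
}

#[derive(Debug, Clone)]
pub struct BuildUdfReq {
    pub udf_crate: UdfCrate,
    /// `false` means validate only: compile, but do not upload the artifact.
    pub save: bool,
}

#[derive(Debug, Clone, Default)]
pub struct BuildUdfResp {
    pub errors: Vec<String>,
    pub udf_path: Option<String>,
}

/// Hypothetical helper: `udf_path` should be present if and only if the
/// request asked to save and compilation produced no errors.
pub fn is_consistent(req: &BuildUdfReq, resp: &BuildUdfResp) -> bool {
    let should_have_path = req.save && resp.errors.is_empty();
    resp.udf_path.is_some() == should_have_path
}
```

Under this reading, a validation call (`save: false`) that compiles cleanly returns neither errors nor a path.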

Behavior

The build_udf method executes the following steps:

  1. Acquire compilation mutex: Ensures only one compilation runs at a time to prevent resource contention.
  2. Generate crate structure: Creates a temporary directory containing:
    • Cargo.toml with the UDF name, the arroyo-udf-macros dependency, any user-declared dependencies, and the crate type set to cdylib so that cargo produces a dynamic library
    • src/lib.rs with the UDF source code and the #[udf] macro annotation
  3. Run cargo build: Executes cargo build --release in the temporary crate directory.
  4. Check for errors: Parses the cargo output for compilation errors. If errors are found, returns them in the errors field.
  5. Upload artifact (if save=true): Computes a content-addressed path from the definition hash and uploads the .so/.dylib to object storage.
  6. Return response: Returns the BuildUdfResp with either errors or the artifact path.
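Steps 1 and 2 can be sketched using only the standard library. The following are all hypothetical:

  • the names `COMPILE_LOCK` and `write_udf_crate`
  • the package version and edition
  • the `macros_dep` line that the caller passes for arroyo-udf-macros

The sketch writes the definition verbatim and assumes it already carries the `#[udf]` attribute. This page does not specify whether Arroyo adds that annotation itself.

```rust
use std::fs;
use std::io;
use std::path::Path;
use std::sync::Mutex;

/// Step 1 (sketch): a process-wide lock so only one UDF build runs at a time.
static COMPILE_LOCK: Mutex<()> = Mutex::new(());

/// Step 2 (sketch): write Cargo.toml and src/lib.rs for a cdylib crate.
/// `macros_dep` is a caller-supplied dependency line for arroyo-udf-macros;
/// its real source (path, git, or version) is an assumption.
pub fn write_udf_crate(
    dir: &Path,
    name: &str,
    definition: &str,
    user_deps: &str,
    macros_dep: &str,
) -> io::Result<()> {
    // Recover the guard even if a previous holder panicked.
    let _guard = COMPILE_LOCK.lock().unwrap_or_else(|e| e.into_inner());
    let cargo_toml = format!(
        "[package]\nname = \"{name}\"\nversion = \"0.1.0\"\nedition = \"2021\"\n\n\
         [lib]\ncrate-type = [\"cdylib\"]\n\n\
         [dependencies]\n{macros_dep}\n{user_deps}\n"
    );
    fs::create_dir_all(dir.join("src"))?;
    fs::write(dir.join("Cargo.toml"), cargo_toml)?;
    // Written verbatim; assumes the definition already has #[udf].
    fs::write(dir.join("src").join("lib.rs"), definition)?;
    Ok(())
}
```

In the service, the lock would also be held through the cargo build. The sketch holds it only while writing files.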

When save=false, step 5 is skipped. Nothing is uploaded, and the response carries only the compilation errors (if any), with udf_path left empty. The validation endpoint uses this mode to perform a check-only compilation.
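The save flag's effect on steps 3 through 6 can be sketched as follows:

  • `cargo_build_command` only constructs the `cargo build --release` invocation and does not run it.
  • `extract_errors` keeps the stderr lines that begin with `error`. This is a simplification of whatever parsing Arroyo actually does.
  • `finish_build` and its `upload` closure are hypothetical stand-ins for the object-store upload.

```rust
use std::path::Path;
use std::process::Command;

/// Step 3 (sketch): build, but do not run, the cargo invocation.
pub fn cargo_build_command(crate_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("build").arg("--release").current_dir(crate_dir);
    cmd
}

/// Step 4 (sketch): collect cargo's human-readable error headlines.
/// Matching lines that start with "error" is a simplification.
pub fn extract_errors(stderr: &str) -> Vec<String> {
    stderr
        .lines()
        .filter(|line| line.starts_with("error"))
        .map(str::to_string)
        .collect()
}

/// Steps 5-6 (sketch): only a clean build with save=true calls `upload`
/// and yields an artifact path; otherwise the path stays `None`.
pub fn finish_build(
    errors: Vec<String>,
    save: bool,
    upload: impl FnOnce() -> String,
) -> (Vec<String>, Option<String>) {
    if !errors.is_empty() || !save {
        return (errors, None);
    }
    (errors, Some(upload()))
}
```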

create_udf Endpoint

Signature

pub async fn create_udf(
    State(state): State<AppState>,
    bearer_auth: BearerAuth,
    Json(req): Json<UdfPost>,
) -> Result<Json<GlobalUdf>, ErrorResp>

I/O

  • Input: UdfPost containing:
    • definition -- the full UDF source code
    • language -- the UDF language (Rust or Python)
    • prefix -- optional SQL function name prefix
    • description -- optional human-readable description
  • Output: Json<GlobalUdf> containing:
    • id -- unique identifier for the UDF record
    • name -- the function name (extracted from the definition)
    • definition -- the full source code
    • dylib_url -- the object storage URL of the compiled dynamic library
    • language -- the UDF language
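Because `name` is derived from the definition, here is a deliberately naive sketch of that extraction for Rust UDFs. `ParsedUdfFile::try_parse` presumably uses a real parser. The hypothetical `extract_udf_name` below just scans for the first `fn <identifier>` after a `#[udf]` attribute. It can be fooled by comments or strings that contain those tokens.

```rust
/// Hypothetical, naive stand-in for name extraction (not a real Rust parser):
/// find the first `fn <ident>` that follows a `#[udf` attribute.
pub fn extract_udf_name(definition: &str) -> Option<String> {
    let after_attr = &definition[definition.find("#[udf")?..];
    let after_fn = &after_attr[after_attr.find("fn ")? + 3..];
    // An identifier runs over alphanumerics and underscores.
    let name: String = after_fn
        .trim_start()
        .chars()
        .take_while(|c| c.is_alphanumeric() || *c == '_')
        .collect();
    if name.is_empty() {
        None
    } else {
        Some(name)
    }
}
```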

Behavior

The create_udf endpoint orchestrates the full UDF creation workflow:

  1. Parse the UDF source using ParsedUdfFile::try_parse
  2. Compile the UDF by calling build_udf() with save=true
  3. Persist metadata to the database, creating a GlobalUdf record that links the UDF name to its compiled artifact
  4. Return the GlobalUdf record to the caller

If parsing or compilation fails, the endpoint returns an ErrorResp with descriptive error messages.
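The four steps can be expressed as a small function, with the parse, compile and persist stages injected as closures. This is entirely a hypothetical sketch:

  • `create_udf_flow` is not Arroyo's handler.
  • Errors are plain strings rather than `ErrorResp`.
  • The `language` field of `GlobalUdf` is omitted for brevity.

```rust
/// Record returned by the sketch; mirrors the GlobalUdf fields listed above
/// (minus `language`).
#[derive(Debug, Clone, PartialEq)]
pub struct GlobalUdf {
    pub id: u64,
    pub name: String,
    pub definition: String,
    pub dylib_url: String,
}

/// Sketch of the create_udf flow: parse, compile with save=true, persist, return.
pub fn create_udf_flow(
    definition: &str,
    parse: impl FnOnce(&str) -> Result<String, String>,
    compile: impl FnOnce(&str, &str, bool) -> Result<String, Vec<String>>,
    persist: impl FnOnce(&str, &str, &str) -> u64,
) -> Result<GlobalUdf, String> {
    // 1. Parse: obtain the function name or fail with a parse error.
    let name = parse(definition).map_err(|e| format!("invalid UDF: {e}"))?;
    // 2. Compile with save=true so that an artifact URL comes back.
    let dylib_url = compile(name.as_str(), definition, true)
        .map_err(|errs| format!("compilation failed: {}", errs.join("; ")))?;
    // 3. Persist the metadata that links the name to its artifact.
    let id = persist(name.as_str(), definition, dylib_url.as_str());
    // 4. Return the stored record.
    Ok(GlobalUdf { id, name, definition: definition.to_string(), dylib_url })
}
```

Injecting the stages keeps the ordering testable without a compiler service or database. An error at any stage short-circuits via `?`, so nothing is persisted for a UDF that failed to parse or compile.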

Content-Addressed Caching

The compilation service uses content-addressed storage paths to enable caching:

udfs/{hash_of_definition}/{udf_name}.so
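A sketch of how such a path could be computed. The page does not say which hash Arroyo uses. 64-bit FNV-1a stands in here because it is tiny and deterministic across runs, unlike `std::collections::hash_map::DefaultHasher`, whose algorithm is not guaranteed to stay the same between Rust releases. Production code would more likely use a cryptographic hash. The 16-digit hex formatting and the use of the platform's `DLL_EXTENSION` instead of a fixed `.so` are also assumptions.

```rust
use std::env::consts::DLL_EXTENSION;

/// 64-bit FNV-1a, used here only as a deterministic stand-in hash.
fn fnv1a_64(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Builds `udfs/{hash_of_definition}/{udf_name}.{ext}`, where `ext` is the
/// platform's dynamic-library extension ("so", "dylib", or "dll").
pub fn udf_artifact_path(udf_name: &str, definition: &str) -> String {
    let hash = fnv1a_64(definition.as_bytes());
    format!("udfs/{:016x}/{}.{}", hash, udf_name, DLL_EXTENSION)
}
```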

If an artifact already exists at the computed path, the compilation step is skipped and the existing artifact URL is returned. A release build of a fresh crate is slow, so this removes most of the latency when the same UDF definition is submitted again.
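The cache check can be sketched with an in-memory map standing in for object storage. `ArtifactStore` and `get_or_build` are hypothetical names. The sketch assumes the existence check runs before `cargo build`, because that ordering is what allows compilation to be skipped.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory stand-in for the object store.
#[derive(Default)]
pub struct ArtifactStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ArtifactStore {
    /// Returns the artifact path, running `compile` only on a cache miss.
    /// The boolean reports whether a build actually happened.
    pub fn get_or_build(
        &mut self,
        path: &str,
        compile: impl FnOnce() -> Vec<u8>,
    ) -> (String, bool) {
        if self.objects.contains_key(path) {
            return (path.to_string(), false); // hit: skip cargo entirely
        }
        let dylib = compile(); // miss: build, then "upload"
        self.objects.insert(path.to_string(), dylib);
        (path.to_string(), true)
    }
}
```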

Implements

  • Principle:ArroyoSystems_Arroyo_UDF_Compilation
  • Environment:ArroyoSystems_Arroyo_Rust_Runtime