Principle:ArroyoSystems Arroyo UDF Lifecycle Management

Summary

This principle covers managing the full lifecycle of user-defined functions (UDFs) in the Arroyo streaming engine: creating, listing, updating, and deleting UDF definitions. It also covers monitoring UDF execution. UDFs have no dedicated metric instrumentation, but they are observable indirectly through the metrics of the operators that invoke them.

Core Concept

UDF lifecycle management provides CRUD operations (Create, Read, Update, Delete) for UDF artifacts stored in the system. These operations manage the database records, compiled artifacts in object storage, and the relationships between UDFs and the pipelines that reference them.

Theoretical Basis

Dependency Tracking

A critical aspect of UDF lifecycle management is preventing deletion of UDFs used by active pipelines. Before deleting a UDF, the system must verify that no running or configured pipeline references the function. This dependency tracking prevents runtime failures that would occur if a pipeline attempted to load a deleted UDF's dynamic library.
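The delete guard described above can be sketched as follows. This is a minimal in-memory illustration, not Arroyo's implementation: `UdfRegistry`, `Pipeline`, and `UdfInUseError` are hypothetical names, and a real system would run the check against its database.

```python
from dataclasses import dataclass, field


class UdfInUseError(Exception):
    """Raised when a UDF is still referenced by at least one pipeline."""


@dataclass
class Pipeline:
    name: str
    # Names of the UDFs this pipeline references (hypothetical model).
    udf_names: set = field(default_factory=set)


@dataclass
class UdfRegistry:
    # UDF name -> artifact URL (hypothetical model of the stored records).
    udfs: dict = field(default_factory=dict)
    # Running or configured pipelines known to the system.
    pipelines: list = field(default_factory=list)

    def dependents(self, udf_name):
        """Return the names of all pipelines that reference the UDF."""
        return sorted(p.name for p in self.pipelines if udf_name in p.udf_names)

    def delete_udf(self, udf_name):
        """Delete the UDF only if no pipeline depends on it."""
        if udf_name not in self.udfs:
            raise KeyError(udf_name)
        users = self.dependents(udf_name)
        if users:
            raise UdfInUseError(f"{udf_name} is referenced by: {', '.join(users)}")
        del self.udfs[udf_name]
```

Refusing the delete, rather than cascading it, keeps the failure at the API call instead of at the moment a pipeline tries to load the missing library.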

Versioning

Updating a UDF definition triggers recompilation of the UDF source. Since UDF artifacts are content-addressed, an updated definition produces a new artifact at a different storage path. Pipelines referencing the UDF will pick up the new version on their next restart or recompilation, while currently running pipelines continue using the previously compiled version until restarted.
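A small sketch of why content addressing yields a new path on every change. The hash function (SHA-256), the `udfs/<name>/<digest>.so` layout, and the `artifact_path` helper are assumptions made for illustration; the source does not specify how Arroyo derives its paths.

```python
import hashlib


def artifact_path(udf_name, source, prefix="udfs"):
    """Derive a storage path from a hash of the UDF source (hypothetical layout)."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return f"{prefix}/{udf_name}/{digest}.so"


# Two versions of the same UDF hash to two different paths.
v1 = artifact_path("double", "fn double(x: i64) -> i64 { x * 2 }")
v2 = artifact_path("double", "fn double(x: i64) -> i64 { x + x }")

# A running pipeline keeps the path it was compiled against ...
running_pipeline_artifact = v1
# ... while the registry record moves to the new path after the update.
registry_record = v2
```

Because the old path is never overwritten, the running pipeline's artifact stays valid until the pipeline is restarted and resolves the record again.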

Observability

The Arroyo metrics framework has no UDF-specific metrics. Instead, UDF performance is monitored indirectly through the metrics of the operators that invoke UDFs. These operator-level counters expose message, byte, and batch throughput as well as deserialization errors. Time spent inside a UDF is not reported separately and shows up only through its effect on the throughput of the enclosing operator.

CRUD Operations

| Operation | Description | Key Consideration |
|---|---|---|
| Create | Compile and register a new UDF | Validates and compiles before persisting |
| Read (List) | Retrieve all registered UDFs | Returns metadata including dylib URLs |
| Update | Modify a UDF's definition | Triggers recompilation; old artifact remains until cleanup |
| Delete | Remove a UDF registration | Must verify no active pipeline dependencies |

Operator Metrics for UDF Observability

Since UDFs execute within operators (either as part of a DataFusion plan for sync UDFs or as dedicated async UDF operators), the following operator-level Prometheus counters provide indirect UDF observability:

| Metric | Description |
|---|---|
| MESSAGES_RECV | Number of messages received by the operator |
| MESSAGES_SENT | Number of messages sent by the operator |
| BYTES_RECV | Total bytes received by the operator |
| BYTES_SENT | Total bytes sent by the operator |
| BATCHES_RECV | Number of Arrow batches received |
| BATCHES_SENT | Number of Arrow batches sent |
| DESERIALIZATION_ERRORS | Number of deserialization errors encountered |

All metrics are labeled with operator labels, enabling per-operator drill-down to identify UDF-related performance issues.
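Since these are monotonically increasing counters, a per-operator drill-down usually means comparing two samples. The sketch below assumes the counters have already been scraped into plain dictionaries keyed by operator label. The `counter_rates` helper and the snapshot shape are hypothetical, and counter resets are not handled.

```python
def counter_rates(before, after, interval_secs):
    """Per-second rate of each counter for each operator between two snapshots.

    `before` and `after` map an operator label to {counter name: value}.
    """
    if interval_secs <= 0:
        raise ValueError("interval_secs must be positive")
    rates = {}
    for operator, counters in after.items():
        previous = before.get(operator, {})
        rates[operator] = {
            # Counters missing from the earlier snapshot are treated as 0.
            name: (value - previous.get(name, 0)) / interval_secs
            for name, value in counters.items()
        }
    return rates


# Two hypothetical samples taken 60 seconds apart for an operator running a UDF.
sample_t0 = {"udf_operator": {"MESSAGES_RECV": 100, "MESSAGES_SENT": 90}}
sample_t1 = {"udf_operator": {"MESSAGES_RECV": 700, "MESSAGES_SENT": 390}}
rates = counter_rates(sample_t0, sample_t1, 60)
```

An operator whose send rate falls well below its receive rate is a candidate for closer inspection, although filtering or aggregating operators show the same pattern for legitimate reasons.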

Design Considerations

  • Atomic updates: The new definition should be compiled and its artifact uploaded before the database record is updated, so that the record never points to a non-existent artifact.
  • Garbage collection: Old UDF artifacts in object storage may need periodic cleanup, as content-addressed storage accumulates stale versions over time.
  • Access control: UDF CRUD operations are gated by bearer token authentication, ensuring only authorized users can modify UDF definitions.
  • Pipeline impact: Updating or deleting a UDF does not immediately affect running pipelines. Changes take effect when pipelines are restarted or recompiled.
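The update ordering and the cleanup of stale artifacts described in the list above can be sketched together. Dictionaries stand in for the database and object store, `compile_fn` stands in for the compiler, and the path layout is made up; none of these names come from Arroyo.

```python
import hashlib


def update_udf(records, object_store, name, source, compile_fn):
    """Compile and upload first; repoint the record only after both succeed."""
    artifact = compile_fn(source)  # may raise, in which case the record is untouched
    path = f"udfs/{name}/{hashlib.sha256(artifact).hexdigest()}"
    object_store[path] = artifact  # upload before the record changes
    records[name] = path           # final step: the record now points at a real artifact
    return path


def stale_artifacts(records, object_store, in_use=frozenset()):
    """Paths that no record points to and no running pipeline still loads."""
    live = set(records.values()) | set(in_use)
    return sorted(path for path in object_store if path not in live)
```

Passing the artifacts of running pipelines as `in_use` reflects the pipeline-impact point: an old version should survive cleanup for as long as a pipeline that has not restarted may still load it.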

Related Implementation

Implementation:ArroyoSystems_Arroyo_UDF_CRUD
