Principle:ArroyoSystems Arroyo Connector Registry
Overview
The Connector Registry principle governs how Arroyo maintains a centralized catalog of all available data source and sink connectors. The registry pattern allows dynamic discovery of connector capabilities, enabling the system to present available connectors to users and instantiate them by type name at runtime. In a distributed stream processing engine, connectors are the critical interface between the processing core and external systems such as Kafka, Kinesis, Redis, and file systems.
Description
A connector registry provides a single point of truth for all available integrations in the system. Rather than hard-coding connector references throughout the codebase, the registry pattern decouples connector discovery and instantiation from the code that uses connectors. Each connector registers itself with the registry, declaring its name, supported features, configuration schema, and connection type (source, sink, or both).
The registry serves three primary purposes:
- Enumeration -- It provides a complete list of available connectors so that user interfaces (such as the Arroyo web console) can display the full set of integrations to users without requiring code changes for each new connector.
- Metadata Exposure -- Each registered connector exposes metadata describing its capabilities, including its name, icon, supported connection types (source/sink), configuration schema for connection profiles, and configuration schema for table-level options. This metadata drives dynamic form generation in the UI.
- Factory Instantiation -- The registry acts as a factory, allowing the system to look up a connector by its string type name and obtain a boxed trait object that can be used to validate configurations, test connections, and create operator instances.
This pattern is essential in extensible systems where new connectors may be added without modifying core engine code. It also enables the API layer to remain generic -- a single REST endpoint can serve metadata for all connectors regardless of their underlying implementation.
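The three purposes above can be illustrated with a minimal, self-contained sketch. It is not Arroyo's implementation: the Connector trait, the Metadata struct, the ExampleSink connector, the registry() and lookup() functions, and the bootstrap_servers check are all hypothetical stand-ins chosen for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical metadata record; a real one would carry more fields
/// (icon, configuration schemas, and so on).
#[derive(Debug, Clone, PartialEq)]
struct Metadata {
    name: &'static str,
    source: bool,
    sink: bool,
}

/// Hypothetical stand-in for a type-erased connector trait.
trait Connector {
    fn metadata(&self) -> Metadata;
    /// Checks a raw config string; a placeholder for schema validation.
    fn validate(&self, config: &str) -> Result<(), String>;
}

struct Kafka;
struct ExampleSink;

impl Connector for Kafka {
    fn metadata(&self) -> Metadata {
        // Capability flags here are illustrative values.
        Metadata { name: "Kafka", source: true, sink: true }
    }
    fn validate(&self, config: &str) -> Result<(), String> {
        if config.contains("bootstrap_servers") {
            Ok(())
        } else {
            Err("missing bootstrap_servers".to_string())
        }
    }
}

impl Connector for ExampleSink {
    fn metadata(&self) -> Metadata {
        Metadata { name: "Example Sink", source: false, sink: true }
    }
    fn validate(&self, _config: &str) -> Result<(), String> {
        Ok(())
    }
}

/// Builds the registry: string type name -> boxed trait object.
fn registry() -> HashMap<&'static str, Box<dyn Connector>> {
    let mut map: HashMap<&'static str, Box<dyn Connector>> = HashMap::new();
    map.insert("kafka", Box::new(Kafka));
    map.insert("example_sink", Box::new(ExampleSink));
    map
}

/// Factory lookup by string type name.
fn lookup(name: &str) -> Option<Box<dyn Connector>> {
    registry().remove(name)
}

fn main() {
    // Enumeration and metadata exposure.
    for (key, connector) in registry() {
        println!("{key}: {:?}", connector.metadata());
    }
    // Factory instantiation followed by a validation call.
    if let Some(kafka) = lookup("kafka") {
        println!("{:?}", kafka.validate("bootstrap_servers=localhost:9092"));
    }
}
```

Callers in this sketch only ever see the trait, so adding a connector means adding one insert line in registry() and nothing else.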
Theoretical Basis
The Registry pattern provides a centralized lookup mechanism for plugin-like components. It is a well-established software design pattern closely related to the Service Locator pattern and the Abstract Factory pattern. In the context of streaming systems, connectors represent the boundary between the processing engine and external systems, making the registry a critical architectural component.
A connector registry addresses several design concerns:
- Enumerating available integrations -- The registry maintains a complete list of all connectors that the engine supports, allowing the system to answer the question "what data sources and sinks are available?" without scanning the codebase.
- Providing metadata -- Each connector declares its supported features and configuration schema through a uniform trait interface. This enables the system to generate configuration forms, validate user input, and provide autocomplete suggestions without connector-specific logic in the API layer.
- Serving as a factory for connector instantiation -- Given a connector type name (e.g., "kafka" or "kinesis"), the registry returns a trait object that can be used to create connection instances, validate configurations, and construct runtime operators.
The pattern draws from the broader principle of Inversion of Control -- the core engine does not depend on specific connector implementations but rather on the abstract ErasedConnector trait. Connectors are registered at startup, and the rest of the system interacts with them through the trait interface.
In Arroyo, the registry is implemented as a function that returns a HashMap keyed by connector name, whose values are boxed trait objects. Using a function rather than a global mutable registry avoids shared mutable state, which keeps the design simple and thread-safe, at the cost of reconstructing the map on each call.
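The trade-off between rebuilding the map per call and caching it can be sketched as follows. This is an illustration under assumptions, not Arroyo's code: the Connector trait, the Registry alias, and cached_connectors() are hypothetical, and the cached variant using std::sync::OnceLock is shown only as a contrast to the function-style design described above.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical connector trait. The `Send + Sync` bound is needed only
/// so that the cached variant below can live in a `static`.
trait Connector: Send + Sync {
    fn name(&self) -> &'static str;
}

struct Kafka;
struct Kinesis;

impl Connector for Kafka {
    fn name(&self) -> &'static str {
        "kafka"
    }
}

impl Connector for Kinesis {
    fn name(&self) -> &'static str {
        "kinesis"
    }
}

type Registry = HashMap<&'static str, Box<dyn Connector>>;

/// Function-style registry: every call builds and returns a fresh map
/// that the caller owns, so no locking or shared state is involved.
fn connectors() -> Registry {
    let list: Vec<Box<dyn Connector>> = vec![Box::new(Kafka), Box::new(Kinesis)];
    list.into_iter().map(|c| (c.name(), c)).collect()
}

/// Contrasting variant: build the map once and hand out a shared,
/// immutable reference on every later call.
fn cached_connectors() -> &'static Registry {
    static REGISTRY: OnceLock<Registry> = OnceLock::new();
    REGISTRY.get_or_init(connectors)
}

fn main() {
    // Mutating one freshly built map does not affect the next call.
    let mut first = connectors();
    first.remove("kafka");
    println!("first: {}, fresh: {}", first.len(), connectors().len());

    // The cached variant always returns the same map.
    let same = std::ptr::eq(cached_connectors(), cached_connectors());
    println!("cached map is shared: {same}");
}
```

With a fresh map per call, a caller can take ownership of a boxed connector by removing it from the map. The cached variant can only hand out references, which is one reason a rebuild-per-call design may be acceptable when the set of connectors is small.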
Usage
The Connector Registry principle is applied in the following scenarios:
- REST API connector listing -- The GET /v1/connectors endpoint calls the registry to enumerate all connectors, collects their metadata, sorts them alphabetically, and returns them as a ConnectorCollection.
- Connector lookup by type -- When creating connection profiles, connection tables, or testing connections, the system uses connector_for_type(name) to retrieve the specific connector implementation from the registry.
- SQL CREATE TABLE parsing -- The SQL planner extracts the connector option from CREATE TABLE ... WITH (connector = 'kafka', ...) statements and uses the registry to resolve the connector implementation.
- Operator construction -- During pipeline compilation, the registry provides the connector implementation needed to construct source and sink operators.
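Two of the scenarios above, the sorted listing and the resolution of a connector option, can be sketched in isolation. Everything here is hypothetical and simplified: ConnectorInfo, Simple, list_connectors(), and resolve() are illustrative names, and the options map stands in for whatever structure a SQL planner would actually produce from a WITH clause.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down metadata record.
#[derive(Debug, Clone, PartialEq)]
struct ConnectorInfo {
    id: &'static str,
    name: &'static str,
}

/// Hypothetical connector trait exposing only metadata.
trait Connector {
    fn metadata(&self) -> ConnectorInfo;
}

/// One generic implementation keeps the sketch short.
struct Simple {
    id: &'static str,
    name: &'static str,
}

impl Connector for Simple {
    fn metadata(&self) -> ConnectorInfo {
        ConnectorInfo { id: self.id, name: self.name }
    }
}

/// Function-style registry keyed by type name.
fn connectors() -> HashMap<&'static str, Box<dyn Connector>> {
    let entries = vec![("redis", "Redis"), ("kinesis", "Kinesis"), ("kafka", "Kafka")];
    entries
        .into_iter()
        .map(|(id, name)| (id, Box::new(Simple { id, name }) as Box<dyn Connector>))
        .collect()
}

/// What a listing handler might do: collect metadata, then sort by name,
/// because HashMap iteration order is unspecified.
fn list_connectors() -> Vec<ConnectorInfo> {
    let mut all: Vec<ConnectorInfo> = connectors().values().map(|c| c.metadata()).collect();
    all.sort_by_key(|c| c.name);
    all
}

/// What a planner might do with parsed WITH options: read the
/// `connector` key and resolve it through the registry.
fn resolve(options: &HashMap<String, String>) -> Result<Box<dyn Connector>, String> {
    let name = options
        .get("connector")
        .ok_or_else(|| "missing 'connector' option".to_string())?;
    connectors()
        .remove(name.as_str())
        .ok_or_else(|| format!("unknown connector '{name}'"))
}

fn main() {
    for info in list_connectors() {
        println!("{} ({})", info.name, info.id);
    }

    let mut options = HashMap::new();
    options.insert("connector".to_string(), "kafka".to_string());
    match resolve(&options) {
        Ok(connector) => println!("resolved {:?}", connector.metadata()),
        Err(err) => println!("error: {err}"),
    }
}
```

Sorting after collection matters in this sketch because a HashMap does not guarantee iteration order, so an unsorted listing could change between calls.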
Example: Retrieving All Connectors
use arroyo_connectors::connectors;
// Get the full registry as a HashMap
let all_connectors = connectors();
// Enumerate connector names
for (name, connector) in &all_connectors {
    let metadata = connector.metadata();
    println!("Connector: {} ({})", metadata.name, name);
}
Example: Looking Up a Specific Connector
use arroyo_connectors::connector_for_type;
// Look up a connector by type name
if let Some(kafka_connector) = connector_for_type("kafka") {
    let metadata = kafka_connector.metadata();
    // Use the connector to validate config, test connections, etc.
}