Principle:ArroyoSystems Arroyo Connection Profile Management

Overview

The Connection Profile Management principle governs how Arroyo manages reusable connection configurations for external systems. A connection profile stores credentials and endpoint information (such as Kafka bootstrap servers, AWS credentials, or Redis connection strings) that can be shared across multiple connection tables. This applies the separation of concerns principle: where to connect is decoupled from what data to access.

Description

In a stream processing system that interacts with many external systems, connection configuration naturally divides into two layers:

Profile-level configuration -- Credentials, endpoints, and authentication details that are shared across multiple tables using the same external system. For example, a single Kafka cluster's bootstrap servers and SASL credentials.
Table-level configuration -- Settings specific to a particular data source or sink, such as the Kafka topic name, consumer group, or read offset.

Separating these two layers yields several benefits:

Credential reuse -- A single set of credentials can be referenced by many connection tables, eliminating duplication and reducing the surface area for credential management errors.
Centralized credential management -- When credentials rotate (e.g., API key renewal), only the profile needs to be updated rather than every individual table that uses those credentials.
Testable configurations -- Connection profiles can be tested independently before being used in table definitions, providing early validation of connectivity and authentication.
Consistent configuration -- All tables referencing the same profile are guaranteed to use the same connection settings, preventing configuration drift.
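
The two layers and the credential-rotation benefit can be illustrated with a minimal Python sketch. The class names, the effective_config method, and the "profile first, table second" merge rule are assumptions made for illustration; they are not taken from Arroyo's implementation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConnectionProfile:
    """Profile-level layer: where to connect and how to authenticate."""
    name: str
    connector: str
    config: dict[str, Any]


@dataclass
class ConnectionTable:
    """Table-level layer: what data to access through a shared profile."""
    name: str
    profile: ConnectionProfile
    config: dict[str, Any] = field(default_factory=dict)

    def effective_config(self) -> dict[str, Any]:
        # Hypothetical merge rule: shared profile settings, then table settings.
        # Computed on each call, so profile updates are always visible.
        return {**self.profile.config, **self.config}


kafka = ConnectionProfile(
    name="production-kafka",
    connector="kafka",
    config={"bootstrap_servers": "kafka-broker:9092", "password": "secret"},
)
orders = ConnectionTable("orders", kafka, {"topic": "orders"})
payments = ConnectionTable("payments", kafka, {"topic": "payments"})

# Rotating the credential touches only the profile; both tables pick it up.
kafka.config["password"] = "rotated-secret"
```

Because both tables hold a reference to the same profile object rather than a copy of its settings, they cannot drift apart on connection settings.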

The profile lifecycle consists of:

Creation -- A user provides a name, connector type, and configuration JSON. The system validates the configuration against the connector's schema and persists it.
Testing -- Optionally, the user can test the profile by triggering a live connection attempt to the external system.
Reference -- Connection tables reference a profile by ID, inheriting its configuration.
Deletion -- Profiles can be deleted only if no connection tables reference them (enforced by foreign key constraints).
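
The lifecycle above can be sketched as a small in-memory store. This is an illustration only: ProfileStore, its method names, and the required-keys check (a stand-in for validation against the connector's schema) are hypothetical, and the reference check in delete mimics in plain Python what the text attributes to foreign key constraints.

```python
class ProfileStore:
    """In-memory stand-in for profile persistence (illustrative only)."""

    def __init__(self, required_keys):
        self._required = required_keys  # connector -> set of required config keys
        self._profiles = {}             # profile id -> profile record
        self._tables = {}               # table name -> profile id
        self._next_id = 1

    def create(self, name, connector, config):
        # Creation: validate the configuration, then persist it.
        missing = self._required.get(connector, set()) - set(config)
        if missing:
            raise ValueError(f"missing config keys: {sorted(missing)}")
        profile_id = self._next_id
        self._next_id += 1
        self._profiles[profile_id] = {
            "name": name, "connector": connector, "config": dict(config),
        }
        return profile_id

    def test(self, profile_id, probe):
        # Testing: delegate to a caller-supplied probe (a live check in practice).
        return bool(probe(self._profiles[profile_id]["config"]))

    def reference(self, table_name, profile_id):
        # Reference: a table links to an existing profile by ID.
        if profile_id not in self._profiles:
            raise KeyError(f"unknown profile id: {profile_id}")
        self._tables[table_name] = profile_id

    def drop_table(self, table_name):
        del self._tables[table_name]

    def delete(self, profile_id):
        # Deletion: refuse while any table still references the profile.
        users = sorted(t for t, p in self._tables.items() if p == profile_id)
        if users:
            raise RuntimeError(f"profile {profile_id} is referenced by: {users}")
        del self._profiles[profile_id]


store = ProfileStore({"kafka": {"bootstrap_servers"}})
pid = store.create("production-kafka", "kafka",
                   {"bootstrap_servers": "kafka-broker:9092"})
store.reference("orders", pid)
```

In this sketch, calling store.delete(pid) while the orders table still references the profile raises RuntimeError; it succeeds once store.drop_table("orders") has run.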

Theoretical Basis

Connection profiles implement the Template pattern for connection configuration: the profile defines a skeleton of configuration that specific instances (connection tables) fill in. This is closely related to:

Flyweight pattern -- Shared configuration state (the profile) is separated from instance-specific state (the table), reducing memory usage and configuration redundancy.
Separation of Concerns -- Authentication/endpoint configuration is orthogonal to table-level configuration. Mixing them violates the single responsibility principle.
Don't Repeat Yourself (DRY) -- Without profiles, every Kafka table connecting to the same cluster would need to independently specify bootstrap servers, SASL mechanism, username, and password.

The testability aspect follows the Fail-Fast principle -- by validating connectivity at the profile level before any tables are created, configuration errors are caught early in the workflow rather than at pipeline execution time.
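
As a rough illustration of failing fast, the sketch below checks whether each endpoint in a bootstrap server list accepts a TCP connection. The function name probe_endpoints is hypothetical, and a plain TCP connect is only a stand-in: an actual profile test would presumably also exercise the connector's protocol and authentication, which this sketch does not attempt.

```python
import socket


def probe_endpoints(bootstrap_servers: str, timeout: float = 2.0) -> dict[str, bool]:
    """Return, per 'host:port' entry, whether a TCP connection could be opened."""
    results = {}
    for raw in bootstrap_servers.split(","):
        endpoint = raw.strip()
        host, _, port = endpoint.rpartition(":")
        try:
            # Open and immediately close a connection to the endpoint.
            with socket.create_connection((host, int(port)), timeout=timeout):
                results[endpoint] = True
        except (OSError, ValueError):
            # Unreachable, refused, timed out, or a malformed 'host:port' entry.
            results[endpoint] = False
    return results
```

Running such a check when the profile is created surfaces an unreachable or mistyped endpoint before any table or pipeline depends on it.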

Usage

Connection profiles are used in the following workflows:

Web Console -- Users create connection profiles through the UI, providing connector-specific configuration. The UI uses connector metadata (from the Connector Registry) to render appropriate configuration forms.
REST API -- The POST /v1/connection_profiles endpoint creates profiles, and POST /v1/connection_profiles/test validates them.
SQL DDL -- In SQL CREATE TABLE statements, the connection_profile option references a previously created profile by name.
Connection Tables -- When creating a connection table, the connection_profile_id field links the table to a profile.

Example: Creating and Using a Kafka Profile

# Create a Kafka connection profile
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  http://localhost:8000/v1/connection_profiles \
  -d '{
    "name": "production-kafka",
    "connector": "kafka",
    "config": {
      "bootstrap_servers": "kafka-broker:9092",
      "authentication": {
        "sasl_mechanism": "PLAIN",
        "username": "user",
        "password": "secret"
      }
    }
  }'

# Test the profile (the config object is elided here; send the same object used at creation)
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  http://localhost:8000/v1/connection_profiles/test \
  -d '{
    "name": "production-kafka",
    "connector": "kafka",
    "config": { ... }
  }'
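
The same two requests can also be built from a script. The following is a minimal sketch using only Python's standard library; the helper build_profile_request and the placeholder token are hypothetical, while the URL, paths, and payload fields are copied from the curl examples above. The requests are constructed but not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # host and port from the curl examples


def build_profile_request(path, token, name, connector, config):
    """Build (but do not send) a POST request for a connection profile endpoint."""
    body = json.dumps(
        {"name": name, "connector": connector, "config": config}
    ).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


kafka_config = {
    "bootstrap_servers": "kafka-broker:9092",
    "authentication": {
        "sasl_mechanism": "PLAIN",
        "username": "user",
        "password": "secret",
    },
}

# "TOKEN" is a placeholder; substitute a real bearer token before sending.
create_req = build_profile_request(
    "/v1/connection_profiles", "TOKEN", "production-kafka", "kafka", kafka_config
)
test_req = build_profile_request(
    "/v1/connection_profiles/test", "TOKEN", "production-kafka", "kafka", kafka_config
)
```

Either request can then be sent with urllib.request.urlopen(create_req). Building the test request from the same kafka_config object ensures the configuration that was tested is the one that gets created.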

Example: SQL Reference

CREATE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    amount DOUBLE
) WITH (
    connector = 'kafka',
    connection_profile = 'production-kafka',
    topic = 'orders',
    format = 'json'
);
