Jump to content

Connect Leeroopedia MCP: Equip your AI agents to search best practices, build plans, verify code, diagnose failures, and look up hyperparameter defaults.

Principle:ArroyoSystems Arroyo Connection Testing

From Leeroopedia


Template:Principle

Overview

The Connection Testing principle governs how Arroyo validates connections to external systems before they are used in production pipelines. Connection testing verifies that credentials, endpoints, schemas, and configurations are correct by attempting a live connection to the external system and streaming real-time test results back to the user via Server-Sent Events (SSE).

Description

In a distributed stream processing system, pipeline failures caused by misconfigured connections are costly: they consume cluster resources during startup, produce confusing error messages, and may require pipeline restarts to diagnose. Connection testing addresses this by providing a pre-flight validation mechanism that catches configuration errors before pipeline execution.

The connection testing workflow validates multiple aspects of a connection:

  • Credential validity -- Verifies that authentication credentials (API keys, SASL credentials, AWS IAM roles) are accepted by the external system.
  • Endpoint reachability -- Confirms that the specified endpoints (broker addresses, HTTP URLs, Redis hosts) are reachable from the Arroyo cluster.
  • Schema compatibility -- Validates that the declared schema matches the actual data format in the external system (e.g., that a Kafka topic's Schema Registry schema is compatible with the configured format).
  • Data availability -- For sources, attempts to read a small sample of data to confirm the topic/stream/table contains accessible data.

Streaming Results via SSE

Connection testing uses Server-Sent Events (SSE) to stream results in real-time rather than returning a single response after all checks complete. This design choice provides several benefits:

  • Progressive feedback -- Multi-step validation processes (connect, authenticate, read sample data, validate schema) report results as each step completes, giving users immediate visibility into which steps succeed and which fail.
  • Timeout tolerance -- Long-running validation steps (e.g., waiting for data on an empty topic) can send intermediate status updates without triggering HTTP request timeouts.
  • Partial results -- If a connection test fails at step 3 of 5, the user sees the results of steps 1 and 2, which aids in diagnosing the issue.
  • Sample data preview -- For source connectors, the test can stream a small number of deserialized records, allowing users to verify that the data looks correct before creating a pipeline.

Test Levels

Connection testing operates at two levels:

Level Endpoint Scope
Profile Test POST /v1/connection_profiles/test Tests only the connection-level configuration (endpoint, credentials). Returns a single TestSourceMessage.
Table Test POST /v1/connection_tables/test Tests the full configuration stack (profile + table options + schema). Returns an SSE stream of TestSourceMessage items including sample data.

Profile-level testing is faster and focuses on connectivity, while table-level testing is more comprehensive and includes data validation.

Theoretical Basis

Connection testing follows the Fail-Fast principle -- a system design approach that prioritizes early detection and reporting of errors rather than attempting to proceed with potentially invalid state. The benefits of fail-fast in this context are:

  • Resource conservation -- A failed connection test consumes minimal resources (a single HTTP request and a brief connection attempt) compared to a failed pipeline deployment that may allocate workers, create checkpoints, and consume operator slots.
  • Clear error attribution -- When a connection test fails, the error is unambiguously attributable to the connection configuration. When a pipeline fails at runtime, the error could be caused by connection issues, query logic errors, data quality problems, or resource constraints.
  • User experience -- Immediate feedback during configuration reduces the feedback loop from minutes (pipeline deploy, wait for failure, read logs) to seconds (test connection, see result).

The use of Server-Sent Events for streaming test results follows the Observer pattern -- the client subscribes to a stream of events and reacts to each one as it arrives. SSE is chosen over WebSocket because:

  • The communication is unidirectional (server to client only)
  • SSE has built-in reconnection semantics
  • SSE works over standard HTTP, simplifying proxy and load balancer configuration
  • The event format is simple and well-suited for structured status messages

The architecture uses a tokio mpsc channel internally to decouple the connector's test execution (which may run blocking I/O) from the HTTP response stream. The connector writes TestSourceMessage items to the channel sender, and the SSE response reads from the channel receiver.

Usage

Connection testing is invoked through the REST API or the Arroyo web console:

Testing a Connection Profile

# Test a Kafka connection profile
curl -X POST -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  http://localhost:8000/v1/connection_profiles/test \
  -d '{
    "name": "my-kafka",
    "connector": "kafka",
    "config": {
      "bootstrap_servers": "broker:9092"
    }
  }'

Testing a Connection Table

# Test a Kafka source table (returns SSE stream)
curl -N -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  http://localhost:8000/v1/connection_tables/test \
  -d '{
    "name": "orders",
    "connector": "kafka",
    "connection_profile_id": "cp_abc123",
    "config": {"topic": "orders", "type": "source"},
    "schema": {
      "format": {"json": {}},
      "fields": []
    }
  }'

Example SSE output:

data: {"status":"info","message":"Connected to Kafka broker"}

data: {"status":"info","message":"Found topic 'orders' with 3 partitions"}

data: {"status":"info","message":"Reading sample data..."}

data: {"status":"data","message":"{\"order_id\":1,\"amount\":42.50}"}

data: {"status":"done","message":"Connection test completed successfully"}

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment