Workflow:Risingwavelabs Risingwave Sink Connector Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Data_Engineering, Connectors |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
End-to-end process for transforming streaming data in RisingWave and delivering results to external downstream systems through sink connectors including databases, search engines, message queues, and data warehouses.
Description
This workflow describes the standard pattern for building streaming data delivery pipelines where RisingWave acts as the transformation hub. Data is ingested from streaming sources, transformed via SQL materialized views, and then delivered to one or more downstream systems through RisingWave's extensive sink connector library. Supported sinks include JDBC databases (PostgreSQL, MySQL, SQL Server, Snowflake), Elasticsearch/OpenSearch, Cassandra, Redis, Kafka, NATS, MQTT, ClickHouse, StarRocks, Doris, DynamoDB, BigQuery, and Apache Pinot.
The process covers:
- Goal: Transformed streaming data continuously delivered to external systems in near real-time.
- Scope: From source ingestion through transformation to delivery at one or more sinks.
- Strategy: Uses RisingWave's native Rust sinks and Java connector node sinks (via JNI bridge) with exactly-once or at-least-once delivery guarantees depending on the connector.
Usage
Execute this workflow when you need to deliver transformed streaming data to external systems such as databases, search engines, data warehouses, or message queues. This is the standard pattern for real-time data enrichment and distribution pipelines, operational analytics feeds, and search index maintenance.
Execution Steps
Step 1: Deploy RisingWave and Target System
Start RisingWave alongside the target downstream system. For integration testing, Docker Compose configurations are provided for each supported sink. Ensure network connectivity between RisingWave and the target system.
Key considerations:
- JDBC sinks (PostgreSQL, MySQL, Snowflake, SQL Server) require the Java connector node
- Elasticsearch, Cassandra, and other Java-based sinks also use the connector node via JNI
- Native Rust sinks (Kafka, Redis, S3) connect directly without the Java layer
Step 2: Prepare Target System
Create the necessary schemas, tables, indexes, or topics in the target system that will receive the data from RisingWave. The target schema must be compatible with the data types produced by the materialized view.
Key considerations:
- Target table primary keys must match if using upsert mode
- Some sinks require specific preparation (Snowflake needs S3 staging, BigQuery needs datasets, Cassandra needs keyspaces)
- Elasticsearch requires index mappings compatible with the data schema
Step 3: Create Streaming Source
Define a source connector in RisingWave to ingest data from your message broker or data generator. This provides the raw input data that will be transformed before delivery.
Key considerations:
- Match schema and format to incoming data
- Multiple sources can feed into the same transformation pipeline
- Built-in datagen source is available for testing without external dependencies
Step 4: Define Materialized Views
Create materialized views that transform, aggregate, filter, or enrich the source data. These views produce the output that will be delivered to the sink.
Key considerations:
- Transformation logic should produce output compatible with the target system schema
- Consider the sink's capabilities when designing transformations (append-only vs. upsert)
- Use multiple views for staged transformations
Step 5: Create Sink Connector
Define a sink that connects a materialized view or table to the target system. Specify the connector type, connection parameters, target table/topic, and write mode (append-only or upsert).
Key considerations:
- Choose the appropriate connector type matching your target system
- Configure authentication credentials and connection parameters
- Select append-only for immutable event logs or upsert for maintaining synchronized replicas
- The sink factory validates configuration before starting data delivery
Step 6: Verify Data Delivery
Query the target system to verify that data has been delivered correctly. Check row counts, data freshness, and content accuracy against expected results.
Key considerations:
- Each integration test includes a verification script (sink_check.py) as a reference
- Monitor connector metrics for throughput and error rates
- Check RisingWave logs and connector node logs for delivery errors