Workflow:Risingwavelabs Risingwave CDC Data Replication
| Knowledge Sources | |
|---|---|
| Domains | Stream_Processing, Data_Replication, CDC |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
End-to-end process for capturing change data from operational databases (MySQL, PostgreSQL, MongoDB, SQL Server) and replicating it into RisingWave for real-time transformation and downstream distribution.
Description
This workflow describes how to set up Change Data Capture (CDC) pipelines using RisingWave's built-in Debezium-based CDC engine. RisingWave directly connects to source databases, captures INSERT, UPDATE, and DELETE operations from transaction logs, and materializes the changes into tables and views that stay in sync with the source. The replicated data can then be joined with other streams, transformed via materialized views, and sunk to downstream systems.
The process covers:
- Goal: A real-time replica of operational database tables within RisingWave, continuously synchronized with the source.
- Scope: From enabling CDC on the source database to querying replicated data and optionally forwarding changes downstream.
- Strategy: Uses RisingWave's integrated CDC connector (backed by Debezium), which runs within the Java connector node and communicates with the Rust core via a JNI bridge.
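As a conceptual illustration of how a replica stays in sync with its source, the sketch below applies a stream of INSERT, UPDATE, and DELETE events to an in-memory table keyed by primary key. The event shape `(op, key, row)` and the function name are hypothetical and chosen for this example; this is not how RisingWave or Debezium represent changes internally.

```python
from typing import Any, Dict, Iterable, Optional, Tuple

# Hypothetical event shape: (op, primary_key, row).
# op is "insert", "update", or "delete"; row is None for deletes.
ChangeEvent = Tuple[str, int, Optional[Dict[str, Any]]]


def apply_changes(
    replica: Dict[int, Dict[str, Any]], events: Iterable[ChangeEvent]
) -> Dict[int, Dict[str, Any]]:
    """Apply a change stream to an in-memory replica keyed by primary key."""
    for op, key, row in events:
        if op in ("insert", "update"):
            # Inserts and updates both act as an upsert on the primary key.
            replica[key] = dict(row or {})
        elif op == "delete":
            # Deleting a key that is already absent is treated as a no-op.
            replica.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return replica


events = [
    ("insert", 1, {"id": 1, "status": "new"}),
    ("insert", 2, {"id": 2, "status": "new"}),
    ("update", 1, {"id": 1, "status": "paid"}),
    ("delete", 2, None),
]
replica = apply_changes({}, events)
print(replica)
```

Because every event is resolved by primary key, replaying the same stream in order always yields the same final state, which is why the primary key matters in the steps that follow.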
Usage
Execute this workflow when you need to replicate data from an operational database into RisingWave for real-time analytics, event-driven processing, or data distribution. Common scenarios include building real-time dashboards from OLTP data, synchronizing data across systems, and feeding downstream analytics with minimal latency.
Execution Steps
Step 1: Prepare Source Database
Configure the source database to enable change data capture. For MySQL, enable the binlog in ROW format. For PostgreSQL, set the WAL level to logical and set up a replication slot and a publication for the target tables. For MongoDB, ensure the instance is running as a replica set.
Key considerations:
- MySQL: Verify binlog_format=ROW and binlog_row_image=FULL, and grant the CDC user the required privileges, including REPLICATION SLAVE and REPLICATION CLIENT
- PostgreSQL: Set wal_level=logical, create a publication for the target tables, and ensure the user has the REPLICATION privilege
- MongoDB: Ensure the replica set is initialized and the user has the readAnyDatabase permission
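The MySQL and PostgreSQL settings above lend themselves to a simple pre-flight check. The following is a minimal sketch that compares a dictionary of observed settings against the values named in this step; the function name and the dictionary layout are assumptions made for illustration, and fetching the values from the database is left to the caller.

```python
from typing import Dict, List

# Required settings taken from the considerations in this step.
REQUIRED_SETTINGS: Dict[str, Dict[str, str]] = {
    "mysql": {"binlog_format": "ROW", "binlog_row_image": "FULL"},
    "postgres": {"wal_level": "logical"},
}


def find_misconfigured(db_type: str, actual: Dict[str, str]) -> List[str]:
    """Return one message per setting that is missing or has the wrong value."""
    problems = []
    for name, expected in REQUIRED_SETTINGS[db_type].items():
        value = actual.get(name)
        # Compare case-insensitively, since servers may report either case.
        if value is None or value.upper() != expected.upper():
            problems.append(f"{name}: expected {expected}, found {value}")
    return problems


print(find_misconfigured("mysql", {"binlog_format": "MIXED", "binlog_row_image": "FULL"}))
```

The observed values could come from `SHOW VARIABLES LIKE 'binlog%'` on MySQL or `SHOW wal_level` on PostgreSQL. User privileges and the MongoDB replica set state are not covered by this sketch and need separate checks.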
Step 2: Deploy RisingWave with Connector Node
Start RisingWave with the Java connector node enabled. The connector node hosts the Debezium CDC engine and communicates with the Rust core via JNI. For Docker deployments, the connector node is typically included in the compose configuration.
Key considerations:
- The Java connector node must be able to reach the source database over the network
- Ensure sufficient memory for the JVM running the connector node
- The JNI bridge handles data transfer between the Java CDC engine and Rust core
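Network reachability can be checked before any CDC table is created. The sketch below attempts a plain TCP connection using only the Python standard library; it should be run from the host or container where the connector node runs, and the host name in the usage comment is a placeholder. A successful TCP connection only shows that the port is open, not that credentials or CDC settings are correct.

```python
import socket


def can_reach(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example (placeholder host): can_reach("mysql.example.internal", 3306)
```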
Step 3: Create CDC Source Table
Define a table in RisingWave using the CREATE TABLE statement with the appropriate CDC connector configuration. Specify the source database type, connection parameters, and the specific table to capture.
Key considerations:
- Use the connector type that matches your source: mysql-cdc, postgres-cdc, mongodb-cdc, or sqlserver-cdc
- Define a schema that matches the source table columns
- Specify the primary key so that UPDATE and DELETE operations are applied to the correct row (upsert semantics)
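To make the shape of such a statement concrete, here is a minimal sketch that renders a CREATE TABLE statement with a WITH clause from Python. The connector name `mysql-cdc` comes from the list above; the other option names (`hostname`, `port`, `username`, `password`, `database.name`, `table.name`), the table, the columns, and all values are assumptions for illustration. Check the connector reference for your RisingWave version, since required options and the preferred statement form can differ.

```python
from typing import Dict, List, Tuple


def quote(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_cdc_table(
    table: str,
    columns: List[Tuple[str, str]],
    primary_key: str,
    options: Dict[str, str],
) -> str:
    """Render a CREATE TABLE ... WITH (...) statement for a CDC-backed table."""
    column_lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    column_lines.append(f"    PRIMARY KEY ({primary_key})")
    option_lines = [f"    {key} = {quote(value)}" for key, value in options.items()]
    return (
        f"CREATE TABLE {table} (\n"
        + ",\n".join(column_lines)
        + "\n) WITH (\n"
        + ",\n".join(option_lines)
        + "\n);"
    )


statement = render_cdc_table(
    table="orders",  # hypothetical table
    columns=[("order_id", "INT"), ("status", "VARCHAR")],
    primary_key="order_id",
    options={
        "connector": "mysql-cdc",
        "hostname": "mysql.example.internal",  # placeholder host
        "port": "3306",
        "username": "rw_cdc",  # placeholder credentials
        "password": "change-me",
        "database.name": "shop",  # assumed option name
        "table.name": "orders",  # assumed option name
    },
)
print(statement)
```

The rendered statement would then be executed against RisingWave with any PostgreSQL-compatible client.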
Step 4: Verify Initial Snapshot
After the CDC table is created, RisingWave performs an initial snapshot of the existing data in the source table. Monitor the snapshot progress and verify that the baseline data has been loaded correctly.
Key considerations:
- The initial snapshot may take time depending on table size
- Query the CDC table to verify row counts match the source
- Monitor logs for any snapshot errors or connectivity issues
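The row-count comparison can be sketched as a small helper. It assumes the caller has already collected `SELECT COUNT(*)` results from both sides into dictionaries keyed by table name; the function name and that layout are hypothetical.

```python
from typing import Dict


def count_mismatches(
    source_counts: Dict[str, int], replica_counts: Dict[str, int]
) -> Dict[str, int]:
    """Return source_count - replica_count for every table whose counts differ."""
    mismatches = {}
    for table, source_count in source_counts.items():
        # A table missing from the replica is treated as having zero rows.
        replica_count = replica_counts.get(table, 0)
        if source_count != replica_count:
            mismatches[table] = source_count - replica_count
    return mismatches


print(count_mismatches({"orders": 100, "users": 5}, {"orders": 100, "users": 3}))
```

Because the source keeps accepting writes while the counts are taken, small transient differences are expected; a difference that persists or grows across repeated checks is the signal worth investigating.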
Step 5: Create Transformations
Build materialized views on top of the CDC tables to perform real-time transformations. Join CDC data with other streams or tables, aggregate, filter, or enrich the data as needed for your use case.
Key considerations:
- CDC tables carry the full change stream (INSERT, UPDATE, and DELETE), so materialized views built on them reflect updates and deletes, not only newly inserted rows
- Joins between CDC tables and streaming sources are supported
- Schema changes on the source may require updating the RisingWave table definition
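To illustrate what a materialized view over a full change stream has to do, the sketch below maintains a per-status row count as changes arrive, retracting the old row before adding the new one. The SQL string shows the kind of view being mimicked; the table, column, and view names are hypothetical, and the Python is a conceptual model rather than RisingWave's implementation.

```python
from collections import Counter
from typing import Any, Dict, Optional

# The kind of view this sketch mimics; names are hypothetical.
ORDERS_BY_STATUS_SQL = """
CREATE MATERIALIZED VIEW orders_by_status AS
SELECT status, COUNT(*) AS order_count
FROM orders
GROUP BY status;
"""


def apply_to_counts(
    counts: Counter,
    before: Optional[Dict[str, Any]],
    after: Optional[Dict[str, Any]],
) -> None:
    """Update per-status counts for one change event.

    INSERT has only `after`, DELETE has only `before`, UPDATE has both.
    """
    if before is not None:
        # Retract the old version of the row.
        counts[before["status"]] -= 1
        if counts[before["status"]] == 0:
            del counts[before["status"]]
    if after is not None:
        # Add the new version of the row.
        counts[after["status"]] += 1


counts: Counter = Counter()
apply_to_counts(counts, None, {"order_id": 1, "status": "new"})  # INSERT
apply_to_counts(counts, None, {"order_id": 2, "status": "new"})  # INSERT
apply_to_counts(
    counts,
    {"order_id": 1, "status": "new"},
    {"order_id": 1, "status": "paid"},
)  # UPDATE
apply_to_counts(counts, {"order_id": 2, "status": "new"}, None)  # DELETE
print(dict(counts))
```

An append-only source would never trigger the retraction branch; a CDC table does, which is why aggregates over it stay correct when source rows change or disappear.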
Step 6: Optionally Sink to Downstream Systems
Create sinks to forward the transformed data to downstream systems such as Kafka topics (in Debezium JSON format), databases, or data lakes. This enables fan-out distribution of replicated data.
Key considerations:
- Use the Kafka sink with Debezium JSON format for CDC-compatible downstream consumers
- JDBC sinks support MySQL, PostgreSQL, and other databases
- Iceberg sinks enable long-term storage in data lake formats
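A Kafka sink in Debezium JSON format might be declared along the following lines. This is a sketch that renders the statement from Python: the `FORMAT DEBEZIUM ENCODE JSON` clause reflects the format named above, while the option names (`properties.bootstrap.server`, `topic`, `primary_key`), the sink and view names, and the broker address are assumptions that should be checked against the sink reference for your RisingWave version.

```python
def render_kafka_debezium_sink(
    sink: str, upstream: str, bootstrap_server: str, topic: str, primary_key: str
) -> str:
    """Render a CREATE SINK statement that writes Debezium JSON to a Kafka topic."""
    return (
        f"CREATE SINK {sink} FROM {upstream}\n"
        "WITH (\n"
        "    connector = 'kafka',\n"
        f"    properties.bootstrap.server = '{bootstrap_server}',\n"  # assumed option name
        f"    topic = '{topic}',\n"
        f"    primary_key = '{primary_key}'\n"  # assumed option name
        ")\n"
        "FORMAT DEBEZIUM ENCODE JSON;"
    )


sink_statement = render_kafka_debezium_sink(
    sink="orders_by_status_sink",  # hypothetical names throughout
    upstream="orders_by_status",
    bootstrap_server="kafka.example.internal:9092",
    topic="orders-by-status",
    primary_key="status",
)
print(sink_statement)
```

The helper interpolates its arguments directly, so it is only suitable for trusted, hand-written values.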