Workflow: Risingwavelabs RisingWave CDC Data Replication

Knowledge Sources	RisingWave, RisingWave Docs, CDC Source Guide
Domains	Stream_Processing, Data_Replication, CDC
Last Updated	2026-02-09 12:00 GMT

Overview

End-to-end process for capturing change data from operational databases (MySQL, PostgreSQL, MongoDB, SQL Server) and replicating it into RisingWave for real-time transformation and downstream distribution.

Description

This workflow describes how to set up Change Data Capture (CDC) pipelines using RisingWave's built-in Debezium-based CDC engine. RisingWave directly connects to source databases, captures INSERT, UPDATE, and DELETE operations from transaction logs, and materializes the changes into tables and views that stay in sync with the source. The replicated data can then be joined with other streams, transformed via materialized views, and sunk to downstream systems.

The process covers:

Goal: A real-time replica of operational database tables within RisingWave, continuously synchronized with the source.
Scope: From enabling CDC on the source database to querying replicated data and optionally forwarding changes downstream.
Strategy: Uses RisingWave's integrated CDC connector (backed by Debezium), which runs within the Java connector node and communicates with the Rust core through a JNI bridge.

Usage

Execute this workflow when you need to replicate data from an operational database into RisingWave for real-time analytics, event-driven processing, or data distribution. Common scenarios include building real-time dashboards from OLTP data, synchronizing data across systems, and feeding downstream analytics with minimal latency.

Execution Steps

Step 1: Prepare Source Database

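Configure the source database for change data capture. For MySQL, enable the binlog in ROW format. For PostgreSQL, set the WAL level to logical; CDC also uses a logical replication slot, which RisingWave can create itself when the source user has sufficient privileges. For MongoDB, make sure the instance runs as a replica set. For SQL Server, enable CDC on the database and on each table to be captured.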

Key considerations:

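MySQL: Verify binlog_format=ROW and binlog_row_image=FULL, and grant the CDC user REPLICATION SLAVE and REPLICATION CLIENT (plus SELECT, RELOAD, and SHOW DATABASES, which the initial snapshot needs)
PostgreSQL: Set wal_level=logical (the server must be restarted for this to take effect), ensure the user has the REPLICATION privilege, and create a publication for the target tables or let RisingWave create one if the user is allowed to
MongoDB: Ensure the replica set is initialized and the user has the readAnyDatabase role
SQL Server: Enable CDC with sys.sp_cdc_enable_db and sys.sp_cdc_enable_table, and make sure SQL Server Agent is running so the capture jobs execute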
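
The database settings in the list above can be checked mechanically before any CDC table is created. The Python sketch below assumes you have already read the server variables into a dict (for example from MySQL's SHOW VARIABLES or PostgreSQL's SHOW wal_level). The helper name find_cdc_setting_problems is hypothetical, and the log_bin check (MySQL's flag for whether the binlog is enabled at all) is an illustrative addition to the checklist.

```python
# Sketch: flag source-database settings that would block CDC.
# Assumption: variables were already fetched into a dict, e.g. from
# MySQL's SHOW VARIABLES or PostgreSQL's SHOW wal_level.
# find_cdc_setting_problems is a hypothetical helper, not a RisingWave API.

# Expected values follow the Step 1 checklist; log_bin is an added assumption.
REQUIRED_SETTINGS = {
    "mysql": {"log_bin": "ON", "binlog_format": "ROW", "binlog_row_image": "FULL"},
    "postgres": {"wal_level": "logical"},
}


def find_cdc_setting_problems(db_type: str, variables: dict) -> list:
    """Return one message per required setting that is missing or has the wrong value."""
    problems = []
    for name, expected in REQUIRED_SETTINGS[db_type].items():
        actual = variables.get(name)
        if actual is None:
            problems.append(f"{name} is not set (expected {expected})")
        elif actual.upper() != expected.upper():  # compare case-insensitively
            problems.append(f"{name} = {actual} (expected {expected})")
    return problems


if __name__ == "__main__":
    mysql_vars = {"log_bin": "ON", "binlog_format": "MIXED", "binlog_row_image": "FULL"}
    for problem in find_cdc_setting_problems("mysql", mysql_vars):
        print(problem)  # binlog_format = MIXED (expected ROW)
```

An empty list means every listed setting matches; user privileges still have to be checked separately.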

Step 2: Deploy RisingWave with Connector Node

Start RisingWave with the Java connector node enabled. The connector node hosts the Debezium CDC engine and communicates with the Rust core via JNI. For Docker deployments, the connector node is typically included in the compose configuration.

Key considerations:

The Java connector node must be able to reach the source database over the network
Ensure sufficient memory for the JVM running the connector node
The JNI bridge handles data transfer between the Java CDC engine and Rust core
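
A quick way to test the first consideration is to open a TCP connection to the database port from the host or container that runs the connector node. This is a minimal sketch using only Python's standard library; can_reach is a hypothetical helper, and a successful connection proves only network reachability, not that credentials or CDC settings are correct.

```python
# Sketch: check that the source database's port is reachable from this host.
# Run it where the connector node runs. can_reach is a hypothetical helper.
import socket


def can_reach(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and tries each returned address,
        # applying timeout_s to each connection attempt.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused connections, DNS failures, and timeouts all derive from OSError.
        return False
```

For example, can_reach("mysql.internal", 3306), with a placeholder host name, should return True before you create a mysql-cdc table that points at that host.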

Step 3: Create CDC Source Table

Define a table in RisingWave using the CREATE TABLE statement with the appropriate CDC connector configuration. Specify the source database type, connection parameters, and the specific table to capture.

Key considerations:

Use connector type matching your source: mysql-cdc, postgres-cdc, mongodb-cdc, or sqlserver-cdc
Declare columns whose names and types match the source table
Declare the same primary key as the source table; RisingWave needs it to apply UPDATE and DELETE events to the right rows (upsert semantics)
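
The statement in this step has a fixed shape: column definitions, a PRIMARY KEY clause, and a WITH (...) list of connector options. The Python sketch below assembles it to make that shape explicit. It assumes the single-table form with the mysql-cdc option names shown (hostname, port, username, password, database.name, table.name, server.id); check these against the RisingWave docs for your version, which may instead recommend a shared CREATE SOURCE followed by CREATE TABLE ... FROM. render_cdc_table and quote_literal are hypothetical helpers, and the credentials are placeholders.

```python
# Sketch: render a single-table CDC CREATE TABLE statement.
# Assumption: option names follow the mysql-cdc single-table form; verify them
# against the RisingWave docs for your version. These helpers are hypothetical.


def quote_literal(value: str) -> str:
    """Quote a SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_cdc_table(table: str, columns: list, primary_key: list, options: dict) -> str:
    """Build CREATE TABLE <table> (<columns>, PRIMARY KEY (...)) WITH (<options>);"""
    column_names = {name for name, _ in columns}
    if not primary_key or any(key not in column_names for key in primary_key):
        # CDC tables need the upstream primary key to apply UPDATE and DELETE events.
        raise ValueError(f"primary key must name declared columns, got {primary_key}")
    column_lines = [f"    {name} {sql_type}" for name, sql_type in columns]
    column_lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    option_lines = [f"    {key} = {quote_literal(value)}" for key, value in options.items()]
    return (
        f"CREATE TABLE {table} (\n" + ",\n".join(column_lines) + "\n) WITH (\n"
        + ",\n".join(option_lines) + "\n);"
    )


if __name__ == "__main__":
    print(render_cdc_table(
        "orders",
        [("order_id", "INT"), ("customer_id", "INT"), ("amount", "DECIMAL")],
        ["order_id"],
        {
            "connector": "mysql-cdc",
            "hostname": "127.0.0.1",
            "port": "3306",
            "username": "rw_user",      # placeholder credentials
            "password": "change-me",
            "database.name": "shop",
            "table.name": "orders",
            "server.id": "5454",
        },
    ))
```

Refusing to render a statement without a valid primary key mirrors the requirement in the list above; swapping the connector to postgres-cdc would also require its own options, such as schema.name.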

Step 4: Verify Initial Snapshot

After the CDC table is created, RisingWave takes an initial snapshot of the data already in the source table. Monitor the snapshot progress and verify that this baseline has loaded correctly.

Key considerations:

The initial snapshot may take time depending on table size
Query the CDC table to verify row counts match the source
Monitor logs for any snapshot errors or connectivity issues
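
Verifying the snapshot usually means comparing counts repeatedly until the replica catches up. The Python sketch below polls two counting functions that you supply: count_source and count_risingwave are hypothetical names, for example wrappers that run SELECT count(*) through your drivers (RisingWave speaks the PostgreSQL wire protocol, so a PostgreSQL driver can query it). On a source that is still taking writes the counts may never be exactly equal, so treat a timeout as a prompt to investigate rather than proof of failure.

```python
# Sketch: poll until the RisingWave row count matches the source row count.
# Assumption: the two callables are supplied by you and return row counts.
import time
from typing import Callable


def wait_for_snapshot(
    count_source: Callable[[], int],
    count_risingwave: Callable[[], int],
    timeout_s: float = 600.0,
    poll_interval_s: float = 5.0,
) -> bool:
    """Return True once both counts are equal, or False if timeout_s elapses first."""
    deadline = time.monotonic() + timeout_s
    while True:
        source_rows, replica_rows = count_source(), count_risingwave()
        print(f"source={source_rows} risingwave={replica_rows}")
        if source_rows == replica_rows:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)
```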

Step 5: Create Transformations

Build materialized views on top of the CDC tables to perform real-time transformations. Join CDC data with other streams or tables, aggregate, filter, or enrich the data as needed for your use case.

Key considerations:

CDC tables carry the full change stream (INSERT, UPDATE, and DELETE), and materialized views built on them are updated incrementally as each change arrives
Joins between CDC tables and streaming sources are supported
Schema changes on the source may require updating the RisingWave table definition
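
The full change stream matters because an aggregate such as SUM(amount) GROUP BY customer_id can be kept current by applying each change as a delta: retract the old version of the row, then add the new one. The toy Python model below illustrates only that idea; it is not how RisingWave implements materialized views. The op codes follow Debezium's convention (r for snapshot read, c for create, u for update, d for delete), and the orders schema is hypothetical.

```python
# Toy model (not RisingWave's implementation) of incremental view maintenance
# over a CDC stream. Op codes follow Debezium: r, c, u, d. Schema is hypothetical.
from typing import Optional


class CustomerTotals:
    """Keeps SUM(amount_cents) GROUP BY customer_id in step with row changes."""

    def __init__(self) -> None:
        self.rows = {}    # current version of each order, keyed by primary key
        self.groups = {}  # customer_id -> (row_count, total_cents)

    def _add(self, customer_id: int, count_delta: int, amount_delta: int) -> None:
        count, total = self.groups.get(customer_id, (0, 0))
        count, total = count + count_delta, total + amount_delta
        if count == 0:
            self.groups.pop(customer_id, None)  # no rows left, so the group disappears
        else:
            self.groups[customer_id] = (count, total)

    def apply(self, op: str, key: int, row: Optional[dict] = None) -> None:
        """Apply one change event for the row identified by its primary key."""
        if op not in ("r", "c", "u", "d"):
            raise ValueError(f"unknown op {op!r}")
        old = self.rows.pop(key, None)
        if old is not None:
            # Retract the previous version; this is what makes UPDATE and DELETE work.
            self._add(old["customer_id"], -1, -old["amount_cents"])
        if op != "d":
            self.rows[key] = row
            self._add(row["customer_id"], 1, row["amount_cents"])

    def totals(self) -> dict:
        return {customer: total for customer, (_, total) in self.groups.items()}


if __name__ == "__main__":
    view = CustomerTotals()
    view.apply("r", 1, {"customer_id": 7, "amount_cents": 500})
    view.apply("c", 2, {"customer_id": 7, "amount_cents": 250})
    view.apply("u", 1, {"customer_id": 8, "amount_cents": 500})  # order 1 moves to customer 8
    view.apply("d", 2)
    print(view.totals())  # {8: 500}
```

Tracking a row count per group, not just the sum, is what lets a group vanish only when its last row is deleted, as it would in a real GROUP BY.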

Step 6: Optionally Sink to Downstream Systems

Create sinks to forward the transformed data to downstream systems such as Kafka topics (in Debezium JSON format), databases, or data lakes. This enables fan-out distribution of replicated data.

Key considerations:

Use the Kafka sink with Debezium JSON format for CDC-compatible downstream consumers
JDBC sinks support MySQL, PostgreSQL, and other databases
Iceberg sinks enable long-term storage in data lake formats
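
On the consuming side, a Debezium JSON change event describes one row change through the envelope fields before, after, and op. The Python sketch below applies such events to an in-memory replica. It assumes that envelope shape, optionally wrapped in a top-level payload object when a schema section is included, and treats null message values as tombstones to skip; inspect real messages from your sink before relying on this. apply_debezium_message is a hypothetical helper.

```python
# Sketch: apply Debezium-style JSON change events to an in-memory replica.
# Assumption: standard envelope fields before/after/op, optionally wrapped in
# "payload". apply_debezium_message is a hypothetical helper.
import json
from typing import Optional


def apply_debezium_message(replica: dict, raw_value: Optional[bytes], key_column: str) -> None:
    """Apply one change event to replica, a dict keyed by key_column."""
    if raw_value is None:
        return  # tombstone (null value) that may follow a delete; nothing to apply
    event = json.loads(raw_value)
    envelope = event.get("payload", event)  # unwrap when a schema section is present
    op, before, after = envelope["op"], envelope.get("before"), envelope.get("after")
    if op in ("c", "r", "u"):
        if before is not None:
            replica.pop(before[key_column], None)  # the key itself may have changed
        replica[after[key_column]] = after
    elif op == "d":
        replica.pop(before[key_column], None)
    # Other op codes (e.g. "t" for truncate) are ignored in this sketch.
```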
