Principle:Risingwavelabs Risingwave CDC Change Data Capture
| Knowledge Sources | |
|---|---|
| Domains | CDC, Data_Replication, Streaming |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A data replication technique that captures row-level changes from a database transaction log and streams them as insert, update, and delete events to downstream consumers.
Description
Change Data Capture (CDC) is a method of tracking data changes in a source database by reading its transaction log rather than polling tables for differences. Because the log already records every committed modification in order, this yields a complete, ordered history of changes without issuing repeated queries against the source tables, which keeps the load on the source database low.
RisingWave implements CDC using the Debezium embedded engine, wrapped in a JNI-accessible handler. When a user creates a CDC source table (e.g., CREATE TABLE ... WITH (connector='mysql-cdc')), the system performs the following steps:
- Deserializes the request into a GetEventStreamRequest protobuf message
- Creates a DbzConnectorConfig with Debezium connector properties
- Launches a DbzCdcEngineRunner that manages the Debezium engine lifecycle
- Streams change events through a CdcSourceChannel (JNI) back to the Rust engine
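The steps above can be pictured with a small, self-contained sketch. This is not RisingWave's code: the classes only borrow the names of the components listed above, every field name is an assumption, an in-memory list stands in for the Debezium engine, and a thread-safe queue stands in for the JNI channel.

```python
import queue
import threading
from dataclasses import dataclass, field


@dataclass
class EventStreamRequest:
    """Stand-in for the GetEventStreamRequest message (fields are assumptions)."""
    source_id: int
    connector: str
    properties: dict = field(default_factory=dict)


def build_connector_config(req):
    """Stand-in for DbzConnectorConfig: derive engine properties from the request."""
    config = {"name": f"cdc-source-{req.source_id}", "connector": req.connector}
    config.update(req.properties)  # user-supplied properties are merged last
    return config


class EngineRunner:
    """Stand-in for DbzCdcEngineRunner: owns the engine thread's lifecycle."""

    def __init__(self, config, channel, fake_log):
        self.config = config
        self.channel = channel
        self.fake_log = fake_log  # hypothetical replacement for a real transaction log
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def join(self):
        self._thread.join()

    def _run(self):
        for event in self.fake_log:
            self.channel.put(event)  # hand each change event to the consumer side
        self.channel.put(None)       # sentinel marking the end of this toy stream


def start_source(req, fake_log):
    """Wire request -> config -> runner -> channel, mirroring the listed steps."""
    channel = queue.Queue()
    runner = EngineRunner(build_connector_config(req), channel, fake_log)
    runner.start()
    return runner, channel


def drain(channel):
    """Consumer side: read events until the sentinel arrives."""
    events = []
    while (event := channel.get()) is not None:
        events.append(event)
    return events
```

The queue decouples the producer (engine thread) from the consumer, which is the role the channel plays between the connector side and the Rust engine in the description above.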
The CDC engine supports both initial snapshot mode (reading all existing data) and streaming mode (reading only new changes from the transaction log).
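To illustrate the difference between the two modes, here is a minimal sketch with a toy in-memory source. `ToySource`, `ToyCdcReader`, and the `snapshot` flag are hypothetical names chosen for this example; they do not correspond to RisingWave or Debezium options.

```python
class ToySource:
    """A toy database: current rows plus an append-only transaction log."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.log = []

    def insert(self, row):
        self.rows.append(row)
        self.log.append(("INSERT", row))


class ToyCdcReader:
    """Reads a ToySource with or without an initial snapshot."""

    def __init__(self, source, snapshot=True):
        self.source = source
        # Remember where the log stood when the reader was created.
        self.position = len(source.log)
        # Snapshot mode copies the existing rows; streaming-only mode skips them.
        self.pending_snapshot = list(source.rows) if snapshot else []

    def poll(self):
        """Return snapshot rows (once), then log entries after the saved position."""
        out = [("snapshot", row) for row in self.pending_snapshot]
        self.pending_snapshot = []
        new_entries = self.source.log[self.position:]
        self.position = len(self.source.log)
        out.extend(("stream", entry) for entry in new_entries)
        return out
```

A reader created with `snapshot=True` first emits every row that existed at creation time and then the later log entries, while a reader created with `snapshot=False` emits only the later log entries.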
Usage
Use CDC when:
- Replicating data from MySQL, PostgreSQL, MongoDB, or SQL Server
- Building real-time data pipelines from existing databases
- Migrating data with zero downtime
- Keeping streaming materialized views synchronized with upstream databases
Theoretical Basis
CDC operates on the principle of log-based replication:
Phase 1: Initial Snapshot
  - Record the log position at snapshot start
  - Read existing table data
Phase 2: Streaming
  - Read transaction log from recorded position
  - Convert log entries to change events:
      INSERT → (+, row)
      UPDATE → (-, old_row), (+, new_row)
      DELETE → (-, row)
  - Stream events to downstream consumer
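The conversion rules in Phase 2 can be sketched in a few lines. The log-entry layout (`op`, `old`, `new`) and the `id` key column are assumptions made for this example; real transaction logs use connector-specific formats.

```python
def to_changes(entry):
    """Translate one log entry into signed row changes: '+' adds, '-' retracts."""
    op = entry["op"]
    if op == "INSERT":
        return [("+", entry["new"])]
    if op == "UPDATE":
        # An update retracts the old row and then emits the new one.
        return [("-", entry["old"]), ("+", entry["new"])]
    if op == "DELETE":
        return [("-", entry["old"])]
    raise ValueError(f"unknown operation: {op!r}")


def apply_changes(state, changes, key="id"):
    """Apply signed changes to a dict keyed by primary key (a toy downstream table)."""
    for sign, row in changes:
        if sign == "+":
            state[row[key]] = row
        else:
            state.pop(row[key], None)
    return state


def replay(log, key="id"):
    """Rebuild the downstream table by replaying the log in order."""
    state = {}
    for entry in log:
        apply_changes(state, to_changes(entry), key)
    return state
```

Representing an update as a retraction followed by an insertion lets a downstream consumer maintain its state using only two primitive operations, as long as events are applied in log order.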
Each change event includes:
- Operation type (create, update, delete)
- Before image (for updates and deletes)
- After image (for creates and updates)
- Source metadata (table, position, timestamp)
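As an illustration of these fields, the sketch below parses a JSON payload shaped like Debezium's change event envelope, which uses the keys `op`, `before`, `after`, and `source` and the operation codes `c`, `u`, `d`, plus `r` for rows read during a snapshot. The keys read from `source` (`table`, `pos`, `ts_ms`) vary by connector and are an assumption here, and the `ChangeEvent` class is a hypothetical helper, not a RisingWave type.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Debezium single-letter operation codes mapped to readable names.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read"}


@dataclass
class ChangeEvent:
    op: str
    before: Optional[dict]  # row image before the change, if any
    after: Optional[dict]   # row image after the change, if any
    table: str
    position: int
    ts_ms: int


def parse_event(raw: str) -> ChangeEvent:
    """Parse one envelope-style JSON payload into a ChangeEvent."""
    payload = json.loads(raw)
    source = payload["source"]
    return ChangeEvent(
        op=OP_NAMES[payload["op"]],
        before=payload.get("before"),
        after=payload.get("after"),
        table=source["table"],    # assumed key; differs between connectors
        position=source["pos"],   # assumed key; differs between connectors
        ts_ms=source["ts_ms"],
    )


# Made-up sample payload for an update event.
sample = json.dumps({
    "op": "u",
    "before": {"id": 1, "name": "old"},
    "after": {"id": 1, "name": "new"},
    "source": {"table": "users", "pos": 154, "ts_ms": 1700000000000},
})
event = parse_event(sample)
```

For the sample above, `event.op` is `"update"` and both the before and after images are populated, matching the list of event contents.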