Principle:Risingwavelabs Risingwave CDC Change Data Capture


Knowledge Sources
Domains CDC, Data_Replication, Streaming
Last Updated 2026-02-09 07:00 GMT

Overview

A data replication technique that captures row-level changes from a database transaction log and streams them as insert, update, and delete events to downstream consumers.

Description

Change Data Capture (CDC) tracks data changes in a source database by reading its transaction log instead of polling tables for differences. Because the log already records every committed modification in order, this yields a complete, ordered change history while adding little query load to the source database.
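The difference between polling and log reading can be illustrated with a small sketch. Everything here is hypothetical: the in-memory `table` and `log`, and the `write` and `poll_diff` helpers, stand in for a real database and are not part of RisingWave or Debezium.

```python
# Hypothetical in-memory "database" that also keeps a transaction log.
log = []      # ordered list of (op, key, value) entries
table = {}    # current table state

def write(op, key, value=None):
    """Apply a write to the table and append it to the log."""
    if op == "delete":
        table.pop(key, None)
    else:
        table[key] = value
    log.append((op, key, value))

def poll_diff(previous, current):
    """Polling: compare two table snapshots and report the differences."""
    changes = []
    for key in current.keys() - previous.keys():
        changes.append(("insert", key, current[key]))
    for key in previous.keys() - current.keys():
        changes.append(("delete", key, None))
    for key in current.keys() & previous.keys():
        if current[key] != previous[key]:
            changes.append(("update", key, current[key]))
    return changes

before = dict(table)          # snapshot taken by the poller
write("insert", 1, "a")
write("update", 1, "b")
write("delete", 1)
write("insert", 2, "x")

polled = poll_diff(before, table)   # sees only the net effect
logged = list(log)                  # sees every change, in order
```

In this run the poller reports a single insert for key 2 and never observes key 1 at all, while the log retains all four modifications in commit order.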

RisingWave implements CDC using the Debezium embedded engine, wrapped in a JNI-accessible handler. When a user creates a CDC source table (e.g., CREATE TABLE ... WITH (connector='mysql-cdc')), the system:

  1. Deserializes the request into a GetEventStreamRequest protobuf message
  2. Creates a DbzConnectorConfig with Debezium connector properties
  3. Launches a DbzCdcEngineRunner that manages the Debezium engine lifecycle
  4. Streams change events through a CdcSourceChannel (JNI) back to the Rust engine
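
The four steps above can be sketched as a small producer/consumer pipeline. This is only an illustration of the control flow, not RisingWave's code: `parse_request`, `build_config`, `EngineRunner`, and `drain` are hypothetical names, JSON stands in for the protobuf message, and a `queue.Queue` stands in for the JNI channel.

```python
import json
import queue
import threading

def parse_request(raw):
    """Step 1: deserialize the request (JSON here; the source uses protobuf)."""
    return json.loads(raw)

def build_config(request):
    """Step 2: turn the request into connector properties (illustrative keys)."""
    config = {"name": "source-{}".format(request["source_id"])}
    config.update(request["properties"])
    return config

class EngineRunner:
    """Step 3: owns the lifecycle of a fake change-producing engine."""

    def __init__(self, config, channel, changes):
        self.config = config
        self.channel = channel
        self.changes = changes
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self.thread.start()

    def _run(self):
        for change in self.changes:
            self.channel.put(change)   # step 4: blocks when the channel is full
        self.channel.put(None)         # sentinel: the engine has stopped

def drain(channel):
    """Consumer side: read events until the stop sentinel arrives."""
    events = []
    while True:
        event = channel.get()
        if event is None:
            return events
        events.append(event)

raw = json.dumps({"source_id": 1, "properties": {"hostname": "localhost"}})
config = build_config(parse_request(raw))
channel = queue.Queue(maxsize=2)       # bounded, so the producer feels backpressure
changes = [{"op": "c", "id": 1}, {"op": "u", "id": 1}, {"op": "d", "id": 1}]
runner = EngineRunner(config, channel, changes)
runner.start()
received = drain(channel)
```

The bounded queue is a design choice of the sketch: it keeps a fast producer from running ahead of a slow consumer, which is the usual reason to put a channel between an engine thread and its reader.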

The CDC engine supports both initial snapshot mode (reading all existing data) and streaming mode (reading only new changes from the transaction log).
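
A minimal sketch of how the two modes hand off to each other, assuming a list-backed log whose index serves as the log position. The `snapshot` and `stream` functions are hypothetical; a real connector must also ensure that the snapshot read and the recorded position are mutually consistent, which this sketch gets for free by being single-threaded.

```python
def snapshot(table, log):
    """Snapshot mode: copy existing rows and record the log position."""
    position = len(log)   # offset of the next entry to be written
    rows = [("r", key, value) for key, value in sorted(table.items())]
    return rows, position

def stream(log, position):
    """Streaming mode: yield only entries at or after the recorded position."""
    for offset in range(position, len(log)):
        yield log[offset]

# Two rows already exist before CDC starts.
log = [("c", 1, "a"), ("c", 2, "b")]
table = {1: "a", 2: "b"}

rows, position = snapshot(table, log)

# Changes that arrive after the snapshot was taken.
log.append(("u", 1, "a2"))
log.append(("d", 2, None))

events = rows + list(stream(log, position))
```

Here `events` contains the two snapshot reads followed by the update and the delete. The earlier inserts are not replayed, because streaming starts at the recorded position.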

Usage

Use CDC when:

  • Replicating data from MySQL, PostgreSQL, MongoDB, or SQL Server
  • Building real-time data pipelines from existing databases
  • Migrating data with zero downtime
  • Keeping streaming materialized views synchronized with upstream databases

Theoretical Basis

CDC operates on the principle of log-based replication:

Phase 1: Initial Snapshot
    - Read existing table data
    - Record the log position at snapshot start

Phase 2: Streaming
    - Read transaction log from recorded position
    - Convert log entries to change events:
        INSERT → (+, row)
        UPDATE → (-, old_row), (+, new_row)
        DELETE → (-, row)
    - Stream events to downstream consumer
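
The conversion rules in Phase 2 can be sketched as follows. The log entry layout (`kind`, `old`, `new`) and the `to_changes` helper are assumptions made for illustration. The downstream consumer is modeled as a multiset of rows, so that retractions (`-`) cancel earlier insertions (`+`).

```python
from collections import Counter

def to_changes(entry):
    """Convert one log entry into signed (sign, row) change events."""
    kind = entry["kind"]
    if kind == "INSERT":
        return [("+", entry["new"])]
    if kind == "UPDATE":
        # An update retracts the old row, then inserts the new one.
        return [("-", entry["old"]), ("+", entry["new"])]
    if kind == "DELETE":
        return [("-", entry["old"])]
    raise ValueError("unknown log entry kind: {}".format(kind))

log = [
    {"kind": "INSERT", "new": (1, "alice")},
    {"kind": "UPDATE", "old": (1, "alice"), "new": (1, "alicia")},
    {"kind": "INSERT", "new": (2, "bob")},
    {"kind": "DELETE", "old": (2, "bob")},
]

changes = [change for entry in log for change in to_changes(entry)]

# Downstream consumer: apply the signed events to a multiset of rows.
view = Counter()
for sign, row in changes:
    view[row] += 1 if sign == "+" else -1
view = +view   # unary plus drops rows whose count fell to zero
```

After applying all five events, the view holds only the row `(1, "alicia")`, matching the final state of the source table.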

Each change event includes:

  • Operation type (create, update, or delete; Debezium also emits read events for rows captured during the initial snapshot)
  • Before image (for updates and deletes)
  • After image (for creates, updates, and snapshot reads)
  • Source metadata (table, position, timestamp)
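
As a rough illustration of these fields, the sketch below parses a Debezium-style JSON envelope. The field names `op`, `before`, `after`, and `source` and the single-letter operation codes follow Debezium's event format. The `ChangeEvent` class, the `parse_event` helper, and the sample payload values are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Debezium's single-letter operation codes.
OP_NAMES = {"c": "create", "u": "update", "d": "delete", "r": "read"}

@dataclass
class ChangeEvent:
    operation: str
    before: Optional[dict]   # row image before the change, if any
    after: Optional[dict]    # row image after the change, if any
    source: dict             # table, log position, timestamp, ...

def parse_event(raw):
    """Parse one JSON-encoded change event into a ChangeEvent."""
    payload = json.loads(raw)
    return ChangeEvent(
        operation=OP_NAMES[payload["op"]],
        before=payload.get("before"),
        after=payload.get("after"),
        source=payload.get("source", {}),
    )

# Illustrative payload; the values are made up for this example.
raw = json.dumps({
    "op": "u",
    "before": {"id": 1, "name": "alice"},
    "after": {"id": 1, "name": "alicia"},
    "source": {"table": "users", "pos": 154, "ts_ms": 1700000000000},
})

event = parse_event(raw)
```

For an update, both images are present. A create would carry `before` as null, and a delete would carry `after` as null.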
