Principle:Risingwavelabs Risingwave CDC Snapshot Verification
| Knowledge Sources | |
|---|---|
| Domains | CDC, Data_Consistency |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A data consistency mechanism that verifies the initial snapshot of a CDC source has been completely captured before transitioning to continuous streaming mode.
Description
CDC Snapshot Verification ensures data completeness during the critical transition from initial data load to live streaming. When a CDC source is first created, the Debezium engine performs an initial snapshot, reading all existing rows from the source tables. This snapshot must complete successfully before the engine switches to reading the transaction log for new changes.
The DbzCdcEngineRunner manages this lifecycle by:
- Creating a Debezium embedded engine with the appropriate connector configuration
- Executing the engine in a dedicated thread pool
- Monitoring the snapshot phase through the DbzChangeEventConsumer
- Converting change events to protobuf CdcMessage format
- Tracking snapshot completion state in the engine configuration
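The responsibilities listed above can be illustrated with a minimal, self-contained sketch. It is not the DbzCdcEngineRunner implementation: the class EngineRunnerSketch, its methods, and the string-based "CdcMessage(...)" stand-in for the protobuf message are all hypothetical, and the engine is simulated by a plain task that delivers batches. The sketch only shows the shape of running an engine on a dedicated thread and converting each batch in a handleBatch-style callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class EngineRunnerSketch {
    // Dedicated single-thread pool, so batches are handled in delivery order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    // Stand-in for the channel that carries converted messages downstream.
    private final BlockingQueue<String> channel = new LinkedBlockingQueue<>();

    /** Hypothetical stand-in for a batch handler: converts and forwards each event. */
    void handleBatch(List<String> rawEvents) {
        for (String raw : rawEvents) {
            channel.add("CdcMessage(" + raw + ")"); // placeholder for protobuf conversion
        }
    }

    /** Runs the simulated engine (a task that delivers batches) on the dedicated thread. */
    Future<?> start(List<List<String>> batches) {
        return executor.submit(() -> batches.forEach(this::handleBatch));
    }

    void stop() {
        executor.shutdown();
    }

    /** Drives the sketch end to end and returns everything that reached the channel. */
    public static List<String> runDemo() {
        EngineRunnerSketch runner = new EngineRunnerSketch();
        try {
            runner.start(List.of(List.of("row1", "row2"), List.of("row3")))
                  .get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            runner.stop();
        }
        List<String> delivered = new ArrayList<>();
        runner.channel.drainTo(delivered);
        return delivered;
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

A single-thread executor is used here because it preserves the order in which batches are delivered, which matters when snapshot rows must precede streamed changes.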
The snapshot mode can be configured as initial (full snapshot followed by streaming) or no_data (skip reading existing rows and start streaming from the current log position).
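As a rough illustration of how the snapshot mode might be passed to the engine, the sketch below builds a java.util.Properties object carrying the Debezium snapshot.mode option. The class SnapshotModeConfig, the engine name "example-cdc-engine", and the decision to accept only the two modes named above are assumptions made for this example; Debezium accepts further values depending on connector and version, and a real configuration needs many more connector properties.

```java
import java.util.Properties;
import java.util.Set;

public class SnapshotModeConfig {
    // Only the two modes discussed in the text; this restriction is an assumption of the sketch.
    private static final Set<String> SUPPORTED_MODES = Set.of("initial", "no_data");

    /** Builds connector properties carrying the requested snapshot mode. */
    public static Properties build(String snapshotMode) {
        if (!SUPPORTED_MODES.contains(snapshotMode)) {
            throw new IllegalArgumentException("unsupported snapshot mode: " + snapshotMode);
        }
        Properties props = new Properties();
        props.setProperty("name", "example-cdc-engine");  // hypothetical engine name
        props.setProperty("snapshot.mode", snapshotMode); // Debezium connector option
        return props;
    }

    public static void main(String[] args) {
        System.out.println(build("initial").getProperty("snapshot.mode"));
    }
}
```

Validating the mode up front means a typo fails at source creation rather than surfacing later as a missing or unexpected snapshot.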
Usage
Use snapshot verification when:
- Initializing a new CDC source table with existing data
- Recovering from connector failures
- Validating that all historical data has been captured
- Monitoring CDC pipeline health during initial setup
Theoretical Basis
Engine Lifecycle:
1. DbzCdcEngineRunner.create(config) → engine instance
2. DbzCdcEngineRunner.start() → launches engine thread
3. Debezium executes snapshot phase:
   - Acquires table locks (brief)
   - Records binlog/WAL position
   - Reads all rows from source tables
   - Releases locks
4. Snapshot events flow through DbzChangeEventConsumer.handleBatch()
5. Engine transitions to streaming phase
6. Continuous change events flow to Rust via JNI channel
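The transition from step 4 to step 5 is the point that verification cares about. One way to check it, sketched below under stated assumptions, is a small state machine fed with each event's snapshot marker. The class SnapshotPhaseTracker is hypothetical and is not taken from the RisingWave code base. It assumes a simplified marker vocabulary of "true" (snapshot row), "last" (final snapshot row), and "false" (streamed change), loosely modeled on the snapshot field that Debezium places in an event's source block.

```java
public class SnapshotPhaseTracker {
    public enum Phase { SNAPSHOT, STREAMING }

    private Phase phase = Phase.SNAPSHOT;
    private long snapshotRows = 0;
    private boolean sawLastMarker = false;

    /** Feeds one event's snapshot marker: "true", "last", or "false" (assumed vocabulary). */
    public void observe(String snapshotMarker) {
        if (phase == Phase.STREAMING) {
            // Once streaming has begun, a snapshot row would indicate an ordering problem.
            if (!"false".equals(snapshotMarker)) {
                throw new IllegalStateException("snapshot event after streaming started");
            }
            return;
        }
        switch (snapshotMarker) {
            case "true":
                snapshotRows++;
                break;
            case "last":
                snapshotRows++;
                sawLastMarker = true;
                break;
            case "false":
                phase = Phase.STREAMING;
                break;
            default:
                throw new IllegalArgumentException("unknown marker: " + snapshotMarker);
        }
    }

    /** True only if the final snapshot row was seen before streaming began. */
    public boolean snapshotVerified() {
        return phase == Phase.STREAMING && sawLastMarker;
    }

    public Phase phase() {
        return phase;
    }

    public long snapshotRows() {
        return snapshotRows;
    }

    public static void main(String[] args) {
        SnapshotPhaseTracker tracker = new SnapshotPhaseTracker();
        for (String marker : new String[] {"true", "true", "last", "false"}) {
            tracker.observe(marker);
        }
        System.out.println(tracker.snapshotVerified() + " rows=" + tracker.snapshotRows());
    }
}
```

In this sketch a source that starts streaming without ever emitting a "last" marker, for example one configured to skip the snapshot, reaches the STREAMING phase but is reported as not verified, so the caller can tell the two cases apart.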