Principle: RisingWave Sink Data Delivery (risingwavelabs/risingwave)
| Knowledge Sources | |
|---|---|
| Domains | Streaming, Data_Delivery, Connectors |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A data delivery mechanism that continuously writes transformed streaming results to external systems through connector-based sink pipelines, with consistency enforced by epoch-based checkpoints: at-least-once delivery by default, and exactly-once when the sink supports idempotent or transactional writes.
Description
Sink Data Delivery is the output stage of a streaming pipeline. After data is ingested, transformed, and aggregated in materialized views, sinks deliver the results to external systems such as databases (PostgreSQL, MySQL, ClickHouse), search engines (Elasticsearch), message brokers (Kafka), or data lakes (Iceberg).
The sink framework in RisingWave uses a checkpoint-based consistency model. Data is written in epochs, and a barrier mechanism ensures that all sink writers have successfully written their data before a checkpoint is committed. This provides at-least-once delivery semantics, with exactly-once achievable through idempotent writes or transactional sinks.
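The step from at-least-once to exactly-once via idempotent writes can be illustrated with a toy upsert sink: if a batch is replayed after a failure that occurred before the checkpoint committed, re-applying it leaves the target in the same state. This is a minimal sketch; the class and method names are illustrative, not RisingWave APIs.

```python
# Sketch: idempotent (upsert-style) writes make at-least-once replay
# converge to the same final state, giving effectively-exactly-once results.
# All names here are illustrative, not RisingWave APIs.

class UpsertSink:
    """A toy sink keyed by primary key; re-applying a batch is a no-op."""

    def __init__(self):
        self.table = {}

    def write_batch(self, batch):
        for key, value in batch:
            # Overwrite by key: replaying the same batch yields the same state.
            self.table[key] = value


sink = UpsertSink()
batch = [(1, "a"), (2, "b")]
sink.write_batch(batch)
sink.write_batch(batch)  # replay after a failure before checkpoint commit
assert sink.table == {1: "a", 2: "b"}
```

An append-only sink without keys would instead duplicate rows on replay, which is why non-idempotent targets need a transactional commit tied to the checkpoint.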
The sink writer protocol follows a streaming gRPC pattern: StartSink → WriteBatch (repeated) → Barrier (checkpoint) → CommitResponse.
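The stream lifecycle above can be sketched as a small state machine: the stream is started once, batches are written repeatedly, and a checkpoint barrier flushes pending data and yields a commit response. The class and message shapes below are illustrative stand-ins, not the actual proto definitions.

```python
# Sketch of the sink writer stream lifecycle:
# StartSink -> WriteBatch (repeated) -> Barrier(checkpoint) -> CommitResponse.
# Names and message shapes are illustrative, not the real gRPC service.

class SinkWriterStream:
    def __init__(self):
        self.state = "INIT"
        self.pending = []          # rows buffered since the last checkpoint
        self.committed_epochs = []

    def start_sink(self, sink_id):
        assert self.state == "INIT"
        self.state = "STARTED"

    def write_batch(self, epoch, rows):
        assert self.state == "STARTED"
        self.pending.extend(rows)

    def barrier(self, epoch, checkpoint):
        assert self.state == "STARTED"
        if checkpoint:
            # Flush pending rows, then acknowledge the checkpoint.
            self.pending.clear()
            self.committed_epochs.append(epoch)
            return {"epoch": epoch, "status": "commit_ok"}  # CommitResponse
        return None


stream = SinkWriterStream()
stream.start_sink(sink_id=1)
stream.write_batch(epoch=100, rows=[("k1", "v1")])
resp = stream.barrier(epoch=100, checkpoint=True)
assert resp == {"epoch": 100, "status": "commit_ok"}
```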
Usage
Use sink data delivery when:
- Exporting streaming results to downstream databases
- Building data pipelines that feed multiple systems
- Replicating data to search engines for full-text indexing
- Writing to data lakes for long-term analytics
Theoretical Basis
Sink delivery follows the epoch-based checkpoint protocol:
Epoch N:
1. beginEpoch(N)
2. write(batch_1), write(batch_2), ...
3. barrier(checkpoint=true)
4. SinkCoordinator.commit(N, metadata)
Epoch N+1:
1. beginEpoch(N+1)
...
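The epoch loop above can be simulated with an event log: each epoch begins, buffers its writes, is sealed by a checkpoint barrier, and only then may the coordinator commit it. The names (`begin_epoch`, `commit`) mirror the pseudocode; this is a sketch, not the actual implementation.

```python
# Minimal simulation of the epoch-based checkpoint protocol: an epoch's
# commit is only valid after its checkpoint barrier has been recorded.
# Names mirror the pseudocode above, not a real RisingWave API.

class EpochLog:
    def __init__(self):
        self.events = []

    def begin_epoch(self, n):
        self.events.append(("begin", n))

    def write(self, n, batch):
        self.events.append(("write", n, batch))

    def barrier(self, n, checkpoint=True):
        self.events.append(("barrier", n, checkpoint))

    def commit(self, n, metadata):
        # Coordinator commit: only valid after the epoch's checkpoint barrier.
        assert ("barrier", n, True) in self.events
        self.events.append(("commit", n, metadata))


log = EpochLog()
for epoch in (100, 101):
    log.begin_epoch(epoch)
    log.write(epoch, ["batch_1", "batch_2"])
    log.barrier(epoch)
    log.commit(epoch, metadata={"rows": 2})
```

The ordering invariant enforced in `commit` is the crux: a checkpoint is durable only once every write of that epoch sits behind a barrier.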
The two-phase commit ensures that all sink writers agree on checkpoint boundaries:
Phase 1: Each SinkWriter calls barrier() and returns metadata
Phase 2: SinkCoordinator collects all metadata and commits atomically
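The two phases can be sketched with parallel writers and a coordinator: phase 1 gathers per-writer metadata at the barrier, and phase 2 commits only once every writer has responded. The class names echo the text above, but the code is an illustrative simulation, not the actual coordinator.

```python
# Sketch of the two-phase commit across parallel sink writers.
# Phase 1: each writer flushes and returns metadata at the barrier.
# Phase 2: the coordinator commits atomically once all metadata is in.
# Class names are illustrative, not the real implementation.

class SinkWriter:
    def __init__(self, writer_id):
        self.writer_id = writer_id

    def barrier(self, epoch):
        # Phase 1: flush local buffers, then report metadata for this epoch.
        return {"writer": self.writer_id, "epoch": epoch}


class SinkCoordinator:
    def __init__(self, writers):
        self.writers = writers
        self.committed = []

    def checkpoint(self, epoch):
        metadata = [w.barrier(epoch) for w in self.writers]
        if len(metadata) == len(self.writers):
            # Phase 2: every writer reported; commit the epoch atomically.
            self.committed.append((epoch, metadata))
            return True
        return False


coord = SinkCoordinator([SinkWriter(i) for i in range(3)])
assert coord.checkpoint(epoch=100) is True
assert coord.committed[0][0] == 100
```

If any writer failed to reach its barrier, phase 2 would not run, and the epoch would be retried from the last committed checkpoint rather than partially applied.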