Principle:Risingwavelabs Risingwave Iceberg Sink Integration
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Streaming, Iceberg |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A streaming-to-lakehouse integration mechanism that continuously writes transformed streaming data into Apache Iceberg tables, giving durable, queryable storage in an open table format.
Description
Iceberg Sink Integration bridges real-time stream processing and the Apache Iceberg open table format. It enables a real-time lakehouse architecture in which streaming data is continuously written to Iceberg tables on object storage (S3, MinIO, GCS) and can then be queried by external engines such as Spark, Trino, and Presto.
The integration supports two write modes:
- Append-only: New rows are continuously appended to the Iceberg table; existing rows are never modified
- Upsert: Rows are inserted or updated based on the primary key, using Iceberg's merge-on-read or copy-on-write strategies
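The difference between the two modes can be illustrated with a toy model. This is only a sketch of the semantics, not RisingWave's implementation: the `Change` tuple, `apply_append_only`, and `apply_upsert` are hypothetical names, and rejecting non-insert changes in append-only mode is a simplification chosen for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical change record: (op, primary_key, value).
# op is one of "insert", "update", "delete".
Change = Tuple[str, int, str]


def apply_append_only(table: List[Tuple[int, str]], changes: List[Change]) -> None:
    """Append-only: each insert becomes a new row; nothing is rewritten."""
    for op, key, value in changes:
        if op != "insert":
            # Simplification for this sketch: refuse anything but inserts.
            raise ValueError(f"append-only mode cannot apply {op!r}")
        table.append((key, value))


def apply_upsert(table: Dict[int, str], changes: List[Change]) -> None:
    """Upsert: the primary key decides whether a row is added, replaced, or removed."""
    for op, key, value in changes:
        if op == "delete":
            table.pop(key, None)
        else:
            # "insert" and "update" both resolve to "latest value wins" for the key.
            table[key] = value


appended: List[Tuple[int, str]] = []
apply_append_only(appended, [("insert", 1, "a"), ("insert", 1, "b")])

upserted: Dict[int, str] = {}
apply_upsert(upserted, [("insert", 1, "a"), ("update", 1, "b"), ("delete", 1, "")])
```

In the append-only table both rows for key 1 are retained, while the upsert table keeps at most one row per key.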
RisingWave's Iceberg sink handles the full lifecycle: schema management (via catalog operations), data file writing (Parquet format), manifest management, and snapshot commits. It supports multiple catalog types including storage-based, Hive, Glue, REST, and JDBC catalogs.
Usage
Use Iceberg sink integration when:
- Building a real-time lakehouse architecture
- Streaming data to object storage for long-term analytics
- Enabling external query engines to read streaming results
- Requiring open table format compatibility (Iceberg V1/V2/V3)
Theoretical Basis
Iceberg's table format provides ACID guarantees on object storage:
Iceberg Table Structure:
  metadata/
    v1.metadata.json     -- table schema, partition spec
    snap-001.avro        -- snapshot manifest list
  data/
    part-00001.parquet   -- data files
Streaming Write Process:
1. Accumulate rows in memory buffer
2. Flush to Parquet data files on object storage
3. Create manifest entries for new files
4. Commit new snapshot atomically (metadata update)
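The four steps above can be sketched with an in-memory stand-in for object storage. This is an illustrative model under stated assumptions, not the RisingWave or Iceberg implementation: `ToyTable`, `ToyWriter`, `Snapshot`, and `visible_rows` are hypothetical names, data files are serialized as JSON instead of Parquet to keep the example dependency-free, and a plain list of pending paths stands in for manifest entries.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Snapshot:
    snapshot_id: int
    data_files: List[str]  # every data file visible in this snapshot


@dataclass
class ToyTable:
    store: Dict[str, bytes] = field(default_factory=dict)  # stands in for object storage
    snapshots: Dict[int, Snapshot] = field(default_factory=dict)
    current_snapshot_id: Optional[int] = None  # the single pointer readers follow


class ToyWriter:
    def __init__(self, table: ToyTable) -> None:
        self.table = table
        self.buffer: List[dict] = []
        self.pending_files: List[str] = []  # stand-in for manifest entries
        self.file_seq = 0

    def write(self, row: dict) -> None:
        # Step 1: accumulate rows in a memory buffer.
        self.buffer.append(row)

    def flush(self) -> None:
        # Step 2: write the buffered rows out as one data file.
        if not self.buffer:
            return
        self.file_seq += 1
        path = f"data/part-{self.file_seq:05d}.json"
        self.table.store[path] = json.dumps(self.buffer).encode("utf-8")
        # Step 3: record the new file so the next commit can reference it.
        self.pending_files.append(path)
        self.buffer = []

    def commit(self) -> Optional[int]:
        # Step 4: publish a new snapshot by swapping one pointer.
        self.flush()
        if not self.pending_files:
            return self.table.current_snapshot_id
        current = self.table.current_snapshot_id
        inherited = self.table.snapshots[current].data_files if current is not None else []
        new_id = (current or 0) + 1
        self.table.snapshots[new_id] = Snapshot(new_id, inherited + self.pending_files)
        self.table.current_snapshot_id = new_id
        self.pending_files = []
        return new_id


def visible_rows(table: ToyTable) -> List[dict]:
    """Read only what the current snapshot references."""
    if table.current_snapshot_id is None:
        return []
    rows: List[dict] = []
    for path in table.snapshots[table.current_snapshot_id].data_files:
        rows.extend(json.loads(table.store[path].decode("utf-8")))
    return rows
```

The point of the model is that a flushed file sits in storage without being readable until `commit` moves the snapshot pointer, so readers see either the old table state or the new one, never a partial write.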
Checkpoint Integration:
- RisingWave barriers trigger Iceberg commits
- commit_checkpoint_interval sets how many checkpoints elapse between Iceberg commits
- Default: 60, i.e. one commit every 60 checkpoints
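The relationship between checkpoints and commits can be pictured as a simple counter. The sketch below is a hypothetical model, not RisingWave code: `CheckpointCommitter` and `on_checkpoint` are invented names, and only the parameter name `commit_checkpoint_interval` and its default of 60 come from the text above.

```python
class CheckpointCommitter:
    """Counts checkpoints and signals a commit on every N-th one."""

    def __init__(self, commit_checkpoint_interval: int = 60) -> None:
        if commit_checkpoint_interval < 1:
            raise ValueError("commit_checkpoint_interval must be at least 1")
        self.interval = commit_checkpoint_interval
        self.since_last_commit = 0
        self.commits = 0

    def on_checkpoint(self) -> bool:
        """Call once per checkpoint barrier; returns True when a commit is due."""
        self.since_last_commit += 1
        if self.since_last_commit >= self.interval:
            self.since_last_commit = 0
            self.commits += 1
            return True
        return False


committer = CheckpointCommitter()
decisions = [committer.on_checkpoint() for _ in range(120)]
```

With the default interval, the 60th and 120th checkpoints in this run are the only ones that trigger a commit. A larger interval produces fewer, larger snapshots at the cost of data becoming visible later.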