
Principle:Risingwavelabs Risingwave Iceberg Sink Integration

From Leeroopedia


Knowledge Sources
Domains: Data_Lake, Streaming, Iceberg
Last Updated: 2026-02-09 07:00 GMT

Overview

A streaming-to-lakehouse integration mechanism that continuously writes transformed streaming data into Apache Iceberg tables, providing durable, queryable storage in an open table format.

Description

Iceberg Sink Integration bridges real-time stream processing with the Apache Iceberg open table format. This enables a real-time lakehouse architecture in which streaming results are continuously written to Iceberg tables on object storage (S3, MinIO, GCS) and can then be queried by external engines such as Spark, Trino, and Presto.

The integration supports two write modes:

  • Append-only: New rows are continuously appended to the Iceberg table
  • Upsert: Rows are inserted or updated based on primary key, using Iceberg's merge-on-read or copy-on-write strategies

RisingWave's Iceberg sink handles the full lifecycle: schema management (via catalog operations), data file writing (Parquet format), manifest management, and snapshot commits. It supports multiple catalog types including storage-based, Hive, Glue, REST, and JDBC catalogs.
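A sink of this kind is defined with a `CREATE SINK` statement. The sketch below shows an upsert sink against a storage-based catalog; the parameter names follow RisingWave's documented Iceberg sink options, but the source/table names and credentials here are illustrative placeholders, and available options vary by RisingWave version, so verify against your deployment's documentation.

```sql
-- Hypothetical example: stream `enriched_events` into an Iceberg table
-- on S3-compatible storage, upserting by primary key.
CREATE SINK iceberg_events_sink FROM enriched_events
WITH (
    connector = 'iceberg',
    type = 'upsert',                       -- or 'append-only'
    primary_key = 'event_id',              -- required for upsert mode
    catalog.type = 'storage',              -- storage-based catalog; hive/glue/rest/jdbc also supported
    warehouse.path = 's3a://lakehouse/warehouse',
    database.name = 'analytics',
    table.name = 'events',
    s3.endpoint = 'http://minio:9000',     -- placeholder endpoint
    s3.access.key = '<access-key>',
    s3.secret.key = '<secret-key>',
    commit_checkpoint_interval = 60        -- commit a snapshot every 60 checkpoints
);
```

Append-only mode drops the `primary_key` requirement and simply adds new data files each commit; upsert mode additionally writes delete records so that external readers see at most one row per key.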

Usage

Use Iceberg sink integration when:

  • Building a real-time lakehouse architecture
  • Streaming data to object storage for long-term analytics
  • Enabling external query engines to read streaming results
  • Requiring open table format compatibility (Iceberg V1/V2/V3)

Theoretical Basis

Iceberg's table format provides ACID guarantees on object storage:

Iceberg Table Structure:
    metadata/
        v1.metadata.json  -- table schema, partition spec
        snap-001.avro     -- snapshot manifest list
    data/
        part-00001.parquet -- data files

Streaming Write Process:
    1. Accumulate rows in memory buffer
    2. Flush to Parquet data files on object storage
    3. Create manifest entries for new files
    4. Commit new snapshot atomically (metadata update)
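The four steps above can be sketched as a toy in-memory model. This is illustrative only: a real writer produces Parquet data files and Avro manifests on object storage through an Iceberg library, and the class and field names here are invented for the sketch.

```python
import uuid

class ToyIcebergTable:
    """Toy model of the Iceberg streaming write path (not a real writer)."""

    def __init__(self):
        self.data_files = {}   # path -> rows; stands in for Parquet files
        self.snapshots = []    # committed snapshots, each listing its manifest entries
        self.buffer = []       # step 1: rows accumulate in memory

    def write(self, row):
        self.buffer.append(row)

    def checkpoint(self):
        """Flush the buffer and commit one snapshot, mirroring steps 2-4."""
        if not self.buffer:
            return None
        # Step 2: flush buffered rows to a new "data file".
        path = f"data/part-{uuid.uuid4().hex[:8]}.parquet"
        self.data_files[path] = list(self.buffer)
        self.buffer.clear()
        # Step 3: create manifest entries describing the new file.
        manifest = [{"file": path, "record_count": len(self.data_files[path])}]
        # Step 4: commit a new snapshot atomically (one metadata update
        # makes the new file visible to readers all at once).
        snapshot = {"id": len(self.snapshots) + 1, "manifest": manifest}
        self.snapshots.append(snapshot)
        return snapshot

table = ToyIcebergTable()
for i in range(3):
    table.write({"id": i})
snap = table.checkpoint()
print(snap["id"], snap["manifest"][0]["record_count"])  # → 1 3
```

The key property the model captures is that readers never observe a half-written state: data files and manifests are staged first, and only the final snapshot commit makes them visible.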

Checkpoint Integration:
    - RisingWave barriers trigger Iceberg commits
    - commit_checkpoint_interval controls commit frequency
    - Default: every 60 checkpoints
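The resulting commit cadence can be worked out directly: a snapshot lands every `commit_checkpoint_interval` checkpoints, so the wall-clock period is the barrier interval multiplied by that count. The sketch below assumes RisingWave's documented default barrier interval of 1000 ms; check the `barrier_interval_ms` system parameter on your cluster.

```python
# Assumption: barrier_interval_ms at its documented default of 1000 ms.
barrier_interval_ms = 1000
commit_checkpoint_interval = 60  # sink default noted above

# One Iceberg snapshot roughly every this many seconds:
commit_period_s = barrier_interval_ms / 1000 * commit_checkpoint_interval
print(commit_period_s)  # → 60.0
```

Lowering `commit_checkpoint_interval` gives fresher data at the cost of more, smaller snapshots and data files, which increases metadata and compaction pressure on the table.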

Related Pages

Implemented By
