Principle:Risingwavelabs Risingwave External Query Engine Integration

Knowledge Sources	RisingWave Docs, Presto Documentation, RisingWave
Domains	Data_Lake, Query_Processing, Interoperability
Last Updated	2026-02-09 07:00 GMT

Overview

A data interoperability mechanism that enables external query engines (Presto, Trino, Spark) to read streaming results written to Iceberg tables by RisingWave.

Description

External Query Engine Integration closes the loop in a real-time lakehouse architecture. After RisingWave writes streaming data to Iceberg tables, external analytical engines can query those same tables using their native SQL interfaces. This provides:

Decoupled compute: Different query engines can read the same data without coordinating with RisingWave
Ecosystem compatibility: Organizations can use their existing Spark/Trino/Presto infrastructure
Historical analytics: Engines can query earlier table snapshots retained by Iceberg, not just the latest one

The integration relies on the Iceberg table format as a universal contract: any engine with an Iceberg connector can read the tables.
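
The historical-analytics point above rests on Iceberg's snapshot log: each commit produces a snapshot, and a reader can pick the snapshot that was current at a given time. The following is a minimal sketch of that selection rule in plain Python, not any engine's actual implementation. The `Snapshot` class, the `snapshot_as_of` helper, and the sample ids and timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical stand-in for one entry in an Iceberg table's snapshot log.
    snapshot_id: int
    committed_at_ms: int


def snapshot_as_of(snapshots, timestamp_ms) -> Optional[Snapshot]:
    """Return the latest snapshot committed at or before timestamp_ms."""
    eligible = [s for s in snapshots if s.committed_at_ms <= timestamp_ms]
    # No eligible snapshot means the table did not exist yet at that time.
    return max(eligible, key=lambda s: s.committed_at_ms, default=None)


# Illustrative history: three commits written by a streaming sink.
history = [Snapshot(101, 1_000), Snapshot(102, 2_000), Snapshot(103, 3_000)]
chosen = snapshot_as_of(history, 2_500)
```

Snapshots that have been expired by table maintenance are no longer selectable, so how far back a reader can go depends on the retention policy configured for the table.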

Usage

Use external query engine integration when:

Running complex analytical queries that benefit from Spark/Trino optimizations
Providing data access to teams using non-PostgreSQL tools
Building reports that combine streaming and historical data
Validating Iceberg sink output in integration tests
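
For the integration-test case, one possible shape is a polling check that asks the external engine for rows until they match the expected set. This is a sketch under assumptions: `wait_for_rows` and `fetch_rows` are hypothetical names, `fetch_rows` stands in for whatever client call runs the query against Presto, Trino, or Spark, and the retry loop assumes sink commits may become visible only after some delay.

```python
import time
from collections import Counter


def wait_for_rows(fetch_rows, expected, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Poll fetch_rows() until it returns the expected rows, ignoring order.

    fetch_rows is a hypothetical callable that runs the query on the external
    engine and returns a list of hashable row tuples.
    """
    want = Counter(expected)
    for attempt in range(attempts):
        if Counter(fetch_rows()) == want:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # give the sink time to commit another snapshot
    return False


# Fake engine responses: rows appear gradually as the sink commits.
_responses = iter([[], [(1, "a")], [(2, "b"), (1, "a")]])
ok = wait_for_rows(
    lambda: next(_responses),
    expected=[(1, "a"), (2, "b")],
    sleep=lambda _: None,  # skip real sleeping in this demonstration
)
```

Comparing with `Counter` keeps duplicate rows significant while ignoring row order, which matters because distributed scans give no ordering guarantee without an explicit ORDER BY.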

Theoretical Basis

Architecture:
    RisingWave (streaming) → Iceberg Tables (S3/MinIO)
                                    ↑
    Presto/Trino/Spark (batch) ─────┘

Query Flow:
    1. External engine connects to Iceberg catalog
    2. Reads table metadata (schema, partitions, snapshots)
    3. Plans query over Parquet data files
    4. Executes distributed scan on object storage
    5. Returns results to user
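
The query flow above can be illustrated with a toy in-memory model. This is a simplified sketch, not how any of the engines is implemented: the `DataFile` and `TableMetadata` classes, the `ts` column, the table name, and the S3 paths are hypothetical, and Python lists stand in for Parquet file contents.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str
    min_ts: int  # lower bound of the hypothetical `ts` column in this file
    max_ts: int  # upper bound of the hypothetical `ts` column in this file
    rows: list   # stand-in for the Parquet contents: (ts, value) tuples


@dataclass
class TableMetadata:
    schema: tuple
    snapshots: dict          # snapshot id -> list of DataFile
    current_snapshot_id: int


def plan_scan(meta, ts_from, ts_to):
    """Steps 2-3: read metadata and keep files whose bounds overlap the filter."""
    files = meta.snapshots[meta.current_snapshot_id]
    return [f for f in files if f.max_ts >= ts_from and f.min_ts <= ts_to]


def execute_scan(files, ts_from, ts_to):
    """Step 4: scan the surviving files and apply the row-level filter."""
    return [row for f in files for row in f.rows if ts_from <= row[0] <= ts_to]


# Step 1: a dict stands in for the Iceberg catalog lookup.
catalog = {
    "demo.events": TableMetadata(
        schema=("ts", "value"),
        snapshots={
            1: [
                DataFile("s3://bucket/a.parquet", 0, 9, [(1, "a"), (9, "b")]),
                DataFile("s3://bucket/b.parquet", 10, 19, [(12, "c"), (18, "d")]),
            ]
        },
        current_snapshot_id=1,
    )
}

meta = catalog["demo.events"]
planned = plan_scan(meta, ts_from=10, ts_to=15)
result = execute_scan(planned, ts_from=10, ts_to=15)  # Step 5: rows returned
```

The planner discards `a.parquet` from metadata alone, without opening it. That pruning step is why reading table metadata comes before the distributed scan in the flow.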

Related Pages

Implemented By

Implementation:Risingwavelabs_Risingwave_Iceberg_External_Query
