Principle:Risingwavelabs Risingwave External Query Engine Integration
| Knowledge Sources | |
|---|---|
| Domains | Data_Lake, Query_Processing, Interoperability |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A data interoperability mechanism that enables external query engines (Presto, Trino, Spark) to read streaming results written to Iceberg tables by RisingWave.
Description
External Query Engine Integration closes the loop in a real-time lakehouse architecture. After RisingWave writes streaming data to Iceberg tables, external analytical engines can query those same tables using their native SQL interfaces. This provides:
- Decoupled compute: Different query engines can read the same data without coordinating with RisingWave
- Ecosystem compatibility: Organizations can use their existing Spark/Trino/Presto infrastructure
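- Historical analytics: Query the accumulated history of streaming results, including earlier table snapshots where the engine supports Iceberg time travel and those snapshots are still retained, not just the latest snapshot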
The integration relies on the Iceberg table format as a universal contract: any engine with an Iceberg connector can read the tables.
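The "historical analytics" point depends on Iceberg keeping a list of table snapshots, which lets an engine read the table as it stood at an earlier time. The sketch below illustrates only the selection rule (the latest snapshot committed at or before a requested timestamp). The `Snapshot` class, the `snapshot_as_of` helper, and the sample values are hypothetical simplifications, not the Iceberg or RisingWave API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified stand-in for an Iceberg snapshot entry."""
    snapshot_id: int
    timestamp_ms: int  # commit time in milliseconds since the epoch


def snapshot_as_of(snapshots: Sequence[Snapshot], as_of_ms: int) -> Optional[Snapshot]:
    """Return the most recent snapshot committed at or before as_of_ms, or None."""
    eligible = [s for s in snapshots if s.timestamp_ms <= as_of_ms]
    return max(eligible, key=lambda s: s.timestamp_ms, default=None)


# Three commits with illustrative ids and timestamps.
history = [
    Snapshot(snapshot_id=101, timestamp_ms=1_000),
    Snapshot(snapshot_id=102, timestamp_ms=2_000),
    Snapshot(snapshot_id=103, timestamp_ms=3_000),
]

# A reader asking for the table "as of" t=2500 would be served snapshot 102.
chosen = snapshot_as_of(history, as_of_ms=2_500)
```

A request earlier than the first commit yields `None` here; how a real engine reports that case depends on the engine.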
Usage
Use external query engine integration when:
- Running complex analytical queries that benefit from Spark/Trino optimizations
- Providing data access to teams whose tools do not speak the PostgreSQL wire protocol that RisingWave exposes to clients
- Building reports that combine streaming and historical data
- Validating Iceberg sink output in integration tests
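For the integration-test use case, one possible shape is a small helper that runs a query through any Python DB-API 2.0 cursor and compares the result with the expected rows, ignoring order. This is a sketch under assumptions: the `rows_match` helper and the `page_views` table are hypothetical, and an in-memory SQLite database stands in for the external engine so the example runs on its own. In a real test the cursor would come from the engine's own client.

```python
import sqlite3
from collections import Counter


def rows_match(cursor, query, expected_rows):
    """Run query on a DB-API cursor and compare rows as a multiset (order-insensitive)."""
    cursor.execute(query)
    actual = Counter(tuple(row) for row in cursor.fetchall())
    return actual == Counter(tuple(row) for row in expected_rows)


# Stand-in engine: an in-memory SQLite table plays the role of the Iceberg table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE page_views (user_id INTEGER, views INTEGER)")
cur.executemany("INSERT INTO page_views VALUES (?, ?)", [(1, 10), (2, 7)])

# Expected rows are listed in a different order on purpose.
ok = rows_match(cur, "SELECT user_id, views FROM page_views", [(2, 7), (1, 10)])
```

Comparing multisets rather than lists avoids false failures when the engine returns rows in a different order, while still catching missing or duplicated rows.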
Theoretical Basis
Architecture:
RisingWave (streaming) → Iceberg Tables (S3/MinIO)
                                ↑
Presto/Trino/Spark (batch) ─────┘
Query Flow:
1. The external engine connects to the Iceberg catalog
2. It reads the table metadata (schema, partitions, snapshots)
3. It plans the query, using that metadata to select the Parquet data files to scan
4. It executes a distributed scan of those files on object storage
5. It returns the results to the user
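The first three steps of the query flow can be sketched with a toy in-memory model. Everything here is a hypothetical simplification for illustration: `InMemoryCatalog`, `TableMetadata`, `DataFile`, `plan_scan`, the table name, and the file paths are not part of Iceberg or of any engine, and real planners prune using manifest statistics rather than a single partition string. Steps 4 and 5 (reading Parquet from object storage) are left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataFile:
    path: str          # hypothetical object-storage path
    partition: str     # e.g. a day partition value
    record_count: int


@dataclass(frozen=True)
class TableMetadata:
    schema: List[str]
    current_snapshot_id: int
    snapshots: Dict[int, List[DataFile]]  # snapshot id -> data files visible in it


class InMemoryCatalog:
    """Toy catalog: maps table names to their metadata."""

    def __init__(self, tables: Dict[str, TableMetadata]):
        self._tables = tables

    def load_table(self, name: str) -> TableMetadata:
        return self._tables[name]


def plan_scan(metadata: TableMetadata, partition: str) -> List[DataFile]:
    """Step 3: keep only current-snapshot data files whose partition matches."""
    files = metadata.snapshots[metadata.current_snapshot_id]
    return [f for f in files if f.partition == partition]


a = DataFile("s3://example-bucket/t/a.parquet", "2024-01-01", 100)
b = DataFile("s3://example-bucket/t/b.parquet", "2024-01-02", 80)
c = DataFile("s3://example-bucket/t/c.parquet", "2024-01-02", 120)

catalog = InMemoryCatalog({
    "db.page_views": TableMetadata(
        schema=["user_id", "views", "day"],
        current_snapshot_id=2,
        snapshots={1: [a], 2: [a, b, c]},
    )
})

metadata = catalog.load_table("db.page_views")            # steps 1 and 2
planned = plan_scan(metadata, partition="2024-01-02")     # step 3
planned_paths = [f.path for f in planned]
```

The point of the sketch is that file selection happens from metadata alone, so the engine never has to contact RisingWave or list the bucket to decide what to read.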