Principle:Risingwavelabs Risingwave Streaming Source Ingestion
| Knowledge Sources | |
|---|---|
| Domains | Streaming, Data_Ingestion |
| Last Updated | 2026-02-09 07:00 GMT |
Overview
A data ingestion mechanism that connects external message brokers and event streams to a streaming database for continuous real-time processing.
Description
Streaming Source Ingestion is the foundational step in any streaming ETL pipeline. It establishes a persistent connection between an external data source (such as Apache Kafka, Pulsar, Kinesis, or a CDC connector) and the streaming database engine. Unlike traditional batch imports, streaming ingestion maintains an open channel that continuously receives new events as they arrive.
In RisingWave, this is achieved through the CREATE SOURCE SQL statement, which registers a streaming source with the system catalog. The source definition includes the connector type, connection properties, data format (JSON, Avro, Protobuf), and encoding. Once registered, the source becomes available for downstream materialized views to consume from.
The key distinction from batch ETL is that streaming sources are stateful — they track their consumption offset (e.g., Kafka consumer group offset) and can resume from their last position after restarts.
Usage
Use this principle when you need to bring external event data into a streaming database for real-time processing. This is the correct approach when:
- Data arrives continuously from message brokers (Kafka, Pulsar, Kinesis)
- You need low-latency ingestion (sub-second to seconds)
- The data format is structured (JSON, Avro, Protobuf, CSV)
- You want SQL-based stream processing rather than custom application code
Theoretical Basis
Streaming ingestion follows the publish-subscribe pattern where the database acts as a consumer:
Source (Kafka/Pulsar/Kinesis)
|
v
[Connector] -- reads from topic/stream with offset tracking
|
v
[Format Decoder] -- deserializes JSON/Avro/Protobuf
|
v
[Source Operator] -- emits rows into streaming engine
|
v
[Downstream MVs] -- consume the stream
The source connector maintains exactly-once or at-least-once semantics depending on the upstream system's capabilities and the source configuration.