
Principle:Risingwavelabs Risingwave Streaming Source Ingestion

From Leeroopedia


Knowledge Sources
Domains: Streaming, Data_Ingestion
Last Updated: 2026-02-09 07:00 GMT

Overview

A data ingestion mechanism that connects external message brokers and event streams to a streaming database for continuous real-time processing.

Description

Streaming Source Ingestion is the foundational step in any streaming ETL pipeline. It establishes a persistent connection between an external data source (such as Apache Kafka, Pulsar, Kinesis, or a CDC connector) and the streaming database engine. Unlike traditional batch imports, streaming ingestion maintains an open channel that continuously receives new events as they arrive.

In RisingWave, this is achieved through the CREATE SOURCE SQL statement, which registers a streaming source with the system catalog. The source definition includes the connector type, connection properties, data format (JSON, Avro, Protobuf), and encoding. Once registered, the source becomes available for downstream materialized views to consume from.
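A minimal sketch of such a source definition follows. The column schema, topic name, and broker address are hypothetical; the option names reflect RisingWave's Kafka connector and should be checked against the current documentation:

```sql
-- Register a Kafka topic as a streaming source (illustrative values).
CREATE SOURCE user_events (
    user_id    BIGINT,
    event_type VARCHAR,
    ts         TIMESTAMP
) WITH (
    connector = 'kafka',
    topic = 'user_events',
    properties.bootstrap.server = 'broker:9092',
    -- Begin reading from the earliest retained offset on first start.
    scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON;
```

After this statement runs, `user_events` appears in the system catalog and materialized views can select from it like any other relation.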

The key distinction from batch ETL is that streaming sources are stateful — they track their consumption offset (e.g., Kafka consumer group offset) and can resume from their last position after restarts.

Usage

Use this principle when you need to bring external event data into a streaming database for real-time processing. This is the correct approach when:

  • Data arrives continuously from message brokers (Kafka, Pulsar, Kinesis)
  • You need low-latency ingestion (sub-second to seconds)
  • The data format is structured (JSON, Avro, Protobuf, CSV)
  • You want SQL-based stream processing rather than custom application code
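As an illustration of the SQL-based processing mentioned above, a materialized view can continuously aggregate over an ingested stream. The source name `clicks` and its columns are hypothetical; `tumble` is RisingWave's time-window table function:

```sql
-- Assumes a source `clicks(ts TIMESTAMP, url VARCHAR)` was already
-- registered with CREATE SOURCE. The view updates incrementally as
-- new events arrive, with no batch job to schedule.
CREATE MATERIALIZED VIEW clicks_per_minute AS
SELECT window_start, count(*) AS click_count
FROM tumble(clicks, ts, INTERVAL '1 minute')
GROUP BY window_start;
```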

Theoretical Basis

Streaming ingestion follows the publish-subscribe pattern where the database acts as a consumer:

Source (Kafka/Pulsar/Kinesis)
    |
    v
[Connector] -- reads from topic/stream with offset tracking
    |
    v
[Format Decoder] -- deserializes JSON/Avro/Protobuf
    |
    v
[Source Operator] -- emits rows into streaming engine
    |
    v
[Downstream MVs] -- consume the stream

The source connector maintains exactly-once or at-least-once semantics depending on the upstream system's capabilities and the source configuration.

Related Pages

Implemented By

Uses Heuristic
