Principle: Apache Druid Streaming Source Configuration
| Knowledge Sources | |
|---|---|
| Domains | Streaming_Ingestion, Data_Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A streaming-specific source configuration principle that establishes connectivity to message brokers (Kafka, Kinesis, Azure Event Hubs) and retrieves sample streaming data.
Description
Streaming Source Configuration is the entry point for creating streaming ingestion supervisors. Unlike batch sources that read finite files, streaming sources connect to continuously flowing message brokers. The configuration includes:
- Kafka: Bootstrap servers, topic, consumer properties (security protocol, SASL config)
- Kinesis: AWS endpoint, stream name, region, access key/secret
- Azure Event Hubs: Uses Kafka protocol with SASL_SSL/PLAIN authentication
The sampler API reads a small batch of messages from the stream to preview the data format. Streaming sources also support metadata column parsing — extracting Kafka timestamps, headers, keys, or Kinesis partition keys as additional columns.
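The per-broker connection blocks above can be sketched as plain dictionaries. This is a minimal illustration, not a Druid client API: the helper function names are hypothetical, while the field names (`bootstrap.servers`, `consumerProperties`, `endpoint`, `awsAccessKeyId`, and so on) follow the conventions listed in this principle.

```python
# Sketch of the per-broker connection blocks described above.
# Helper names are illustrative; field names follow the Kafka/Kinesis
# ioConfig conventions listed in this principle.

def kafka_connection(bootstrap_servers, topic, sasl_jaas=None):
    """Build the Kafka side of a streaming connection spec."""
    consumer_props = {"bootstrap.servers": bootstrap_servers}
    if sasl_jaas:
        # Secured clusters (including Azure Event Hubs' Kafka-compatible
        # endpoint) typically use SASL_SSL with a PLAIN JAAS config.
        consumer_props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "PLAIN",
            "sasl.jaas.config": sasl_jaas,
        })
    return {"type": "kafka", "topic": topic,
            "consumerProperties": consumer_props}

def kinesis_connection(stream, region, access_key, secret_key):
    """Build the Kinesis side of a streaming connection spec."""
    return {
        "type": "kinesis",
        "stream": stream,
        "endpoint": f"kinesis.{region}.amazonaws.com",
        "awsAccessKeyId": access_key,
        "awsSecretAccessKey": secret_key,
    }
```

Note that the Kafka helper only emits the SASL properties when credentials are supplied, mirroring how unsecured local brokers need nothing beyond `bootstrap.servers`.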
Usage
Use this principle at the start of any streaming ingestion supervisor creation workflow. It is functionally similar to the batch Source Connection principle but with streaming-specific connection parameters and metadata options.
Theoretical Basis
Streaming source configuration follows a broker sample-and-preview pattern:
- Streaming connection:
  - Kafka: `{ bootstrap.servers, topic, consumerProperties }`
  - Kinesis: `{ endpoint, stream, region, awsAccessKeyId, awsSecretAccessKey }`
- Sample request: `sampleForConnect(spec, sampleStrategy)` → `POST /druid/indexer/v1/sampler`
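The sampler call can be sketched as building a request body and POSTing it to the endpoint above. The body shape here (a supervisor-style `spec` plus a `samplerConfig` that bounds the preview) follows Druid's sampler API, but the builder function and all concrete values are illustrative:

```python
import json

# Hypothetical sketch of a sampler request body for
# POST /druid/indexer/v1/sampler. The spec mirrors a supervisor spec;
# samplerConfig bounds how much streaming data the preview reads.
def build_sampler_request(io_config, input_format,
                          num_rows=500, timeout_ms=10000):
    return {
        "type": io_config["type"],          # 'kafka' or 'kinesis'
        "spec": {
            "type": io_config["type"],
            "ioConfig": {**io_config, "inputFormat": input_format},
        },
        "samplerConfig": {"numRows": num_rows, "timeoutMs": timeout_ms},
    }

req = build_sampler_request(
    {"type": "kafka", "topic": "events",
     "consumerProperties": {"bootstrap.servers": "broker:9092"}},
    {"type": "json"},
)
body = json.dumps(req)  # JSON payload for the sampler endpoint
```

Because the stream is unbounded, the `timeoutMs` bound matters as much as `numRows`: the sampler returns whatever it has read when either limit is hit.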
- Uses streaming-specific input format wrappers:
  - Kafka: `{ type: 'kafka', valueFormat: { type: 'regex', pattern: '(.*)' } }`
  - Kinesis: `{ type: 'kinesis', valueFormat: { type: 'regex', pattern: '(.*)' } }`
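The wrapper pattern above can be sketched as a function that nests a value format inside the broker-aware format. Before the user has chosen a real value format, a single-group regex that captures each whole message serves as the raw preview format; the function name is hypothetical, and the `columns` field reflects that Druid's regex input format names its capture groups:

```python
def wrap_input_format(broker_type, value_format=None):
    # Nest the payload's value format inside the streaming-aware
    # 'kafka'/'kinesis' input format so broker metadata can be parsed
    # alongside the message body. With no value format chosen yet, fall
    # back to a catch-all regex that surfaces each raw message as one row.
    if value_format is None:
        value_format = {"type": "regex", "pattern": "(.*)",
                        "columns": ["raw"]}
    return {"type": broker_type, "valueFormat": value_format}

raw_kafka = wrap_input_format("kafka")
parsed_kinesis = wrap_input_format("kinesis", {"type": "json"})
```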
- Metadata columns (optional):
  - Kafka: `kafka.timestamp`, `kafka.header.*`, `kafka.key`
  - Kinesis: `kinesis.partitionKey`, `kinesis.timestamp`
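As a sketch, those metadata columns correspond to naming fields on the streaming input formats. The field names below match Druid's documented defaults for the `kafka` and `kinesis` input formats as I understand them (verify against your Druid version); the merge helper is illustrative:

```python
# Assumed default metadata column naming for the streaming input formats.
KAFKA_METADATA_DEFAULTS = {
    "timestampColumnName": "kafka.timestamp",   # message timestamp
    "keyColumnName": "kafka.key",               # record key
    "headerColumnPrefix": "kafka.header.",      # one column per header
}
KINESIS_METADATA_DEFAULTS = {
    "timestampColumnName": "kinesis.timestamp",        # arrival timestamp
    "partitionKeyColumnName": "kinesis.partitionKey",  # shard partition key
}

def with_metadata(input_format, overrides=None):
    # Merge metadata column settings into a streaming input format,
    # letting callers rename columns via overrides.
    defaults = (KAFKA_METADATA_DEFAULTS if input_format["type"] == "kafka"
                else KINESIS_METADATA_DEFAULTS)
    return {**input_format, **defaults, **(overrides or {})}

fmt = with_metadata({"type": "kafka", "valueFormat": {"type": "json"}})
```

The extracted columns then flow through the rest of the supervisor spec like ordinary input columns, so they can feed the timestamp spec or dimension list directly.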