Principle: Apache Druid Streaming Source Configuration
| Knowledge Sources | |
|---|---|
| Domains | Streaming_Ingestion, Data_Sampling |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A streaming-specific source configuration principle that establishes connectivity to message brokers (Kafka, Kinesis, Azure Event Hubs) and retrieves sample streaming data.
Description
Streaming Source Configuration is the entry point for creating streaming ingestion supervisors. Unlike batch sources that read finite files, streaming sources connect to continuously flowing message brokers. The configuration includes:
- Kafka: Bootstrap servers, topic, consumer properties (security protocol, SASL config)
- Kinesis: AWS endpoint, stream name, region, access key/secret
- Azure Event Hubs: Uses Kafka protocol with SASL_SSL/PLAIN authentication
The sampler API reads a small batch of messages from the stream to preview the data format. Streaming sources also support metadata column parsing — extracting Kafka timestamps, headers, keys, or Kinesis partition keys as additional columns.
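The per-broker connection blocks above can be sketched as plain dictionaries. This is a minimal illustration, not a Druid client API: the helper function names are hypothetical, while the field names (`bootstrap.servers`, `consumerProperties`, `endpoint`, `awsAccessKeyId`, and so on) follow the conventions listed in this principle.

```python
# Sketch of the per-broker connection blocks described above.
# Helper names are illustrative; field names follow the Kafka/Kinesis
# ioConfig conventions listed in this principle.

def kafka_connection(bootstrap_servers, topic, sasl_jaas=None):
    """Build the Kafka side of a streaming connection spec."""
    consumer_props = {"bootstrap.servers": bootstrap_servers}
    if sasl_jaas:
        # Secured clusters (including Azure Event Hubs' Kafka-compatible
        # endpoint) typically use SASL_SSL with a PLAIN JAAS config.
        consumer_props.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "PLAIN",
            "sasl.jaas.config": sasl_jaas,
        })
    return {"type": "kafka", "topic": topic,
            "consumerProperties": consumer_props}

def kinesis_connection(stream, region, access_key, secret_key):
    """Build the Kinesis side of a streaming connection spec."""
    return {
        "type": "kinesis",
        "stream": stream,
        "endpoint": f"kinesis.{region}.amazonaws.com",
        "awsAccessKeyId": access_key,
        "awsSecretAccessKey": secret_key,
    }
```

Note that the Kafka helper only emits the SASL properties when credentials are supplied, mirroring how unsecured local brokers need nothing beyond `bootstrap.servers`.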
Usage
Use this principle at the start of any streaming ingestion supervisor creation workflow. It is functionally similar to the batch Source Connection principle but with streaming-specific connection parameters and metadata options.
Theoretical Basis
Streaming source configuration follows a broker sample-and-preview pattern:
- Streaming connection:
  - Kafka: `{ bootstrap.servers, topic, consumerProperties }`
  - Kinesis: `{ endpoint, stream, region, awsAccessKeyId, awsSecretAccessKey }`
- Sample request: `sampleForConnect(spec, sampleStrategy)` → `POST /druid/indexer/v1/sampler`
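The sampler call can be sketched as building a request body and POSTing it to the endpoint above. The body shape here (a supervisor-style `spec` plus a `samplerConfig` that bounds the preview) follows Druid's sampler API, but the builder function and all concrete values are illustrative:

```python
import json

# Hypothetical sketch of a sampler request body for
# POST /druid/indexer/v1/sampler. The spec mirrors a supervisor spec;
# samplerConfig bounds how much streaming data the preview reads.
def build_sampler_request(io_config, input_format,
                          num_rows=500, timeout_ms=10000):
    return {
        "type": io_config["type"],          # 'kafka' or 'kinesis'
        "spec": {
            "type": io_config["type"],
            "ioConfig": {**io_config, "inputFormat": input_format},
        },
        "samplerConfig": {"numRows": num_rows, "timeoutMs": timeout_ms},
    }

req = build_sampler_request(
    {"type": "kafka", "topic": "events",
     "consumerProperties": {"bootstrap.servers": "broker:9092"}},
    {"type": "json"},
)
body = json.dumps(req)  # JSON payload for the sampler endpoint
```

Because the stream is unbounded, the `timeoutMs` bound matters as much as `numRows`: the sampler returns whatever it has read when either limit is hit.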
- Uses streaming-specific input format wrappers:
  - Kafka: `{ type: 'kafka', valueFormat: { type: 'regex', pattern: '(.*)' } }`
  - Kinesis: `{ type: 'kinesis', valueFormat: { type: 'regex', pattern: '(.*)' } }`
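The wrapper pattern above can be sketched as a function that nests a value format inside the broker-aware format. Before the user has chosen a real value format, a single-group regex that captures each whole message serves as the raw preview format; the function name is hypothetical, and the `columns` field reflects that Druid's regex input format names its capture groups:

```python
def wrap_input_format(broker_type, value_format=None):
    # Nest the payload's value format inside the streaming-aware
    # 'kafka'/'kinesis' input format so broker metadata can be parsed
    # alongside the message body. With no value format chosen yet, fall
    # back to a catch-all regex that surfaces each raw message as one row.
    if value_format is None:
        value_format = {"type": "regex", "pattern": "(.*)",
                        "columns": ["raw"]}
    return {"type": broker_type, "valueFormat": value_format}

raw_kafka = wrap_input_format("kafka")
parsed_kinesis = wrap_input_format("kinesis", {"type": "json"})
```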
- Metadata columns (optional):
  - Kafka: `kafka.timestamp`, `kafka.header.*`, `kafka.key`
  - Kinesis: `kinesis.partitionKey`, `kinesis.timestamp`
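As a sketch, those metadata columns correspond to naming fields on the streaming input formats. The field names below match Druid's documented defaults for the `kafka` and `kinesis` input formats as I understand them (verify against your Druid version); the merge helper is illustrative:

```python
# Assumed default metadata column naming for the streaming input formats.
KAFKA_METADATA_DEFAULTS = {
    "timestampColumnName": "kafka.timestamp",   # message timestamp
    "keyColumnName": "kafka.key",               # record key
    "headerColumnPrefix": "kafka.header.",      # one column per header
}
KINESIS_METADATA_DEFAULTS = {
    "timestampColumnName": "kinesis.timestamp",        # arrival timestamp
    "partitionKeyColumnName": "kinesis.partitionKey",  # shard partition key
}

def with_metadata(input_format, overrides=None):
    # Merge metadata column settings into a streaming input format,
    # letting callers rename columns via overrides.
    defaults = (KAFKA_METADATA_DEFAULTS if input_format["type"] == "kafka"
                else KINESIS_METADATA_DEFAULTS)
    return {**input_format, **defaults, **(overrides or {})}

fmt = with_metadata({"type": "kafka", "valueFormat": {"type": "json"}})
```

The extracted columns then flow through the rest of the supervisor spec like ordinary input columns, so they can feed the timestamp spec or dimension list directly.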