
Principle:Apache Druid Streaming Source Configuration

From Leeroopedia


Knowledge Sources
Domains: Streaming_Ingestion, Data_Sampling
Last Updated: 2026-02-10 00:00 GMT

Overview

Streaming Source Configuration is a streaming-specific source-configuration principle that establishes connectivity to message brokers (Kafka, Kinesis, Azure Event Hubs) and retrieves a sample of streaming data for preview.

Description

Streaming Source Configuration is the entry point for creating streaming ingestion supervisors. Unlike batch sources that read finite files, streaming sources connect to continuously flowing message brokers. The configuration includes:

  • Kafka: Bootstrap servers, topic, consumer properties (security protocol, SASL config)
  • Kinesis: AWS endpoint, stream name, region, access key/secret
  • Azure Event Hubs: Uses Kafka protocol with SASL_SSL/PLAIN authentication
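The per-broker parameters above can be sketched as ioConfig payloads for a streaming supervisor spec. This is a minimal illustration: the broker addresses, topic and stream names, and security settings are hypothetical placeholders, not values from the article.

```python
# Hypothetical Kafka ioConfig: bootstrap servers, topic, and consumer
# properties (security protocol, SASL config) as listed above.
kafka_io_config = {
    "type": "kafka",
    "topic": "clickstream",  # placeholder topic name
    "consumerProperties": {
        "bootstrap.servers": "broker1:9092,broker2:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
    },
}

# Hypothetical Kinesis ioConfig: endpoint, stream, region, and credentials.
kinesis_io_config = {
    "type": "kinesis",
    "stream": "clickstream",  # placeholder stream name
    "endpoint": "kinesis.us-east-1.amazonaws.com",
    "region": "us-east-1",
    "awsAccessKeyId": "AKIA...",          # placeholder credential
    "awsSecretAccessKey": "<secret>",     # placeholder credential
}
```

For Azure Event Hubs, the Kafka payload shape is reused with SASL_SSL/PLAIN consumer properties pointing at the Event Hubs Kafka endpoint.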

The sampler API reads a small batch of messages from the stream to preview the data format. Streaming sources also support metadata column parsing — extracting Kafka timestamps, headers, keys, or Kinesis partition keys as additional columns.
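The sampler call described above can be sketched as a POST to the endpoint named in the Theoretical Basis section. The Router URL, topic, and the abbreviated dataSchema are illustrative assumptions; only the endpoint path and the numRows/timeoutMs sampler options come from Druid's sampler API.

```python
import json
from urllib import request

ROUTER = "http://localhost:8888"  # hypothetical Router address

# Sampler payload: wraps the supervisor spec and asks for a small batch
# of messages to preview the data format.
sampler_spec = {
    "type": "kafka",
    "spec": {
        "ioConfig": {
            "type": "kafka",
            "topic": "clickstream",  # placeholder topic
            "consumerProperties": {"bootstrap.servers": "broker1:9092"},
        },
        # dataSchema abbreviated for the sketch; a real spec carries
        # timestampSpec, dimensionsSpec, and granularitySpec as well.
        "dataSchema": {"dataSource": "sample"},
    },
    "samplerConfig": {"numRows": 500, "timeoutMs": 10000},
}

def sample_for_connect(spec: dict) -> bytes:
    """POST the spec to the sampler endpoint and return the raw response."""
    req = request.Request(
        f"{ROUTER}/druid/indexer/v1/sampler",
        data=json.dumps(spec).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return resp.read()
```

Calling `sample_for_connect(sampler_spec)` against a running cluster returns a JSON body whose rows preview the parsed messages.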

Usage

Use this principle at the start of any streaming ingestion supervisor creation workflow. It is functionally similar to the batch Source Connection principle but with streaming-specific connection parameters and metadata options.

Theoretical Basis

Streaming source configuration follows a broker sample-and-preview pattern:

Streaming connection:
  Kafka:   { bootstrap.servers, topic, consumerProperties }
  Kinesis: { endpoint, stream, region, awsAccessKeyId, awsSecretAccessKey }

Sample request:
  sampleForConnect(spec, sampleStrategy) → POST /druid/indexer/v1/sampler
  Uses streaming-specific input format wrappers:
    Kafka:   { type: 'kafka', valueFormat: { type: 'regex', pattern: '(.*)' } }
    Kinesis: { type: 'kinesis', valueFormat: { type: 'regex', pattern: '(.*)' } }

Metadata columns (optional):
  Kafka:   kafka.timestamp, kafka.header.*, kafka.key
  Kinesis: kinesis.partitionKey, kinesis.timestamp
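Putting the input-format wrapper and the metadata columns together, a Kafka-flavored inputFormat might look like the sketch below. The column names follow the defaults listed above; the `headerFormat` string decoder and the regex value format are assumptions for preview purposes.

```python
# Kafka input-format wrapper with metadata parsing enabled: the regex
# valueFormat captures each whole message for preview, while the
# metadata options surface Kafka timestamps, keys, and headers as columns.
kafka_input_format = {
    "type": "kafka",
    "valueFormat": {"type": "regex", "pattern": "(.*)", "columns": ["raw"]},
    "timestampColumnName": "kafka.timestamp",
    "keyColumnName": "kafka.key",
    "headerColumnPrefix": "kafka.header.",
    "headerFormat": {"type": "string"},  # assumed header decoder
}
```

A Kinesis wrapper follows the same pattern with `type: 'kinesis'`, yielding `kinesis.partitionKey` and `kinesis.timestamp` metadata columns.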

Related Pages

Implemented By
