Principle:Apache Druid Streaming Schema Spec

Knowledge Sources	Apache Druid, Druid Supervisor Spec
Domains	Streaming_Ingestion, Schema_Design
Last Updated	2026-02-10 00:00 GMT

Overview

This principle covers schema configuration for streaming ingestion: it defines parsing, timestamp extraction, transforms, filters, and the schema for continuously ingested streaming data.

Description

Streaming schema and spec configuration reuses the sampler-based pipeline from batch ingestion (parsing → timestamp → transform → filter → schema) and adds three streaming-specific elements:

Streaming metadata columns: Optional columns derived from message metadata (Kafka timestamps, headers, keys; Kinesis partition keys)
Supervisor idle config: Settings that control how the supervisor behaves during gaps in stream activity
Streaming tuning: Parameters specific to streaming indexing (task duration, completion timeout, etc.)
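
The three additions listed above can be illustrated with a minimal sketch of a Kafka supervisor ioConfig, built here as a Python dictionary. The field names follow the Kafka input format and supervisor ioConfig as documented by Druid, to the best of this page's understanding. The function name, column names, and durations are illustrative assumptions rather than recommended values. Note that in the spec itself, task duration and completion timeout sit under ioConfig.

```python
import json


def kafka_streaming_io_config(topic, bootstrap_servers):
    """Return an illustrative Kafka supervisor ioConfig (hypothetical helper)."""
    return {
        "type": "kafka",
        "topic": topic,  # placeholder topic name supplied by the caller
        "consumerProperties": {"bootstrap.servers": bootstrap_servers},
        # 1. Streaming metadata columns: the 'kafka' wrapper format parses the
        #    message value and can also expose the timestamp, headers, and key.
        "inputFormat": {
            "type": "kafka",
            "valueFormat": {"type": "json"},
            "headerFormat": {"type": "string", "encoding": "UTF-8"},
            "keyFormat": {
                "type": "tsv",
                "findColumnsFromHeader": False,
                "columns": ["x"],
            },
            "timestampColumnName": "kafka.timestamp",
            "headerColumnPrefix": "kafka.header.",
            "keyColumnName": "kafka.key",
        },
        # 2. Supervisor idle config: allow the supervisor to go idle after
        #    10 minutes without new data (example value only).
        "idleConfig": {"enabled": True, "inactiveAfterMillis": 600000},
        # 3. Streaming tuning: ISO 8601 durations (example values only).
        "taskDuration": "PT1H",
        "completionTimeout": "PT30M",
    }


if __name__ == "__main__":
    print(json.dumps(kafka_streaming_io_config("clicks", "localhost:9092"), indent=2))
```

A Kinesis spec would use the 'kinesis' wrapper format instead, which exposes the partition key rather than headers and keys.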

The wizard steps are identical to batch Steps 3-7 (sampleForParser through sampleForSchema), using the same sampler API with cached streaming data.

Usage

Apply this principle once the connection to the streaming source has succeeded. The configuration steps mirror the batch workflow, but the result is a supervisor spec rather than a task spec.
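
As a rough illustration of the difference in output, the sketch below wraps the same dataSchema either as a supervisor spec or as a batch task spec and picks the matching Overlord endpoint. The helper names `wrap_spec` and `submission_path` are hypothetical. The endpoint paths are the standard Druid supervisor and task submission paths, and the sketch assumes `index_parallel` as the batch task type.

```python
# Spec types that are submitted as supervisors rather than one-off tasks.
SUPERVISOR_TYPES = {"kafka", "kinesis"}


def wrap_spec(spec_type, data_schema, io_config, tuning_config=None):
    """Wrap the configured parts into a submittable spec (hypothetical helper)."""
    body = {
        "dataSchema": data_schema,
        # The ioConfig type matches the outer spec type.
        "ioConfig": dict(io_config, type=spec_type),
    }
    if tuning_config is not None:
        body["tuningConfig"] = dict(tuning_config, type=spec_type)
    return {"type": spec_type, "spec": body}


def submission_path(spec):
    """Return the Overlord API path that the spec would be POSTed to."""
    if spec["type"] in SUPERVISOR_TYPES:
        return "/druid/indexer/v1/supervisor"
    return "/druid/indexer/v1/task"


if __name__ == "__main__":
    schema = {
        "dataSource": "events",
        "timestampSpec": {"column": "ts", "format": "iso"},
    }
    supervisor = wrap_spec("kafka", schema, {"topic": "events"})
    print(submission_path(supervisor))
```

The dataSchema is shared between both forms, which is why the schema steps can be reused unchanged. Only the wrapper type, the ioConfig, and the submission endpoint differ.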

Theoretical Basis

Streaming schema configuration follows the same incremental sampler refinement pattern as batch, with streaming extensions:

sampleForParser(spec, cacheRows)     → Parsed streaming messages
sampleForTimestamp(spec, cacheRows)  → __time extraction
sampleForTransform(spec, cacheRows)  → Derived columns
sampleForFilter(spec, cacheRows)     → Row filtering
sampleForSchema(spec, cacheRows)     → Final schema with dims + metrics

Streaming-specific fields:
  ioConfig.type: 'kafka' | 'kinesis'
  inputFormat.type: 'kafka' | 'kinesis' (wrapper format)
  inputFormat.valueFormat: actual data format (json, csv, etc.)
  inputFormat.headerFormat, keyFormat: optional parsing of Kafka message headers and keys
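
The incremental refinement pattern above can be sketched as a function that builds one sampler request body per step, including only the parts of the dataSchema configured so far. This is a simplified model, not the web console's implementation. The stage names, the placeholder timestamp spec, and the choice to replay cached rows through an inline input source parsed with valueFormat (dropping the metadata columns) are all assumptions of this sketch. The `samplerConfig` fields and the inline input source are standard Druid features.

```python
import json

# Step order as listed above.
STAGES = ["parser", "timestamp", "transform", "filter", "schema"]

# Placeholder used before a real timestampSpec is chosen (assumption of this sketch).
PLACEHOLDER_TIMESTAMP_SPEC = {
    "column": "__no_such_column__",
    "missingValue": "1970-01-01T00:00:00Z",
}


def data_schema_for_stage(stage, data_schema):
    """Keep only the dataSchema parts configured up to and including `stage`."""
    idx = STAGES.index(stage)  # raises ValueError for an unknown stage
    out = {"dataSource": data_schema.get("dataSource", "sample")}
    if idx >= 1 and "timestampSpec" in data_schema:
        out["timestampSpec"] = data_schema["timestampSpec"]
    else:
        out["timestampSpec"] = PLACEHOLDER_TIMESTAMP_SPEC
    full_transform = data_schema.get("transformSpec", {})
    transform_spec = {}
    if idx >= 2 and "transforms" in full_transform:
        transform_spec["transforms"] = full_transform["transforms"]
    if idx >= 3 and "filter" in full_transform:
        transform_spec["filter"] = full_transform["filter"]
    if transform_spec:
        out["transformSpec"] = transform_spec
    if idx >= 4:
        for key in ("dimensionsSpec", "metricsSpec", "granularitySpec"):
            if key in data_schema:
                out[key] = data_schema[key]
    return out


def sampler_payload(stage, spec, cache_rows, num_rows=500):
    """Build a sampler request body that replays cached rows for one stage."""
    input_format = spec["ioConfig"]["inputFormat"]
    # Simplification: cached rows hold only message values, so unwrap the
    # 'kafka' / 'kinesis' wrapper and parse them with valueFormat.
    value_format = input_format.get("valueFormat", input_format)
    return {
        "type": "index",
        "spec": {
            "ioConfig": {
                "type": "index",
                "inputSource": {
                    "type": "inline",
                    "data": "\n".join(json.dumps(row) for row in cache_rows),
                },
                "inputFormat": value_format,
            },
            "dataSchema": data_schema_for_stage(stage, spec["dataSchema"]),
        },
        "samplerConfig": {"numRows": num_rows, "timeoutMs": 15000},
    }
```

Each body would be POSTed to the sampler endpoint (`/druid/indexer/v1/sampler`). Because the rows come from the cache, later steps can be re-run without reading from the stream again.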

Related Pages

Implemented By

Implementation:Apache_Druid_Sampler_Streaming_Schema
