Principle:Apache Druid Streaming Schema Spec

From Leeroopedia


Knowledge Sources
Domains Streaming_Ingestion, Schema_Design
Last Updated 2026-02-10 00:00 GMT

Overview

This principle covers schema configuration for streaming ingestion. It defines how continuously ingested data is parsed, timestamped, transformed, filtered, and mapped to a final schema.

Description

Streaming Schema and Spec Configuration reuses the same sampler-based pipeline as batch ingestion (parsing → timestamp → transform → filter → schema) but with streaming-specific additions:

  • Streaming metadata columns: Optional columns derived from message metadata (Kafka timestamps, headers, keys; Kinesis partition keys)
  • Supervisor idle config: Settings for handling gaps in stream activity
  • Streaming tuning: Parameters specific to streaming indexing (task duration, completion timeout, etc.)
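
To make these additions concrete, here is a minimal sketch of a Kafka supervisor spec built as a Python dictionary. The field names (idleConfig, taskDuration, completionTimeout, and the kafka input format options) follow the Druid Kafka ingestion documentation as best understood here, where task duration and completion timeout sit under ioConfig. The function name, topic, datasource, broker address, column names, and all values are illustrative assumptions, not recommendations.

```python
import json


def build_kafka_supervisor_spec(topic: str, datasource: str) -> dict:
    """Return an illustrative Kafka supervisor spec (all values are placeholders)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Use the Kafka record timestamp exposed by the wrapper format.
                "timestampSpec": {"column": "kafka.timestamp", "format": "millis"},
                "dimensionsSpec": {"dimensions": ["kafka.key", "user", "action"]},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "consumerProperties": {"bootstrap.servers": "localhost:9092"},
                # Streaming metadata columns: the wrapper format surfaces
                # the record timestamp, key, and headers as columns.
                "inputFormat": {
                    "type": "kafka",
                    "valueFormat": {"type": "json"},
                    "headerFormat": {"type": "string"},
                    "timestampColumnName": "kafka.timestamp",
                    "keyColumnName": "kafka.key",
                    "headerColumnPrefix": "kafka.header.",
                },
                # Supervisor idle config: how to treat gaps in stream activity.
                "idleConfig": {"enabled": True, "inactiveAfterMillis": 600000},
                # Streaming task timing (ISO 8601 periods).
                "taskDuration": "PT1H",
                "completionTimeout": "PT30M",
            },
            "tuningConfig": {"type": "kafka", "maxRowsPerSegment": 5000000},
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_kafka_supervisor_spec("events", "events_ds"), indent=2))
```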

The wizard steps are identical to batch Steps 3-7 (sampleForParser through sampleForSchema), using the same sampler API with cached streaming data.
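
The idea of sampling against cached streaming data can be sketched as a request builder. The sampler endpoint (/druid/indexer/v1/sampler) and the samplerConfig fields numRows and timeoutMs exist in Druid. How the wizard replays cached rows is an assumption here: this sketch swaps the streaming ioConfig for an inline input source holding the cached rows as JSON lines. The helper name sampler_payload and its parameters are hypothetical.

```python
import json
from copy import deepcopy

# Step names as listed in the text, in the order they run.
SAMPLER_STEPS = [
    "sampleForParser",
    "sampleForTimestamp",
    "sampleForTransform",
    "sampleForFilter",
    "sampleForSchema",
]


def sampler_payload(spec: dict, cache_rows: list, num_rows: int = 500) -> dict:
    """Build a sampler request body that replays cached rows (assumed mechanism).

    `spec` is the inner ingestion spec (dataSchema, ioConfig, tuningConfig).
    The original spec is left unmodified.
    """
    replay = deepcopy(spec)
    # Assumption: cached rows are fed back through an inline input source,
    # so later steps do not need to re-read the stream.
    replay["ioConfig"] = {
        "type": "index",
        "inputSource": {
            "type": "inline",
            "data": "\n".join(json.dumps(row) for row in cache_rows),
        },
        "inputFormat": {"type": "json"},
    }
    return {
        "type": "index",
        "spec": replay,
        "samplerConfig": {"numRows": num_rows, "timeoutMs": 15000},
    }
```

Each step in SAMPLER_STEPS would call a builder like this with a spec that carries one more refinement than the previous step, which is why the cached rows only need to be fetched from the stream once.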

Usage

Use this principle after streaming source connection succeeds. The configuration steps mirror the batch workflow but produce a supervisor spec instead of a task spec.
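
The practical difference between the two outputs shows up at submission time. The sketch below picks an Overlord endpoint from the spec's top-level type. The two endpoint paths are the standard Druid supervisor and task submission paths; the helper name submission_endpoint is hypothetical, and the rule "kafka or kinesis means supervisor" is a simplification for illustration.

```python
STREAMING_TYPES = {"kafka", "kinesis"}


def submission_endpoint(spec: dict) -> str:
    """Return the Overlord path a spec would be POSTed to, based on its type."""
    if spec.get("type") in STREAMING_TYPES:
        # Streaming specs are long-running supervisors.
        return "/druid/indexer/v1/supervisor"
    # Everything else is treated as a one-off batch task in this sketch.
    return "/druid/indexer/v1/task"
```

For example, submission_endpoint({"type": "kinesis"}) selects the supervisor path, while a batch spec with type index_parallel selects the task path.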

Theoretical Basis

Streaming schema configuration follows the same incremental sampler refinement pattern as batch, with streaming extensions:

sampleForParser(spec, cacheRows)     → Parsed streaming messages
sampleForTimestamp(spec, cacheRows)  → __time extraction
sampleForTransform(spec, cacheRows)  → Derived columns
sampleForFilter(spec, cacheRows)     → Row filtering
sampleForSchema(spec, cacheRows)     → Final schema with dims + metrics

Streaming-specific fields:
  ioConfig.type: 'kafka' | 'kinesis'
  inputFormat.type: 'kafka' | 'kinesis' (wrapper format)
  inputFormat.valueFormat: actual data format (json, csv, etc.)
  inputFormat.headerFormat, inputFormat.keyFormat: optional metadata parsing (Kafka wrapper format)
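
The relationship between these fields can be checked with a small validator. This is a sketch under the assumption that a wrapper input format, when used, should match the stream type and must carry a valueFormat. The function name check_streaming_fields and the exact rules are illustrative, not Druid's own validation.

```python
def check_streaming_fields(io_config: dict) -> list:
    """Return a list of problems found in the streaming-specific fields."""
    problems = []
    stream_type = io_config.get("type")
    if stream_type not in ("kafka", "kinesis"):
        problems.append(f"ioConfig.type must be 'kafka' or 'kinesis', got {stream_type!r}")

    input_format = io_config.get("inputFormat", {})
    format_type = input_format.get("type")
    # A plain format such as 'json' is not a wrapper, so nothing more to check.
    if format_type in ("kafka", "kinesis"):
        if format_type != stream_type:
            problems.append(
                f"wrapper inputFormat.type {format_type!r} does not match "
                f"ioConfig.type {stream_type!r}"
            )
        if "valueFormat" not in input_format:
            problems.append("wrapper inputFormat requires valueFormat")
    return problems
```

An empty list means the checks in this sketch passed; it does not guarantee that Druid will accept the spec.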

Related Pages

Implemented By
