Workflow: Apache Druid Streaming Ingestion Management
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Stream_Processing, Real_Time_Analytics |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
End-to-end process for creating, monitoring, and managing streaming ingestion supervisors in Apache Druid, covering Kafka and Kinesis data source configuration through the web console's Load Data wizard and Supervisors management view.
Description
This workflow covers the streaming ingestion lifecycle in the Druid web console. Streaming ingestion uses supervisors that continuously read from message streams (Apache Kafka topics or Amazon Kinesis streams) and create Druid segments in near-real-time. The workflow spans two console views: the Load Data wizard for initial supervisor creation (reusing the same multi-step process as batch ingestion but with streaming-specific configuration), and the Supervisors view for ongoing monitoring and management of running supervisors.
Key capabilities:
- Supervisor creation via the Load Data wizard with Kafka or Kinesis input sources
- Real-time supervisor status monitoring with lag metrics
- Per-partition offset tracking and management
- Supervisor lifecycle operations (suspend, resume, reset, terminate)
- Offset reset for reprocessing or skipping problematic data
- Task group handoff control for manual segment publishing
- Supervisor spec history with diff comparison
- Aggregate statistics including throughput, lag rates, and error counts
Usage
Execute this workflow when you need to continuously ingest data from Apache Kafka topics or Amazon Kinesis streams into Druid for near-real-time analytics. This covers the full lifecycle from initial supervisor setup through ongoing operational management, health monitoring, and troubleshooting.
Execution Steps
Step 1: Streaming Source Configuration
Create a new streaming ingestion supervisor using the Load Data wizard. Select either Apache Kafka or Amazon Kinesis as the input source, then configure connection parameters (bootstrap servers, topic name, consumer properties for Kafka; stream name, endpoint, AWS credentials for Kinesis). The wizard uses the sampler API to validate connectivity and preview streaming data.
Key considerations:
- Kafka configuration requires bootstrap servers, topic, and consumer group settings
- Kinesis configuration requires stream name, AWS region, and optional endpoint
- The sampler previews a snapshot of current stream data for parser validation
- Consumer properties (security protocol, SASL, SSL) must match broker requirements
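The connection settings collected in this step can be sketched as a small builder for the Kafka portion of the spec. This is an illustrative sketch, not the wizard's actual output: the broker addresses, topic name, and SASL values are assumptions.

```python
# Sketch of the Kafka connection settings the Load Data wizard collects.
# Broker addresses, topic name, and security values below are placeholders.
def kafka_io_config(topic, bootstrap_servers, use_sasl=False):
    """Build the connection fragment of a Kafka supervisor ioConfig."""
    consumer_properties = {"bootstrap.servers": ",".join(bootstrap_servers)}
    if use_sasl:
        # Security settings must match the broker; these are example values.
        consumer_properties.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-256",
        })
    return {
        "type": "kafka",
        "topic": topic,
        "consumerProperties": consumer_properties,
    }

cfg = kafka_io_config("clickstream", ["broker1:9092", "broker2:9092"])
```

The same shape applies to Kinesis, with `stream`, `endpoint`, and AWS credential settings replacing the Kafka consumer properties.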
Step 2: Schema and Ingestion Spec Definition
Configure parsing, timestamp, transformation, filtering, and schema settings through the same wizard steps as batch ingestion. The streaming-specific differences include tuning parameters for task duration, task count, replicas, and handoff conditions.
Key considerations:
- Streaming tasks are ephemeral; each runs for a configured duration and then hands off its segments
- Task count and replicas affect throughput and fault tolerance
- The ingestion spec includes ioConfig settings specific to streaming (period, useEarliestOffset)
- Rollup and compaction settings affect real-time query performance vs. storage efficiency
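The streaming-specific tuning knobs mentioned above can be sketched as the ioConfig and tuningConfig fields they map to. The field names (`taskCount`, `replicas`, `taskDuration`, `useEarliestOffset`) are Druid's; the numeric values here are illustrative assumptions, not recommendations.

```python
# Sketch of the streaming-specific fields set in this step. Values are
# illustrative only; tune them against your stream's partition count and rate.
def streaming_tuning(task_count=2, replicas=1, task_duration="PT1H"):
    io_config = {
        "taskCount": task_count,        # reading parallelism across partitions
        "replicas": replicas,           # redundant task copies for fault tolerance
        "taskDuration": task_duration,  # how long a task reads before handoff
        "useEarliestOffset": False,     # new datasources start from latest data
    }
    tuning_config = {
        "type": "kafka",
        "maxRowsInMemory": 150000,      # in-memory row threshold before persist
    }
    return io_config, tuning_config

io_cfg, tuning_cfg = streaming_tuning()
```

Effective parallelism is `taskCount * replicas` concurrent tasks, so raising either multiplies the worker capacity the supervisor consumes.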
Step 3: Supervisor Submission
Submit the completed streaming ingestion spec as a supervisor. Unlike batch tasks, supervisors are persistent; they continuously manage a pool of indexing tasks that read from the stream. The supervisor is submitted to the Druid Overlord supervisor API.
Key considerations:
- The supervisor API creates a persistent supervisor entity, not a one-time task
- The supervisor automatically creates and manages indexing task groups
- Each task group handles a subset of stream partitions
- On submission, the user is redirected to the Supervisors view for monitoring
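Submission itself is a single POST to the Overlord's supervisor API, which the console performs on your behalf. A minimal sketch, assuming a router at `localhost:8888` and a toy spec body:

```python
import json
import urllib.request

# Sketch of submitting a supervisor spec. The endpoint path is Druid's
# supervisor API; the host/port and spec body are assumptions.
OVERLORD = "http://localhost:8888"  # router or Overlord address (assumed)

def build_submit_request(spec: dict) -> urllib.request.Request:
    """POST /druid/indexer/v1/supervisor creates (or updates) the
    supervisor for the spec's datasource."""
    return urllib.request.Request(
        f"{OVERLORD}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_request({"type": "kafka", "spec": {}})
# urllib.request.urlopen(req)  # would submit against a live cluster
```

Because the endpoint is create-or-update keyed by datasource, resubmitting a spec with the same datasource name updates the existing supervisor rather than creating a second one.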
Step 4: Supervisor Health Monitoring
Monitor running supervisors in the Supervisors view. The view displays supervisor status (RUNNING, SUSPENDED, PENDING), aggregate lag metrics, task counts, and per-partition statistics. The status panel queries the supervisor status API and displays a summary row for each active supervisor.
Key considerations:
- Lag metrics indicate how far behind the stream head ingestion is (record offsets for Kafka, milliseconds for Kinesis)
- High lag suggests insufficient task count, slow processing, or stream throughput spikes
- Status transitions out of RUNNING (for example, to an UNHEALTHY state) warrant prompt operational response
- The statistics panel shows 1-minute, 5-minute, and 15-minute aggregation windows
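The per-partition lag the console surfaces comes from the supervisor status API (`GET /druid/indexer/v1/supervisor/<id>/status`). The sketch below pulls the worst partition out of a status payload; the sample payload is a hand-written assumption modeled on the Kafka supervisor's report shape, not captured output.

```python
# Sample status payload (an assumption modeled on the Kafka supervisor's
# status report): per-partition offset lag plus an aggregate.
sample_status = {
    "payload": {
        "state": "RUNNING",
        "lag": {"0": 1200, "1": 35, "2": 0},  # per-partition offset lag
        "aggregateLag": 1235,
    }
}

def worst_partition(status: dict):
    """Return (partition, lag) for the furthest-behind partition."""
    lag = status["payload"]["lag"]
    part = max(lag, key=lag.get)
    return part, lag[part]

partition, lag = worst_partition(sample_status)
```

A single hot partition like partition `0` here usually points at skewed keys or a slow task group rather than a cluster-wide capacity problem.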
Step 5: Operational Management
Perform lifecycle operations on supervisors as needed. Operations include suspending (pause without losing state), resuming (restart from last committed offsets), resetting (restart from configured start offsets), and terminating (permanently stop the supervisor). Offset management allows targeted reprocessing of specific partitions.
Key considerations:
- Suspend/resume preserves committed offsets; the supervisor resumes where it left off
- Reset discards stored offsets so the supervisor restarts from the offsets configured in the spec (earliest or latest)
- Terminate permanently removes the supervisor; remaining tasks complete then stop
- Offset reset allows rewinding specific partitions for reprocessing
- Task group handoff forces early segment publication for specific task groups
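Each console action above maps to an Overlord endpoint, which is useful when scripting these operations. The paths are Druid's supervisor API; the supervisor id in the example is a hypothetical name.

```python
# Lifecycle operation -> Overlord endpoint. Paths are Druid's supervisor API;
# the supervisor id is filled in per datasource.
LIFECYCLE_ENDPOINTS = {
    "suspend":   "POST /druid/indexer/v1/supervisor/{id}/suspend",
    "resume":    "POST /druid/indexer/v1/supervisor/{id}/resume",
    "reset":     "POST /druid/indexer/v1/supervisor/{id}/reset",
    "terminate": "POST /druid/indexer/v1/supervisor/{id}/terminate",
}

def endpoint_for(op: str, supervisor_id: str) -> str:
    """Fill in the supervisor id for a lifecycle operation."""
    return LIFECYCLE_ENDPOINTS[op].format(id=supervisor_id)
```

For example, `endpoint_for("suspend", "clickstream")` yields the suspend call for a hypothetical `clickstream` supervisor. Reset in particular deserves care in automation: unlike suspend/resume, it discards committed offsets.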
Step 6: Spec Management and Troubleshooting
View, edit, and compare supervisor spec versions. The supervisor history panel shows all spec changes with diff comparison. For troubleshooting, examine task logs, ingestion statistics, and error messages. Supervisor specs can be updated by submitting a new spec with the same datasource name.
What happens:
- Supervisor spec updates trigger a graceful transition to the new configuration
- Task logs are accessible via the task detail panel for debugging ingestion issues
- Statistics tables show per-task-group metrics for identifying slow or failing partitions
- The diff dialog highlights changes between spec versions for audit purposes
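The spec comparison the history panel performs can be reproduced locally by diffing pretty-printed spec JSON, which is handy for audits outside the console. A minimal sketch with two toy spec versions (the specs are assumptions):

```python
import difflib
import json

# Two toy spec versions (assumptions) differing only in taskCount.
old_spec = {"type": "kafka", "spec": {"ioConfig": {"taskCount": 1}}}
new_spec = {"type": "kafka", "spec": {"ioConfig": {"taskCount": 2}}}

def spec_diff(old: dict, new: dict) -> str:
    """Unified diff of two supervisor spec versions, keys sorted so that
    ordering differences don't show up as changes."""
    a = json.dumps(old, indent=2, sort_keys=True).splitlines()
    b = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(a, b, "previous", "current", lineterm=""))

print(spec_diff(old_spec, new_spec))
```

Sorting keys before diffing matters: two semantically identical specs serialized in different key orders would otherwise produce a noisy, misleading diff.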