Workflow: Apache Druid Streaming Ingestion Management
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Stream_Processing, Real_Time_Analytics |
| Last Updated | 2026-02-10 10:00 GMT |
Overview
End-to-end process for creating, monitoring, and managing streaming ingestion supervisors in Apache Druid, covering Kafka and Kinesis data source configuration through the web console's Load Data wizard and Supervisors management view.
Description
This workflow covers the streaming ingestion lifecycle in the Druid web console. Streaming ingestion uses supervisors that continuously read from message streams (Apache Kafka topics or Amazon Kinesis streams) and create Druid segments in near-real-time. The workflow spans two console views: the Load Data wizard for initial supervisor creation (reusing the same multi-step process as batch ingestion but with streaming-specific configuration), and the Supervisors view for ongoing monitoring and management of running supervisors.
Key capabilities:
- Supervisor creation via the Load Data wizard with Kafka or Kinesis input sources
- Real-time supervisor status monitoring with lag metrics
- Per-partition offset tracking and management
- Supervisor lifecycle operations (suspend, resume, reset, terminate)
- Offset reset for reprocessing or skipping problematic data
- Task group handoff control for manual segment publishing
- Supervisor spec history with diff comparison
- Aggregate statistics including throughput, lag rates, and error counts
Usage
Execute this workflow when you need to continuously ingest data from Apache Kafka topics or Amazon Kinesis streams into Druid for near-real-time analytics. This covers the full lifecycle from initial supervisor setup through ongoing operational management, health monitoring, and troubleshooting.
Execution Steps
Step 1: Streaming Source Configuration
Create a new streaming ingestion supervisor using the Load Data wizard. Select either Apache Kafka or Amazon Kinesis as the input source, then configure connection parameters (bootstrap servers, topic name, consumer properties for Kafka; stream name, endpoint, AWS credentials for Kinesis). The wizard uses the sampler API to validate connectivity and preview streaming data.
Key considerations:
- Kafka configuration requires bootstrap servers, topic, and consumer group settings
- Kinesis configuration requires stream name, AWS region, and optional endpoint
- The sampler previews a snapshot of current stream data for parser validation
- Consumer properties (security protocol, SASL, SSL) must match broker requirements
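The connection settings collected in this step can be sketched as a small builder for the Kafka portion of the spec. This is an illustrative sketch, not the wizard's actual output: the broker addresses, topic name, and SASL values are assumptions.

```python
# Sketch of the Kafka connection settings the Load Data wizard collects.
# Broker addresses, topic name, and security values below are placeholders.
def kafka_io_config(topic, bootstrap_servers, use_sasl=False):
    """Build the connection fragment of a Kafka supervisor ioConfig."""
    consumer_properties = {"bootstrap.servers": ",".join(bootstrap_servers)}
    if use_sasl:
        # Security settings must match the broker; these are example values.
        consumer_properties.update({
            "security.protocol": "SASL_SSL",
            "sasl.mechanism": "SCRAM-SHA-256",
        })
    return {
        "type": "kafka",
        "topic": topic,
        "consumerProperties": consumer_properties,
    }

cfg = kafka_io_config("clickstream", ["broker1:9092", "broker2:9092"])
```

The same shape applies to Kinesis, with `stream`, `endpoint`, and AWS credential settings replacing the Kafka consumer properties.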
Step 2: Schema and Ingestion Spec Definition
Configure parsing, timestamp, transformation, filtering, and schema settings through the same wizard steps as batch ingestion. The streaming-specific differences include tuning parameters for task duration, task count, replicas, and handoff conditions.
Key considerations:
- Streaming tasks are ephemeral; each runs for a configured duration and then hands off its segments
- Task count and replicas affect throughput and fault tolerance
- The ingestion spec includes ioConfig settings specific to streaming (period, useEarliestOffset)
- Rollup and compaction settings affect real-time query performance vs. storage efficiency
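The streaming-specific tuning knobs mentioned above can be sketched as the ioConfig and tuningConfig fields they map to. The field names (`taskCount`, `replicas`, `taskDuration`, `useEarliestOffset`) are Druid's; the numeric values here are illustrative assumptions, not recommendations.

```python
# Sketch of the streaming-specific fields set in this step. Values are
# illustrative only; tune them against your stream's partition count and rate.
def streaming_tuning(task_count=2, replicas=1, task_duration="PT1H"):
    io_config = {
        "taskCount": task_count,        # reading parallelism across partitions
        "replicas": replicas,           # redundant task copies for fault tolerance
        "taskDuration": task_duration,  # how long a task reads before handoff
        "useEarliestOffset": False,     # new datasources start from latest data
    }
    tuning_config = {
        "type": "kafka",
        "maxRowsInMemory": 150000,      # in-memory row threshold before persist
    }
    return io_config, tuning_config

io_cfg, tuning_cfg = streaming_tuning()
```

Effective parallelism is `taskCount * replicas` concurrent tasks, so raising either multiplies the worker capacity the supervisor consumes.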
Step 3: Supervisor Submission
Submit the completed streaming ingestion spec as a supervisor. Unlike batch tasks, supervisors are persistent; they continuously manage a pool of indexing tasks that read from the stream. The supervisor is submitted to the Druid Overlord supervisor API.
Key considerations:
- The supervisor API creates a persistent supervisor entity, not a one-time task
- The supervisor automatically creates and manages indexing task groups
- Each task group handles a subset of stream partitions
- On submission, the user is redirected to the Supervisors view for monitoring
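Submission itself is a single POST to the Overlord's supervisor API, which the console performs on your behalf. A minimal sketch, assuming a router at `localhost:8888` and a toy spec body:

```python
import json
import urllib.request

# Sketch of submitting a supervisor spec. The endpoint path is Druid's
# supervisor API; the host/port and spec body are assumptions.
OVERLORD = "http://localhost:8888"  # router or Overlord address (assumed)

def build_submit_request(spec: dict) -> urllib.request.Request:
    """POST /druid/indexer/v1/supervisor creates (or updates) the
    supervisor for the spec's datasource."""
    return urllib.request.Request(
        f"{OVERLORD}/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_submit_request({"type": "kafka", "spec": {}})
# urllib.request.urlopen(req)  # would submit against a live cluster
```

Because the endpoint is create-or-update keyed by datasource, resubmitting a spec with the same datasource name updates the existing supervisor rather than creating a second one.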
Step 4: Supervisor Health Monitoring
Monitor running supervisors in the Supervisors view. The view displays supervisor status (RUNNING, SUSPENDED, PENDING), aggregate lag metrics, task counts, and per-partition statistics. The status panel queries the supervisor status API and displays a summary row for each active supervisor.
Key considerations:
- Lag metrics indicate how far behind the stream head ingestion is (record offsets for Kafka, milliseconds for Kinesis)
- High lag suggests insufficient task count, slow processing, or stream throughput spikes
- Status transitions out of RUNNING (for example, to an UNHEALTHY state) warrant prompt operational response
- The statistics panel shows 1-minute, 5-minute, and 15-minute aggregation windows
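The per-partition lag the console surfaces comes from the supervisor status API (`GET /druid/indexer/v1/supervisor/<id>/status`). The sketch below pulls the worst partition out of a status payload; the sample payload is a hand-written assumption modeled on the Kafka supervisor's report shape, not captured output.

```python
# Sample status payload (an assumption modeled on the Kafka supervisor's
# status report): per-partition offset lag plus an aggregate.
sample_status = {
    "payload": {
        "state": "RUNNING",
        "lag": {"0": 1200, "1": 35, "2": 0},  # per-partition offset lag
        "aggregateLag": 1235,
    }
}

def worst_partition(status: dict):
    """Return (partition, lag) for the furthest-behind partition."""
    lag = status["payload"]["lag"]
    part = max(lag, key=lag.get)
    return part, lag[part]

partition, lag = worst_partition(sample_status)
```

A single hot partition like partition `0` here usually points at skewed keys or a slow task group rather than a cluster-wide capacity problem.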
Step 5: Operational Management
Perform lifecycle operations on supervisors as needed. Operations include suspending (pause without losing state), resuming (restart from last committed offsets), resetting (restart from configured start offsets), and terminating (permanently stop the supervisor). Offset management allows targeted reprocessing of specific partitions.
Key considerations:
- Suspend/resume preserves committed offsets; the supervisor resumes where it left off
- Reset discards stored offsets so the supervisor restarts from the offsets configured in the spec (earliest or latest)
- Terminate permanently removes the supervisor; remaining tasks complete then stop
- Offset reset allows rewinding specific partitions for reprocessing
- Task group handoff forces early segment publication for specific task groups
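Each console action above maps to an Overlord endpoint, which is useful when scripting these operations. The paths are Druid's supervisor API; the supervisor id in the example is a hypothetical name.

```python
# Lifecycle operation -> Overlord endpoint. Paths are Druid's supervisor API;
# the supervisor id is filled in per datasource.
LIFECYCLE_ENDPOINTS = {
    "suspend":   "POST /druid/indexer/v1/supervisor/{id}/suspend",
    "resume":    "POST /druid/indexer/v1/supervisor/{id}/resume",
    "reset":     "POST /druid/indexer/v1/supervisor/{id}/reset",
    "terminate": "POST /druid/indexer/v1/supervisor/{id}/terminate",
}

def endpoint_for(op: str, supervisor_id: str) -> str:
    """Fill in the supervisor id for a lifecycle operation."""
    return LIFECYCLE_ENDPOINTS[op].format(id=supervisor_id)
```

For example, `endpoint_for("suspend", "clickstream")` yields the suspend call for a hypothetical `clickstream` supervisor. Reset in particular deserves care in automation: unlike suspend/resume, it discards committed offsets.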
Step 6: Spec Management and Troubleshooting
View, edit, and compare supervisor spec versions. The supervisor history panel shows all spec changes with diff comparison. For troubleshooting, examine task logs, ingestion statistics, and error messages. Supervisor specs can be updated by submitting a new spec with the same datasource name.
What happens:
- Supervisor spec updates trigger a graceful transition to the new configuration
- Task logs are accessible via the task detail panel for debugging ingestion issues
- Statistics tables show per-task-group metrics for identifying slow or failing partitions
- The diff dialog highlights changes between spec versions for audit purposes
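The spec comparison the history panel performs can be reproduced locally by diffing pretty-printed spec JSON, which is handy for audits outside the console. A minimal sketch with two toy spec versions (the specs are assumptions):

```python
import difflib
import json

# Two toy spec versions (assumptions) differing only in taskCount.
old_spec = {"type": "kafka", "spec": {"ioConfig": {"taskCount": 1}}}
new_spec = {"type": "kafka", "spec": {"ioConfig": {"taskCount": 2}}}

def spec_diff(old: dict, new: dict) -> str:
    """Unified diff of two supervisor spec versions, keys sorted so that
    ordering differences don't show up as changes."""
    a = json.dumps(old, indent=2, sort_keys=True).splitlines()
    b = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return "\n".join(difflib.unified_diff(a, b, "previous", "current", lineterm=""))

print(spec_diff(old_spec, new_spec))
```

Sorting keys before diffing matters: two semantically identical specs serialized in different key orders would otherwise produce a noisy, misleading diff.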