Principle:Apache Druid Tuning Parameters
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Performance_Tuning |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A performance optimization principle that configures resource limits and operational parameters for the ingestion task execution engine.
Description
Tuning Parameters control how the Druid ingestion engine allocates and uses computational resources during data loading. These parameters directly impact ingestion speed, memory usage, and cluster stability.
Key tuning dimensions include:
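- Memory limits: maxRowsInMemory and maxBytesInMemory control when in-memory data is persisted to disk as intermediate segments; a persist is triggered by whichever limit is reached first
- Concurrency: maxNumConcurrentSubTasks sets how many worker sub-tasks a parallel indexing task may run at the same time
- Output sizing: maxTotalRows caps the total number of rows held in segments waiting to be pushed; the row count of each output segment is set by the partitioning configuration (for example targetRowsPerSegment)
- Task behavior: forceGuaranteedRollup (require perfect rollup), chatHandlerTimeout (timeout when worker tasks report pushed segments), and buildV9Directly (a legacy flag from older Druid releases)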
The tuning configuration is separate from partitioning because it controls execution behavior rather than data layout.
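As a rough illustration, the dimensions above might be gathered into a tuningConfig fragment for a parallel batch task. The sketch below builds such a fragment as a Python dict and serializes it to JSON. The helper name build_tuning_config is hypothetical, and the numeric values are placeholders chosen for the example, not recommendations. Partitioning settings are left out on purpose, since the text treats them separately.

```python
import json


def build_tuning_config(
    max_rows_in_memory=1_000_000,      # placeholder value, not a recommendation
    max_bytes_in_memory=None,          # None: omit the key and let Druid pick its default
    max_num_concurrent_sub_tasks=1,    # placeholder value
    force_guaranteed_rollup=False,
):
    """Return an illustrative tuningConfig fragment (hypothetical helper)."""
    config = {
        "type": "index_parallel",
        "maxRowsInMemory": max_rows_in_memory,
        "maxNumConcurrentSubTasks": max_num_concurrent_sub_tasks,
        "forceGuaranteedRollup": force_guaranteed_rollup,
    }
    # Only emit the byte limit when the caller set one explicitly.
    if max_bytes_in_memory is not None:
        config["maxBytesInMemory"] = max_bytes_in_memory
    return config


config = build_tuning_config(
    max_rows_in_memory=500_000,
    max_num_concurrent_sub_tasks=4,
)
print(json.dumps(config, indent=2))
```

Leaving unset parameters out of the fragment, rather than writing explicit values for them, keeps the spec aligned with the guidance that defaults should be overridden only when there is a reason to.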
Usage
Apply this principle after the partitioning configuration to optimize ingestion performance for your data volume and cluster resources. The default values work for most cases; tune only when ingesting very large datasets, when tasks run into memory pressure, or when you have specific performance requirements.
Theoretical Basis
Tuning follows a resource budgeting model:
Ingestion Resource Model:
- Memory: maxRowsInMemory × avgRowSize ≤ maxBytesInMemory ≤ JVM heap
- Parallelism: maxNumConcurrentSubTasks ≤ available worker capacity
- Output: segments ≈ totalRows / targetRowsPerSegment
Trade-offs:
- Higher maxRowsInMemory → fewer intermediate flushes → faster ingestion, but more memory per task
- Higher maxNumConcurrentSubTasks → more parallelism → faster ingestion, but more cluster resources consumed
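The budgeting model and its trade-offs can be sketched as a small pre-flight check. This is a minimal sketch, not part of Druid: the IngestionBudget class, its field names, and the helper functions are hypothetical. The average row size and worker capacity are assumed to be estimates you supply yourself. The flush estimate is a simplification that considers only the row limit for a single task and ignores the byte limit.

```python
import math
from dataclasses import dataclass, replace


@dataclass
class IngestionBudget:
    """Hypothetical container for the quantities in the resource model."""
    max_rows_in_memory: int
    avg_row_size_bytes: int        # assumed estimate, e.g. from sample data
    max_bytes_in_memory: int
    jvm_heap_bytes: int
    max_num_concurrent_sub_tasks: int
    worker_capacity: int           # assumed number of free task slots
    total_rows: int
    target_rows_per_segment: int


def check_budget(b):
    """Return the list of model constraints that the budget violates."""
    problems = []
    if b.max_rows_in_memory * b.avg_row_size_bytes > b.max_bytes_in_memory:
        problems.append("row buffer may exceed maxBytesInMemory")
    if b.max_bytes_in_memory > b.jvm_heap_bytes:
        problems.append("maxBytesInMemory exceeds JVM heap")
    if b.max_num_concurrent_sub_tasks > b.worker_capacity:
        problems.append("maxNumConcurrentSubTasks exceeds worker capacity")
    return problems


def estimated_segments(b):
    """segments ≈ totalRows / targetRowsPerSegment, rounded up."""
    return math.ceil(b.total_rows / b.target_rows_per_segment)


def estimated_flushes(b):
    """Rough count of intermediate flushes if only the row limit applies."""
    return math.ceil(b.total_rows / b.max_rows_in_memory)


budget = IngestionBudget(
    max_rows_in_memory=1_000_000,
    avg_row_size_bytes=200,
    max_bytes_in_memory=500_000_000,
    jvm_heap_bytes=3_000_000_000,
    max_num_concurrent_sub_tasks=4,
    worker_capacity=8,
    total_rows=50_000_000,
    target_rows_per_segment=5_000_000,
)
print(check_budget(budget))        # []
print(estimated_segments(budget))  # 10
print(estimated_flushes(budget))   # 50

# Doubling maxRowsInMemory halves the flush estimate and still fits the budget.
bigger = replace(budget, max_rows_in_memory=2_000_000)
print(estimated_flushes(bigger), check_budget(bigger))  # 25 []
```

In this example, raising max_rows_in_memory to 4,000,000 would push the estimated buffer to 800 MB, above the 500 MB byte limit, so check_budget would report the first constraint as violated. That is the memory side of the trade-off listed above.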