Principle:Apache Druid Data Filtering
| Knowledge Sources | |
|---|---|
| Domains | Data_Ingestion, Data_Filtering |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
A data reduction principle that applies row-level filters during ingestion to exclude unwanted records before storage.
Description
Data Filtering allows users to define conditions that rows must satisfy to be included in the ingested dataset. Filters are specified in the transformSpec.filter section of the ingestion spec and are applied server-side during ingestion, reducing the amount of data stored in Druid segments.
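The following minimal sketch shows where the filter sits. It assumes a native batch ingestion spec and uses a hypothetical `events` datasource and `country` column. Only the `dataSchema` fragment is shown, written as a Python dict and serialized to the JSON that Druid reads.

```python
import json

# Hypothetical dataSchema fragment: only rows whose "country" equals "US" are kept.
# Other required parts of a real spec (ioConfig, tuningConfig, timestampSpec,
# dimensionsSpec) are omitted for brevity.
data_schema = {
    "dataSource": "events",  # hypothetical datasource name
    "transformSpec": {
        "transforms": [],
        "filter": {
            "type": "selector",
            "dimension": "country",
            "value": "US",
        },
    },
}

print(json.dumps(data_schema, indent=2))
```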
Supported filter types include:
- Selector filter: Exact value match on a dimension
- Regex filter: Regular expression match
- Range filter: Numeric or lexicographic range comparison
- Logical filters: AND, OR, NOT combinations of other filters
- Expression filter: Druid expression-based filtering
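The sketch below gives one example of each listed type, written as Python dicts. The field names follow Druid's native filter JSON syntax and should be checked against the filter reference for the Druid version in use. The `range` filter is available in recent releases, and older releases express ranges with the `bound` filter instead. All column names and values are hypothetical.

```python
import json

# Hypothetical columns: country, path, status, bytes.
selector = {"type": "selector", "dimension": "country", "value": "US"}
regex = {"type": "regex", "dimension": "path", "pattern": "^/api/"}
range_filter = {
    "type": "range",
    "column": "bytes",
    "matchValueType": "LONG",
    "lower": 0,
    "upper": 1000000,
    "lowerOpen": False,
    "upperOpen": True,  # i.e. 0 <= bytes < 1000000
}
expression = {"type": "expression", "expression": "strlen(path) < 256"}

# Logical filters nest other filters: "and"/"or" take a list under "fields",
# while "not" wraps a single filter under "field".
combined = {
    "type": "and",
    "fields": [
        {"type": "or", "fields": [selector, regex]},
        range_filter,
        {"type": "not", "field": {"type": "selector", "dimension": "status", "value": "bot"}},
        expression,
    ],
}

print(json.dumps(combined, indent=2))
```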
Filtering during ingestion is cheaper than filtering the same rows at query time. Excluded rows are never indexed or written to segments, so they consume neither storage nor indexing resources, and later queries never have to scan them.
Usage
Use this principle after data transformation when certain rows must be excluded from the final dataset. Filtering is optional: skip this step if all rows should be ingested.
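Druid documents that transforms in a `transformSpec` run before the filter, so a filter can reference a column that a transform produces. The sketch below reproduces that ordering in plain Python for a hypothetical `country_upper` column. The lambda is a local stand-in for Druid's `upper(country)` expression and is not Druid code.

```python
# Hypothetical spec fragment: derive country_upper, then keep only "US" rows.
transform_spec = {
    "transforms": [
        {"type": "expression", "name": "country_upper", "expression": "upper(country)"}
    ],
    "filter": {"type": "selector", "dimension": "country_upper", "value": "US"},
}

# Local stand-in for the expression above (assumption: plain str.upper()).
local_transforms = {"country_upper": lambda row: row["country"].upper()}


def ingest(rows):
    """Apply transforms first, then the selector filter, mirroring the spec's ordering."""
    flt = transform_spec["filter"]
    kept = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        for name, fn in local_transforms.items():
            row[name] = fn(row)
        if row.get(flt["dimension"]) == flt["value"]:
            kept.append(row)
    return kept


rows = [{"country": "us"}, {"country": "US"}, {"country": "fr"}]
print(ingest(rows))
```

Because the filter runs after the transform, both `"us"` and `"US"` survive. Omitting the `filter` key from the `transformSpec` keeps every row.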
Theoretical Basis
Ingestion filtering follows a predicate evaluation pattern:
Filter = { type: 'selector' | 'regex' | 'range' | 'and' | 'or' | 'not' | 'expression', ... }
TransformSpec.filter = Filter
For each row:
    if evaluate(filter, row) == true:
        include row in output
    else:
        discard row
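The predicate-evaluation loop above can be modeled as a small recursive evaluator. This is an illustrative model, not Druid's implementation. It covers `selector`, `regex`, `range`, `and`, `or`, and `not` using Druid's JSON field names, and it treats a missing column as null. Regex matching uses Python's `re.search`, so the pattern may match anywhere in the value. The `expression` type is deliberately left unimplemented because it would require Druid's expression language.

```python
import re
from typing import Any, Mapping

Row = Mapping[str, Any]


def evaluate(flt: Mapping[str, Any], row: Row) -> bool:
    """Return True if `row` satisfies the filter dict `flt` (simplified model)."""
    kind = flt["type"]
    if kind == "selector":
        return row.get(flt["dimension"]) == flt.get("value")
    if kind == "regex":
        value = row.get(flt["dimension"])
        return value is not None and re.search(flt["pattern"], str(value)) is not None
    if kind == "range":
        value = row.get(flt["column"])
        if value is None:
            return False
        lower, upper = flt.get("lower"), flt.get("upper")
        # Bounds are inclusive unless lowerOpen/upperOpen is set.
        if lower is not None and (value < lower or (flt.get("lowerOpen", False) and value == lower)):
            return False
        if upper is not None and (value > upper or (flt.get("upperOpen", False) and value == upper)):
            return False
        return True
    if kind == "and":
        return all(evaluate(f, row) for f in flt["fields"])
    if kind == "or":
        return any(evaluate(f, row) for f in flt["fields"])
    if kind == "not":
        return not evaluate(flt["field"], row)
    # "expression" (and any other type) would need Druid's expression engine.
    raise NotImplementedError(f"filter type not modeled: {kind}")


def apply_filter(flt, rows):
    """Include rows where the filter evaluates to True; discard the rest."""
    return [row for row in rows if evaluate(flt, row)]


rows = [
    {"country": "US", "path": "/api/v1", "bytes": 512},
    {"country": "US", "path": "/home", "bytes": 2048},
    {"country": "FR", "path": "/api/v2", "bytes": 128},
]
flt = {
    "type": "and",
    "fields": [
        {"type": "selector", "dimension": "country", "value": "US"},
        {"type": "regex", "dimension": "path", "pattern": "^/api/"},
        {"type": "range", "column": "bytes", "lower": 0, "upper": 1024, "upperOpen": True},
    ],
}
print(apply_filter(flt, rows))  # only the first row survives
```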
The sampler evaluates filters on cached data and returns only matching rows, enabling users to verify that their filter conditions produce the expected results.
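The following sketch shows how such a check might be submitted. It assumes the sampler endpoint `POST /druid/indexer/v1/sampler` and a request body of the form `{type, spec, samplerConfig}` with inline JSON input. The router address, datasource name, and sample rows are hypothetical. The request is only constructed here. Sending it requires a running cluster, and the exact body shape should be checked against the sampler documentation for the Druid version in use.

```python
import json
import urllib.request

DRUID_URL = "http://localhost:8888"  # hypothetical router address

sample_spec = {
    "type": "index_parallel",
    "spec": {
        "ioConfig": {
            "type": "index_parallel",
            "inputSource": {
                "type": "inline",
                # Two hypothetical newline-delimited JSON rows.
                "data": '{"ts":"2024-01-01T00:00:00Z","country":"US"}\n'
                        '{"ts":"2024-01-01T00:00:01Z","country":"FR"}',
            },
            "inputFormat": {"type": "json"},
        },
        "dataSchema": {
            "dataSource": "sample",
            "timestampSpec": {"column": "ts", "format": "iso"},
            "dimensionsSpec": {"dimensions": ["country"]},
            "transformSpec": {
                "filter": {"type": "selector", "dimension": "country", "value": "US"}
            },
        },
    },
    "samplerConfig": {"numRows": 100},
}

request = urllib.request.Request(
    f"{DRUID_URL}/druid/indexer/v1/sampler",
    data=json.dumps(sample_spec).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Against a live cluster, the sampled rows could then be inspected with:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```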