Principle:Apache Druid Data Filtering

Knowledge Sources	Apache Druid, Druid Filters
Domains	Data_Ingestion, Data_Filtering
Last Updated	2026-02-10 00:00 GMT

Overview

A data reduction principle that applies row-level filters during ingestion to exclude unwanted records before storage.

Description

Data Filtering allows users to define conditions that rows must satisfy to be included in the ingested dataset. Filters are specified in the transformSpec.filter section of the ingestion spec and are applied server-side during ingestion, reducing the amount of data stored in Druid segments.

Supported filter types include:

Selector filter: Exact value match on a dimension
Regex filter: Regular expression match
Range filter: Numeric or lexicographic range comparison
Logical filters: AND, OR, NOT combinations of other filters
Expression filter: Druid expression-based filtering
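
As an illustration of how these filter types compose, the following minimal sketch builds a transformSpec with a filter and prints it as JSON. The column names (country, user_agent, bytes) are hypothetical, and the field names follow the Druid native filter JSON as commonly documented; verify them against the Druid version in use, since the range filter is only available in more recent releases.

```python
import json

# Hypothetical columns ("country", "user_agent", "bytes") for illustration only.
row_filter = {
    "type": "and",
    "fields": [
        # Selector: keep rows whose "country" dimension equals "US".
        {"type": "selector", "dimension": "country", "value": "US"},
        # NOT wrapped around a regex: drop rows whose "user_agent" contains "bot".
        {
            "type": "not",
            "field": {
                "type": "regex",
                "dimension": "user_agent",
                "pattern": ".*bot.*",
            },
        },
        # Range: keep rows whose numeric "bytes" value lies between the bounds.
        {
            "type": "range",
            "column": "bytes",
            "matchValueType": "LONG",
            "lower": 100,
            "upper": 10000,
        },
    ],
}

# The filter sits under transformSpec.filter in the ingestion spec.
transform_spec = {"transforms": [], "filter": row_filter}

print(json.dumps({"transformSpec": transform_spec}, indent=2))
```

The printed fragment would be placed inside the dataSchema of an ingestion spec. A row is ingested only if every clause of the outer "and" filter matches.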

Filtering during ingestion is more efficient than filtering the same rows at query time because excluded rows never consume storage or indexing resources.

Usage

Use this principle after data transformation when certain rows need to be excluded from the final dataset. Filtering is optional; skip this step if all rows should be ingested.

Theoretical Basis

Ingestion filtering follows a predicate evaluation pattern:

Filter = { type: 'selector' | 'regex' | 'range' | 'and' | 'or' | 'not' | 'expression', ... }
TransformSpec.filter = Filter

For each row:
  if evaluate(filter, row) == true:
    include row in output
  else:
    discard row
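
The predicate evaluation loop above can be sketched as a small, self-contained Python evaluator. This is a toy model with simplified semantics, not Druid's implementation: it covers the selector, regex, range, and logical filters, treats range bounds as inclusive, and omits the expression filter. The function names evaluate and apply_filter and the sample rows are assumptions made for illustration.

```python
import re
from typing import Any, Dict, List

Row = Dict[str, Any]
Filter = Dict[str, Any]


def evaluate(flt: Filter, row: Row) -> bool:
    """Return True if the row satisfies the filter (toy semantics, not Druid's)."""
    kind = flt["type"]
    if kind == "selector":
        value = row.get(flt["dimension"])
        return value is not None and str(value) == str(flt["value"])
    if kind == "regex":
        value = row.get(flt["dimension"])
        return value is not None and re.search(flt["pattern"], str(value)) is not None
    if kind == "range":
        value = row.get(flt["column"])
        if value is None:
            return False
        lower, upper = flt.get("lower"), flt.get("upper")
        if lower is not None and value < lower:
            return False
        if upper is not None and value > upper:
            return False
        return True
    if kind == "and":
        return all(evaluate(f, row) for f in flt["fields"])
    if kind == "or":
        return any(evaluate(f, row) for f in flt["fields"])
    if kind == "not":
        return not evaluate(flt["field"], row)
    raise ValueError(f"unsupported filter type: {kind}")


def apply_filter(flt: Filter, rows: List[Row]) -> List[Row]:
    """Keep only the rows for which the filter evaluates to True."""
    return [row for row in rows if evaluate(flt, row)]


rows = [
    {"country": "US", "bytes": 512},
    {"country": "US", "bytes": 50},
    {"country": "DE", "bytes": 2048},
]
flt = {
    "type": "and",
    "fields": [
        {"type": "selector", "dimension": "country", "value": "US"},
        {"type": "range", "column": "bytes", "lower": 100},
    ],
}

kept = apply_filter(flt, rows)
print(kept)  # [{'country': 'US', 'bytes': 512}]
```

Only the first row passes both clauses: the second fails the range check and the third fails the selector. Running a filter against a handful of representative rows like this mirrors the kind of check the sampler provides before a full ingestion.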

The sampler evaluates filters on cached data and returns only matching rows, enabling users to verify that their filter conditions produce the expected results.

Related Pages

Implemented By

Implementation:Apache_Druid_SampleForFilter
