Principle:Apache Druid Data Filtering

From Leeroopedia


Knowledge Sources
Domains: Data_Ingestion, Data_Filtering
Last Updated: 2026-02-10 00:00 GMT

Overview

A data reduction principle that applies row-level filters during ingestion to exclude unwanted records before storage.

Description

Data filtering lets users define conditions that a row must satisfy to be included in the ingested dataset. Filters are specified in the transformSpec.filter section of the ingestion spec and are applied server-side during ingestion, so rows that fail the filter are never written to Druid segments.

Supported filter types include:

  • Selector filter: Exact value match on a dimension
  • Regex filter: Regular expression match
  • Range filter: Numeric or lexicographic range comparison
  • Logical filters: AND, OR, NOT combinations of other filters
  • Expression filter: Druid expression-based filtering

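When the excluded rows are never needed, filtering during ingestion is more efficient than filtering at query time because those rows never consume storage or indexing resources. The trade-off is that a row dropped at ingestion cannot be queried later without re-ingesting the source data.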
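As an illustration of where the filter sits, the sketch below builds a transformSpec fragment as a Python dictionary and serializes it to JSON. The column names (country, url) and the helper build_data_schema_fragment are hypothetical, and the field names (dimension, value, fields, field, pattern) are assumed to follow Druid's native filter JSON; check them against the documentation for your Druid version before use.

```python
import json

# Hypothetical columns: "country" and "url". Only the filter "type"
# values are taken from the list of supported filter types above.
ingest_filter = {
    "type": "and",
    "fields": [
        # Keep rows whose country is exactly "US".
        {"type": "selector", "dimension": "country", "value": "US"},
        # Drop rows whose url starts with /health.
        {
            "type": "not",
            "field": {"type": "regex", "dimension": "url", "pattern": "^/health"},
        },
    ],
}


def build_data_schema_fragment(row_filter: dict) -> dict:
    """Wrap a filter in the transformSpec.filter position (hypothetical helper)."""
    return {"dataSchema": {"transformSpec": {"transforms": [], "filter": row_filter}}}


fragment = build_data_schema_fragment(ingest_filter)
spec_json = json.dumps(fragment, indent=2)
print(spec_json)
```

The printed JSON is only the filtering portion of a spec; a complete ingestion spec also needs the data source name, timestamp, dimension, and I/O settings, which are omitted here.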

Usage

Use this principle after data transformation when certain rows need to be excluded from the final dataset. Filtering is optional; skip this step if all rows should be ingested.

Theoretical Basis

Ingestion filtering follows a predicate evaluation pattern:

Filter = { type: 'selector' | 'regex' | 'range' | 'and' | 'or' | 'not' | 'expression', ... }
TransformSpec.filter = Filter

For each row:
  if evaluate(filter, row) == true:
    include row in output
  else:
    discard row

The sampler evaluates filters on cached data and returns only matching rows, enabling users to verify that their filter conditions produce the expected results.
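The predicate evaluation pattern in this section can be sketched in Python as follows. This is an illustrative model, not Druid's implementation: the function names evaluate and apply_filter, the sample rows, and the column names are hypothetical, the regex filter is assumed to match anywhere in the value, the range filter is reduced to lower/upper bounds with optional lowerOpen/upperOpen flags, and the expression filter is left out because it would need a Druid expression evaluator.

```python
import re
from typing import Any, Iterable, List, Mapping

Row = Mapping[str, Any]
Filter = Mapping[str, Any]


def evaluate(flt: Filter, row: Row) -> bool:
    """Return True if the row satisfies the filter (simplified semantics)."""
    kind = flt["type"]
    if kind == "selector":
        return row.get(flt["dimension"]) == flt["value"]
    if kind == "regex":
        value = row.get(flt["dimension"])
        # Assumption: a match anywhere in the string counts.
        return value is not None and re.search(flt["pattern"], str(value)) is not None
    if kind == "range":
        value = row.get(flt["column"])
        if value is None:
            return False
        lower, upper = flt.get("lower"), flt.get("upper")
        lower_ok = lower is None or (value > lower if flt.get("lowerOpen") else value >= lower)
        upper_ok = upper is None or (value < upper if flt.get("upperOpen") else value <= upper)
        return lower_ok and upper_ok
    if kind == "and":
        return all(evaluate(f, row) for f in flt["fields"])
    if kind == "or":
        return any(evaluate(f, row) for f in flt["fields"])
    if kind == "not":
        return not evaluate(flt["field"], row)
    raise ValueError(f"unsupported filter type in this sketch: {kind}")


def apply_filter(flt: Filter, rows: Iterable[Row]) -> List[Row]:
    """Include rows for which the predicate holds; discard the rest."""
    return [row for row in rows if evaluate(flt, row)]


# Hypothetical sample rows, standing in for data a sampler might cache.
rows = [
    {"country": "US", "url": "/home", "status_code": 200},
    {"country": "US", "url": "/health", "status_code": 200},
    {"country": "DE", "url": "/home", "status_code": 500},
]
us_non_health = {
    "type": "and",
    "fields": [
        {"type": "selector", "dimension": "country", "value": "US"},
        {"type": "not", "field": {"type": "regex", "dimension": "url", "pattern": "^/health"}},
    ],
}
kept = apply_filter(us_non_health, rows)
```

Running a filter against a small sample like this mirrors what the sampler is described as doing: only the matching rows come back, which makes it easy to check a condition before committing to a full ingestion.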

Related Pages

Implemented By
