Principle:Pola rs Polars Time Range Filtering

From Leeroopedia


Knowledge Sources
Domains: Data Engineering, Time Series
Last Updated: 2026-02-09 10:00 GMT

Overview

Selecting rows within specific time periods using temporal predicates, supporting exact datetime matching, range filtering, and component-based filtering.

Description

Time series analysis frequently requires isolating subsets of data that fall within particular time windows. Time range filtering applies boolean predicates to temporal columns to select only the rows that satisfy the temporal criteria. This is a fundamental operation that precedes most analytical steps -- whether computing statistics for a specific quarter, comparing year-over-year performance, or extracting training windows for machine learning models.

Polars supports three complementary approaches to temporal filtering:

  1. Range filtering with is_between -- Selects all rows where the temporal column falls within a closed, open, or half-open interval defined by lower and upper datetime bounds. The closed parameter chooses which bounds are inclusive; by default both are. This is the most common pattern for selecting a contiguous time window.
  2. Component-based filtering -- Uses temporal accessors (dt.year(), dt.month(), dt.day(), dt.weekday()) to filter on individual date components. Note that dt.weekday() uses ISO numbering (Monday = 1 through Sunday = 7). This enables calendar-aware selections such as "all January data" or "weekdays only" without computing explicit datetime bounds.
  3. Combined predicates -- Multiple temporal conditions are composed using logical operators (&, |, ~) to express complex temporal queries such as "weekdays in Q3 2023" or "trading hours in the month of December".

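All filtering in Polars is expression-based: the predicate is an expression that describes the condition rather than a precomputed boolean mask. In lazy mode the predicate becomes part of the query plan, where the optimizer can apply rewrites such as predicate pushdown. In eager mode it is evaluated immediately when filter is called.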

Usage

Use this principle whenever you need to:

  • Extract data for a specific date range (e.g., fiscal year, reporting period).
  • Select rows matching calendar criteria (e.g., specific month, day of week).
  • Combine temporal and non-temporal predicates for complex data selection.
  • Create train/test splits based on temporal boundaries.

Theoretical Basis

Temporal filtering applies predicates to datetime columns. A temporal predicate P(t) is a boolean function over the temporal domain:

filter(D, P) = { row in D | P(row.t) = True }

The three filtering approaches correspond to different predicate forms:

Approach             Predicate Form         Example
Range filtering      lower <= t <= upper    is_between(datetime(2020,1,1), datetime(2020,12,31))
Component filtering  component(t) = value   dt.year() == 2020
Combined predicates  P1(t) AND P2(t)        (dt.month() >= 6) & (dt.year() == 2020)

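On a temporal column that is known to be sorted, the lower and upper bounds of a range filter can in principle be located by binary search in O(log n) time, after which the k matching rows form one contiguous slice that can be emitted in O(k). Without sorted order every row must be tested, which costs O(n). Whether a given Polars version applies this optimization to is_between is an implementation detail, so treat O(log n + k) as the theoretical best case rather than a guarantee.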
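
The binary-search idea can be illustrated with Python's standard bisect module on a plain sorted list. This is a sketch of the algorithm only and makes no claim about how Polars implements is_between internally; the sample data is hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timedelta

# Hypothetical sorted timestamps: one per day for the leap year 2020.
timestamps = [datetime(2020, 1, 1) + timedelta(days=i) for i in range(366)]

lower = datetime(2020, 3, 1)
upper = datetime(2020, 3, 31)

# O(log n): index of the first element >= lower.
start = bisect_left(timestamps, lower)
# O(log n): index just past the last element <= upper.
stop = bisect_right(timestamps, upper)

# O(k): the matching rows are one contiguous slice.
by_search = timestamps[start:stop]

# O(n) reference: test every element against the closed interval.
by_scan = [t for t in timestamps if lower <= t <= upper]
```

Both approaches select the same rows; they differ only in how many elements are inspected to find them.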

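Component filtering decomposes the temporal value into its calendar constituents. Temporal values are stored as integers counting time units since the Unix epoch (days for Date, a sub-second unit for Datetime), and each component is derived from that integer. Because months and years vary in length, the derivation takes calendar arithmetic rather than a single modulo operation: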

year(t) = extract_year(days_since_epoch(t))
month(t) = extract_month(days_since_epoch(t))
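
The pseudo-functions above can be mimicked with the standard library for day-resolution values. The function names mirror the pseudocode and are illustrative; they are not Polars APIs.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # Unix epoch


def days_since_epoch(d: date) -> int:
    """Integer representation of a date: whole days since 1970-01-01."""
    return (d - EPOCH).days


def extract_year(days: int) -> int:
    """Recover the calendar year from a day count."""
    return (EPOCH + timedelta(days=days)).year


def extract_month(days: int) -> int:
    """Recover the calendar month (1-12) from a day count."""
    return (EPOCH + timedelta(days=days)).month


n = days_since_epoch(date(2020, 3, 1))
year, month = extract_year(n), extract_month(n)
```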

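Component-based predicates generally cannot exploit sorted order. Filtering on month alone, for example, selects a non-contiguous set of ranges (one per year), so it requires a full scan. A component predicate that maps to a single contiguous interval, such as dt.year() == 2020, can instead be rewritten as an equivalent range filter.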
