Heuristic: pola-rs Polars Lazy Over Eager Preference

Knowledge Sources	Polars Lazy API, Polars Optimizations
Domains	Optimization, Query_Planning
Last Updated	2026-02-09 10:00 GMT

Overview

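Prefer the Polars lazy API over the eager API: it enables automatic query optimization, reduces memory usage, and unlocks streaming execution for larger-than-memory datasets. The main exception is interactive exploration, where inspecting intermediate results matters more than performance.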

Description

Polars supports two execution modes: eager (immediate execution) and lazy (deferred execution with optimization). The lazy API builds a query plan that is optimized before execution, enabling predicate pushdown, projection pushdown, slice pushdown, common subplan elimination, join ordering, and expression simplification. The eager API actually calls the lazy API internally for many operations, but starting with the lazy API from the beginning (via `scan_*` functions) allows the full optimization pipeline to run, including pushing filters and projections down to the data source level.

Usage

Use the lazy API (via `pl.scan_csv`, `pl.scan_parquet`, `df.lazy()`, etc.) for all production workflows. Only use the eager API (`pl.read_csv`, direct DataFrame operations) during interactive exploration when you need to inspect intermediate results and don't yet know what your final query will look like.

The Insight (Rule of Thumb)

Action: Use `pl.scan_*` instead of `pl.read_*` as the entry point for queries. Chain expressions with `.select()`, `.filter()`, `.group_by()`, etc., then call `.collect()` at the end.
Value: Enables 8 automatic optimizations: predicate pushdown, projection pushdown, slice pushdown, common subplan elimination, expression simplification, join ordering, type coercion, and cardinality estimation.
Trade-off: Cannot inspect intermediate results without collecting. Schema errors surface when the plan is resolved, typically at `.collect()` and before any data is processed, rather than step by step as each operation runs.

Reasoning

The lazy API enables Polars to see the entire query before executing any part of it. Consider a query that filters rows and selects columns: with the eager API, Polars first reads all rows and all columns, then filters, then selects. With the lazy API, Polars pushes the filter to the scan level (reading only matching rows) and only reads the needed columns. This dramatically reduces I/O and memory usage. The Polars documentation explicitly states: "the lazy API should be preferred unless you are either interested in the intermediate results or are doing exploratory work."

The lazy API also unlocks streaming execution via `collect(engine="streaming")`, enabling processing of datasets larger than available memory.

Optimizations applied by the lazy engine:

Optimization	What It Does	Runs
Predicate pushdown	Applies filters at scan level	1 time
Projection pushdown	Selects only needed columns at scan level	1 time
Slice pushdown	Only loads required slice from scan	1 time
Common subplan elimination	Caches shared subtrees/file scans	1 time
Expression simplification	Constant folding, faster alternatives	Until fixed point
Join ordering	Executes smaller join branches first	1 time
Type coercion	Minimal memory type promotion	Until fixed point
Cardinality estimation	Optimal group-by strategy selection	0/n times
