Implementation:Pola rs Polars Scan LazyFrame Creation
Overview
This implementation covers the concrete APIs for creating a LazyFrame from external data sources or from an existing eager DataFrame. These scan functions are the entry points to the Polars lazy query pipeline. Each returns a LazyFrame whose query plan contains a scan node that references the data source. No rows are loaded into memory until the query is executed.
APIs
- `pl.scan_csv(source) -> LazyFrame`: Lazily scan a CSV file
- `pl.scan_parquet(source) -> LazyFrame`: Lazily scan a Parquet file
- `pl.scan_ndjson(source) -> LazyFrame`: Lazily scan a newline-delimited JSON file
- `pl.scan_ipc(source) -> LazyFrame`: Lazily scan an IPC/Arrow file
- `DataFrame.lazy() -> LazyFrame`: Convert an eager DataFrame to a LazyFrame
Source Reference
- File: docs/source/src/python/user-guide/lazy/using.py (lines 7-18)
- Repository: Pola_rs_Polars
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | str (file path or URL) | Path to a CSV, Parquet, NDJSON, or IPC file on disk or accessible via URL |
| Input | DataFrame | An existing eager DataFrame (for `.lazy()`) |
| Output | LazyFrame | A lazy query plan reference that has not yet read any data |
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| source | str | File path or URL to the data source |
| try_parse_dates | bool | Attempt to parse date/datetime columns automatically (CSV scanner) |
| schema_overrides | dict | Override inferred schema with explicit column types |
Example Code
```python
import polars as pl

# Scan CSV lazily ("data.csv" etc. are placeholder paths)
q = pl.scan_csv("data.csv")
# Scan Parquet lazily
q = pl.scan_parquet("data.parquet")
# Scan NDJSON lazily
q = pl.scan_ndjson("data.ndjson")
# Scan IPC lazily
q = pl.scan_ipc("data.ipc")
# Convert eager DataFrame to lazy
df = pl.DataFrame({"a": [1, 2, 3]})
q = df.lazy()
```
Scan with Schema Overrides
```python
import polars as pl

q = pl.scan_csv(
    "data.csv",
    try_parse_dates=True,
    # pl.String is the current name of the string dtype (pl.Utf8 is an alias).
    schema_overrides={"amount": pl.Float64, "id": pl.String},
)
```
Import
```python
import polars as pl
```
Behavior Notes
- No row data is loaded when a scan function is called. The returned LazyFrame is a lightweight object that holds only the query plan; rows are read when the query is executed, for example with `collect()`.
- For Parquet and IPC files, schema information is read from file metadata without scanning row data.
- For CSV and NDJSON files, a sample of rows may be read to infer the schema (the sample size is controlled by `infer_schema_length`) unless a complete `schema` is supplied; `schema_overrides` only replaces the inferred types of the columns it names.
- `DataFrame.lazy()` wraps an already-materialized DataFrame in a LazyFrame, so it can participate in the lazy query pipeline and benefit from optimizer passes on downstream operations.
- All scan functions accept both local file paths and URL sources (HTTP/HTTPS, S3, GCS, and Azure, depending on feature flags).
Related Pages
- Principle:Pola_rs_Polars_Lazy_Data_Scanning
- Implementation:Pola_rs_Polars_LazyFrame_Expression_Chaining
- Implementation:Pola_rs_Polars_LazyFrame_Collect
- Environment:Pola_rs_Polars_Python_Runtime_Environment
- Heuristic:Pola_rs_Polars_Lazy_Over_Eager_Preference
Metadata
| Field | Value |
|---|---|
| Source Repository | Pola_rs_Polars |
| Source File | docs/source/src/python/user-guide/lazy/using.py:L7-18 |
| Domain | Data Engineering, Lazy Evaluation, File I/O |
| Last Updated | 2026-02-09 10:00 GMT |