Implementation: Pola_rs_Polars Scan LazyFrame Creation

Overview

This implementation covers the concrete APIs for creating a LazyFrame from an external data source or from an existing eager DataFrame. The scan functions are the entry points to the Polars lazy query pipeline: each returns a LazyFrame holding a query plan node that references the data source, without loading the dataset's rows into memory.

APIs

pl.scan_csv(source) -> LazyFrame — Lazily scan a CSV file
pl.scan_parquet(source) -> LazyFrame — Lazily scan a Parquet file
pl.scan_ndjson(source) -> LazyFrame — Lazily scan a newline-delimited JSON file
pl.scan_ipc(source) -> LazyFrame — Lazily scan an IPC/Arrow file
DataFrame.lazy() -> LazyFrame — Convert an eager DataFrame to a LazyFrame

Source Reference

File: docs/source/src/python/user-guide/lazy/using.py (Lines 7-18)
Repository: Pola_rs_Polars

I/O Contract

Direction	Type	Description
Input	`str` (file path or URL)	Path to a CSV, Parquet, NDJSON, or IPC file on disk or accessible via URL
Input	`DataFrame`	An existing eager DataFrame (for `.lazy()`)
Output	`LazyFrame`	A lazy query plan reference that has not yet read any data

Key Parameters

Parameter	Type	Description
`source`	`str`	File path or URL to the data source
`try_parse_dates`	`bool`	Attempt to parse date/datetime columns automatically (CSV scanner)
`schema_overrides`	`dict`	Override inferred schema with explicit column types

Example Code

import polars as pl

# Scan CSV lazily
q = pl.scan_csv("data.csv")

# Scan Parquet lazily
q = pl.scan_parquet("data.parquet")

# Scan NDJSON lazily
q = pl.scan_ndjson("data.ndjson")

# Scan IPC lazily
q = pl.scan_ipc("data.ipc")

# Convert eager DataFrame to lazy
df = pl.DataFrame({"a": [1, 2, 3]})
q = df.lazy()

Scan with Schema Overrides

import polars as pl

q = pl.scan_csv(
    "data.csv",
    try_parse_dates=True,
    schema_overrides={"amount": pl.Float64, "id": pl.Utf8},
)

Import

import polars as pl

Behavior Notes

Calling a scan function does not load the dataset's rows. The returned LazyFrame is a lightweight object containing only the query plan node.
For Parquet and IPC files, schema information is read from file metadata without scanning row data.
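For CSV and NDJSON files, a small sample of rows may be read to infer the schema unless a complete schema is supplied (for example via the schema parameter); schema_overrides replaces the inferred types only for the columns it names.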
DataFrame.lazy() wraps an already-materialized DataFrame into a LazyFrame, enabling it to participate in the lazy query pipeline and benefit from optimizer passes on downstream operations.
The scan functions accept local file paths and, depending on how Polars is built and configured, remote sources such as HTTP/HTTPS URLs and cloud object stores (S3, GCS, Azure).

Metadata

Field	Value
Source Repository	Pola_rs_Polars
Source File	`docs/source/src/python/user-guide/lazy/using.py:L7-18`
Domain	Data Engineering, Lazy Evaluation, File I/O
Last Updated	2026-02-09 10:00 GMT
