Implementation:Pola rs Polars Scan LazyFrame Creation

From Leeroopedia


Overview

This implementation covers the concrete APIs for creating a LazyFrame, either from an external data source or from an existing eager DataFrame. These functions are the entry points to the Polars lazy query pipeline. A scan function returns a LazyFrame whose query plan node references the data source without loading its rows into memory, while DataFrame.lazy() wraps data that is already in memory.

APIs

  • pl.scan_csv(source) -> LazyFrame — Lazily scan a CSV file
  • pl.scan_parquet(source) -> LazyFrame — Lazily scan a Parquet file
  • pl.scan_ndjson(source) -> LazyFrame — Lazily scan a newline-delimited JSON file
  • pl.scan_ipc(source) -> LazyFrame — Lazily scan an IPC/Arrow file
  • DataFrame.lazy() -> LazyFrame — Convert an eager DataFrame to a LazyFrame

Source Reference

  • File: docs/source/src/python/user-guide/lazy/using.py (Lines 7-18)
  • Repository: Pola_rs_Polars

I/O Contract

Direction | Type | Description
Input | str (file path or URL) | Path to a CSV, Parquet, NDJSON, or IPC file on disk or accessible via URL
Input | DataFrame | An existing eager DataFrame (for .lazy())
Output | LazyFrame | A lazy query plan; for scans, no row data has been loaded yet

Key Parameters

Parameter | Type | Description
source | str | File path or URL to the data source
try_parse_dates | bool | Attempt to parse date/datetime columns automatically (CSV scanner)
schema_overrides | dict | Explicit column types that take precedence over the inferred types for the listed columns

Example Code

import polars as pl

# Scan CSV lazily
q = pl.scan_csv("data.csv")

# Scan Parquet lazily
q = pl.scan_parquet("data.parquet")

# Scan NDJSON lazily
q = pl.scan_ndjson("data.ndjson")

# Scan IPC lazily
q = pl.scan_ipc("data.ipc")

# Convert eager DataFrame to lazy
df = pl.DataFrame({"a": [1, 2, 3]})
q = df.lazy()

Scan with Schema Overrides

import polars as pl

q = pl.scan_csv(
    "data.csv",
    try_parse_dates=True,
    schema_overrides={"amount": pl.Float64, "id": pl.String},
)

Import

import polars as pl

Behavior Notes

  • No row data is loaded when a scan function is called. The returned LazyFrame is a lightweight object containing only the query plan node.
  • For Parquet and IPC files, schema information is read from file metadata without scanning row data.
  • For CSV and NDJSON files, a small sample of rows may be read to infer the schema. Types supplied through schema_overrides take precedence over the inferred types for the listed columns.
  • DataFrame.lazy() wraps an already-materialized DataFrame into a LazyFrame, enabling it to participate in the lazy query pipeline and benefit from optimizer passes on downstream operations.
  • The scan functions accept local file paths and, depending on the file format and the installed Polars build, remote sources such as HTTP/HTTPS URLs and cloud object stores (S3, GCS, Azure).

Metadata

Field | Value
Source Repository | Pola_rs_Polars
Source File | docs/source/src/python/user-guide/lazy/using.py:L7-18
Domain | Data Engineering, Lazy Evaluation, File I/O
Last Updated | 2026-02-09 10:00 GMT
