
Implementation: Pola-rs Polars Read and Scan Operations

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, ETL, File_Format_Parsing
Last Updated 2026-02-09 10:00 GMT

Overview

Concrete read and scan functions for ingesting data from CSV, Parquet, JSON, NDJSON, IPC, Excel, and database sources into Polars DataFrames and LazyFrames.

Description

The read and scan operations are the primary data ingestion interface in Polars. The read_* family loads eagerly: each call parses the source immediately and materializes it as a DataFrame. The scan_* family returns a LazyFrame that holds a deferred query plan, which lets the optimizer push predicates (row filters) and projections (column selections) down into the reader so that less data is read. Local file paths are the common case; support for glob patterns, cloud URIs, and URLs varies by format and function.

Usage

Import polars and call the read or scan function that matches the target format. Use read_* when a small or medium-sized dataset is needed in memory immediately. Use scan_* followed by .collect() for large datasets or for pipelines that benefit from query optimization. Pass storage_options or credential_provider when accessing cloud storage.

Code Reference

Source Location

  • Repository: polars
  • Files:
    • docs/source/src/python/user-guide/io/csv.py (Lines: 1-19)
    • docs/source/src/python/user-guide/io/parquet.py (Lines: 1-19)
    • docs/source/src/python/user-guide/io/json.py (Lines: 1-27)

Signature

# Eager read functions
pl.read_csv(source: str, try_parse_dates: bool = False, ...) -> DataFrame
pl.read_parquet(source: str, ...) -> DataFrame
pl.read_json(source: str, ...) -> DataFrame
pl.read_ndjson(source: str, ...) -> DataFrame
pl.read_excel(source: str, sheet_name: str | None = None, ...) -> DataFrame
pl.read_database_uri(query: str, uri: str, ...) -> DataFrame

# Lazy scan functions
pl.scan_csv(source: str, ...) -> LazyFrame
pl.scan_parquet(source: str, hive_partitioning: bool | None = None, ...) -> LazyFrame
pl.scan_ndjson(source: str, ...) -> LazyFrame
pl.scan_ipc(source: str, ...) -> LazyFrame

Import

import polars as pl

I/O Contract

Inputs

Name | Type | Required | Description
source | str | Yes | File path, URL, glob pattern, or cloud storage URI (s3://, az://, gs://, hf://)
try_parse_dates | bool | No | Attempt to parse date columns automatically (CSV reader)
hive_partitioning | bool | No | Enable Hive-style partition discovery for partitioned datasets (Parquet scanner)
sheet_name | str | No | Name of the worksheet to read (Excel reader)
storage_options | dict | No | Cloud storage authentication credentials
credential_provider | CredentialProvider | No | Managed credential provider for cloud access
query | str | Yes (database) | SQL query string for database reads
uri | str | Yes (database) | Database connection URI for database reads

Outputs

Name | Type | Description
DataFrame | polars.DataFrame | Eagerly loaded tabular data (from read_* functions)
LazyFrame | polars.LazyFrame | Deferred query plan for lazy evaluation (from scan_* functions); call .collect() to materialize

Usage Examples

import polars as pl

# --- Eager reads ---
# Read a CSV file
df = pl.read_csv("data.csv")

# Read a Parquet file
df = pl.read_parquet("data.parquet")

# Read JSON
df = pl.read_json("data.json")

# Read NDJSON (newline-delimited JSON)
df = pl.read_ndjson("data.ndjson")

# --- Lazy scans ---
# Scan a CSV (creates LazyFrame, no data read yet)
lf = pl.scan_csv("data.csv")

# Scan Parquet files from S3 with glob
lf = pl.scan_parquet("s3://bucket/*.parquet")

# Scan Hive-partitioned Parquet dataset
lf = pl.scan_parquet("dataset/", hive_partitioning=True)

# Read from Hugging Face Hub (an eager read over the hf:// scheme)
df = pl.read_parquet("hf://datasets/org/repo/data.parquet")

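# Collect a lazy scan with a filter and a column selection
# (enables predicate and projection pushdown).
# Assumes "date" is a Date column; pl.date builds a date literal,
# which avoids comparing a date column against a plain string.
df = (
    pl.scan_parquet("s3://bucket/large_dataset/*.parquet")
    .filter(pl.col("date") > pl.date(2025, 1, 1))
    .select("id", "date", "value")
    .collect()
)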

Related Pages

Implements Principle
