Implementation: Polars Read and Scan Operations
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, ETL, File_Format_Parsing |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Concrete read and scan functions for ingesting data from CSV, Parquet, JSON, NDJSON, IPC, Excel, and database sources into Polars DataFrames and LazyFrames.
Description
The Read and Scan Operations provide the primary data ingestion interface in Polars. The read_* family of functions performs eager loading, immediately parsing and materializing data into a DataFrame. The scan_* family creates a LazyFrame with a deferred query plan, enabling predicate and projection pushdown optimizations. Both families support local files, glob patterns, cloud URIs, and URLs.
Usage
Import polars and call the appropriate read or scan function for the target format. Use read_* for immediate access to small/medium datasets. Use scan_* followed by .collect() for large datasets or optimized pipelines. Pass storage_options or credential_provider when accessing cloud storage.
Code Reference
Source Location
- Repository: polars
- Files:
- docs/source/src/python/user-guide/io/csv.py (Lines: 1-19)
- docs/source/src/python/user-guide/io/parquet.py (Lines: 1-19)
- docs/source/src/python/user-guide/io/json.py (Lines: 1-27)
Signature
# Eager read functions
pl.read_csv(source: str, try_parse_dates: bool = False, ...) -> DataFrame
pl.read_parquet(source: str, ...) -> DataFrame
pl.read_json(source: str, ...) -> DataFrame
pl.read_ndjson(source: str, ...) -> DataFrame
pl.read_excel(source: str, sheet_name: str | None = None, ...) -> DataFrame
pl.read_database_uri(query: str, uri: str, ...) -> DataFrame
# Lazy scan functions
pl.scan_csv(source: str, ...) -> LazyFrame
pl.scan_parquet(source: str, hive_partitioning: bool = False, ...) -> LazyFrame
pl.scan_ndjson(source: str, ...) -> LazyFrame
pl.scan_ipc(source: str, ...) -> LazyFrame
Import
import polars as pl
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| source | str | Yes | File path, URL, glob pattern, or cloud storage URI (s3://, az://, gs://, hf://) |
| try_parse_dates | bool | No | Attempt to parse date columns automatically (CSV reader) |
| hive_partitioning | bool | No | Enable Hive-style partition discovery for partitioned datasets (Parquet scanner) |
| sheet_name | str | No | Name of the worksheet to read (Excel reader) |
| storage_options | dict | No | Cloud storage authentication credentials |
| credential_provider | CredentialProvider | No | Managed credential provider for cloud access |
| query | str | Yes (database) | SQL query string for database reads |
| uri | str | Yes (database) | Database connection URI for database reads |
Outputs
| Name | Type | Description |
|---|---|---|
| DataFrame | polars.DataFrame | Eagerly loaded tabular data (from read_* functions) |
| LazyFrame | polars.LazyFrame | Deferred query plan for lazy evaluation (from scan_* functions); call .collect() to materialize |
Usage Examples
import polars as pl
# --- Eager reads ---
# Read a CSV file
df = pl.read_csv("data.csv")
# Read a Parquet file
df = pl.read_parquet("data.parquet")
# Read JSON
df = pl.read_json("data.json")
# Read NDJSON (newline-delimited JSON)
df = pl.read_ndjson("data.ndjson")
# --- Lazy scans ---
# Scan a CSV (creates LazyFrame, no data read yet)
lf = pl.scan_csv("data.csv")
# Scan Parquet files from S3 with glob
lf = pl.scan_parquet("s3://bucket/*.parquet")
# Scan Hive-partitioned Parquet dataset
lf = pl.scan_parquet("dataset/", hive_partitioning=True)
# Read from Hugging Face Hub
df = pl.read_parquet("hf://datasets/org/repo/data.parquet")
# Collect a lazy scan with filters (enables predicate pushdown)
df = (
pl.scan_parquet("s3://bucket/large_dataset/*.parquet")
    .filter(pl.col("date") > pl.date(2025, 1, 1))
.select("id", "date", "value")
.collect()
)