
Implementation:Pola_rs_Polars SQL Collect and Output

From Leeroopedia


Overview

Concrete API usage for materializing SQL query results and persisting them to files. This implementation covers the collect() method for LazyFrame materialization, eager mode for immediate results, and write methods for exporting DataFrames to Parquet, CSV, and other formats.

Metadata

Field Value
Namespace Pola_rs_Polars
Workflow SQL_Query_Interface
Implementation_ID Pola_rs_Polars_SQL_Collect_and_Output
Type Implementation
Category Data Access / Query Interface
Stage Result Collection and Output
last_updated 2026-02-09 10:00 GMT
Source_Repository https://github.com/pola-rs/polars
Source_File docs/source/src/python/user-guide/sql/intro.py:L36-37
Documentation https://docs.pola.rs

API Signatures

Collect LazyFrame

LazyFrame.collect() -> DataFrame

Write to Parquet

DataFrame.write_parquet(file) -> None

Write to CSV

DataFrame.write_csv(file) -> None

Key Parameters

Parameter Type Description
file str or Path Output file path for write_parquet() and write_csv(). The file is created or overwritten at this location.

I/O Contract

Direction Type Description
Input LazyFrame Unevaluated query result from execute() in lazy mode
Output DataFrame Materialized result from .collect()
Output file (Parquet) Binary columnar file written via .write_parquet()
Output file (CSV) Text file written via .write_csv()

Implementation Examples

Lazy Execute Followed by Collect

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

# Default lazy mode: execute returns LazyFrame
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Collect to materialize the query plan into a DataFrame
df = result_lf.collect()

Eager Mode for Immediate Results

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

# Eager mode: execute returns DataFrame directly (no collect needed)
ctx_eager = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx_eager.execute("SELECT * FROM my_data WHERE value > 100")

Write Results to Parquet

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Write to Parquet format
df.write_parquet("sql_results.parquet")

Write Results to CSV

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Write to CSV format
df.write_csv("sql_results.csv")

Hybrid SQL-Native Workflow (Lazy Chaining)

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "B", "A"],
    "value": [50, 150, 200, 75, 300],
})

# Execute SQL in lazy mode
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Chain native Polars operations on the lazy result
final_lf = result_lf.with_columns(
    (pl.col("value") * 1.1).alias("adjusted_value")
).sort("value", descending=True)

# Collect the entire combined plan (SQL + native) at once
df = final_lf.collect()

Full Pipeline: SQL Query to File Output

import polars as pl

# Register source data
sales_lf = pl.LazyFrame({
    "product": ["Widget", "Gadget", "Widget", "Gadget", "Widget"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q3"],
    "revenue": [1000, 1500, 1200, 1800, 1100],
})

ctx = pl.SQLContext(sales=sales_lf, eager=True)

# Run aggregation query
summary = ctx.execute("""
    SELECT product, SUM(revenue) as total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
""")

# Persist results in both formats
summary.write_parquet("sales_summary.parquet")
summary.write_csv("sales_summary.csv")

Behavioral Notes

  • collect() triggers the Polars query optimizer and executor. The entire logical plan (including any SQL-compiled operations and native Polars operations) is optimized as a single unit before execution.
  • write_parquet() writes a binary Parquet file with schema metadata. It supports optional compression settings (snappy, gzip, lz4, zstd) via additional parameters.
  • write_csv() writes a text CSV file. Column names are written as the header row by default. Delimiter, quote character, and other formatting options are available via additional parameters.
  • When using lazy mode with chained operations, the optimizer can apply cross-boundary optimizations (e.g., pushing a native filter into the SQL scan plan).
  • Eager mode is equivalent to calling execute() in lazy mode followed by an immediate collect(). The optimization and execution happen in the same call.
