Implementation: Pola_rs Polars SQL Collect and Output
Overview
Concrete API usage for materializing SQL query results and persisting them to files. This implementation covers the collect() method for LazyFrame materialization, eager mode for immediate results, and write methods for exporting DataFrames to Parquet, CSV, and other formats.
Metadata
| Field | Value |
|---|---|
| Namespace | Pola_rs_Polars |
| Workflow | SQL_Query_Interface |
| Implementation_ID | Pola_rs_Polars_SQL_Collect_and_Output |
| Type | Implementation |
| Category | Data Access / Query Interface |
| Stage | Result Collection and Output |
| last_updated | 2026-02-09 10:00 GMT |
| Source_Repository | https://github.com/pola-rs/polars |
| Source_File | docs/source/src/python/user-guide/sql/intro.py:L36-37 |
| Documentation | https://docs.pola.rs |
API Signatures
Collect LazyFrame
LazyFrame.collect() -> DataFrame
Write to Parquet
DataFrame.write_parquet(file) -> None
Write to CSV
DataFrame.write_csv(file) -> None
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| file | str or Path | Output file path for write_parquet() and write_csv(). The file is created or overwritten at this location. |
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | LazyFrame | Unevaluated query result from execute() in lazy mode |
| Output | DataFrame | Materialized result from .collect() |
| Output | file (Parquet) | Binary columnar file written via .write_parquet() |
| Output | file (CSV) | Text file written via .write_csv() |
Implementation Examples
Lazy Execute Followed by Collect
import polars as pl
my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
# Default lazy mode: execute returns LazyFrame
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")
# Collect to materialize the query plan into a DataFrame
df = result_lf.collect()
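As a follow-up sketch (not part of the source snippet), the optimized plan can be inspected before materializing it; LazyFrame.explain() renders the plan as text without executing it:
import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# explain() renders the (optimized) logical plan without executing it
print(result_lf.explain())

# collect() runs the plan and returns the rows with value > 100 as a DataFrame
print(result_lf.collect())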
Eager Mode for Immediate Results
import polars as pl
my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
# Eager mode: execute returns DataFrame directly (no collect needed)
ctx_eager = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx_eager.execute("SELECT * FROM my_data WHERE value > 100")
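A small sanity-check sketch, assuming a recent Polars release where DataFrame.equals() is available: eager execution returns the same rows as lazy execution followed by collect().
import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
query = "SELECT * FROM my_data WHERE value > 100"

eager_df = pl.SQLContext(my_data=my_lf, eager=True).execute(query)
lazy_df = pl.SQLContext(my_data=my_lf).execute(query).collect()

assert isinstance(eager_df, pl.DataFrame)  # no collect() step needed
assert eager_df.equals(lazy_df)            # identical results either way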
Write Results to Parquet
import polars as pl
my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")
# Write to Parquet format
df.write_parquet("sql_results.parquet")
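A quick round-trip sketch (using pl.read_parquet, which is not part of the snippet above) confirms the written file can be read back:
import polars as pl

df = pl.DataFrame({"id": [2, 3, 5], "value": [150, 200, 300]})
df.write_parquet("sql_results.parquet")

# Read the file back and verify the shape matches
roundtrip = pl.read_parquet("sql_results.parquet")
assert roundtrip.shape == df.shape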
Write Results to CSV
import polars as pl
my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})
ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")
# Write to CSV format
df.write_csv("sql_results.csv")
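Similarly, a round-trip sketch with pl.read_csv (a check added here for illustration, not from the source):
import polars as pl

df = pl.DataFrame({"id": [2, 3, 5], "value": [150, 200, 300]})
df.write_csv("sql_results.csv")

# The header row is written by default, so columns round-trip by name
roundtrip = pl.read_csv("sql_results.csv")
print(roundtrip)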
Hybrid SQL-Native Workflow (Lazy Chaining)
import polars as pl
my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "B", "A"],
    "value": [50, 150, 200, 75, 300],
})
# Execute SQL in lazy mode
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")
# Chain native Polars operations on the lazy result
final_lf = result_lf.with_columns(
    (pl.col("value") * 1.1).alias("adjusted_value")
).sort("value", descending=True)
# Collect the entire combined plan (SQL + native) at once
df = final_lf.collect()
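To confirm that the SQL-compiled filter and the native operations form a single plan, the combined plan can be printed before collecting; a sketch using LazyFrame.explain():
import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "B", "A"],
    "value": [50, 150, 200, 75, 300],
})
ctx = pl.SQLContext(my_data=my_lf)
final_lf = (
    ctx.execute("SELECT * FROM my_data WHERE value > 100")
    .with_columns((pl.col("value") * 1.1).alias("adjusted_value"))
    .sort("value", descending=True)
)

# One logical plan covers both the SQL-compiled filter and the native steps
print(final_lf.explain())
print(final_lf.collect())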
Full Pipeline: SQL Query to File Output
import polars as pl
# Register source data
sales_lf = pl.LazyFrame({
    "product": ["Widget", "Gadget", "Widget", "Gadget", "Widget"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q3"],
    "revenue": [1000, 1500, 1200, 1800, 1100],
})
ctx = pl.SQLContext(sales=sales_lf, eager=True)
# Run aggregation query
summary = ctx.execute("""
    SELECT product, SUM(revenue) AS total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
""")
# Persist results in both formats
summary.write_parquet("sales_summary.parquet")
summary.write_csv("sales_summary.csv")
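Downstream consumers can pick the Parquet output back up lazily; a short sketch using pl.scan_parquet (added here for illustration, not part of the pipeline above):
import polars as pl

# Scan the persisted summary lazily and take the top product by revenue
summary_lf = pl.scan_parquet("sales_summary.parquet")
top_product = summary_lf.sort("total_revenue", descending=True).head(1).collect()
print(top_product)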
Behavioral Notes
- collect() triggers the Polars query optimizer and executor. The entire logical plan (including any SQL-compiled operations and native Polars operations) is optimized as a single unit before execution.
- write_parquet() writes a binary Parquet file with schema metadata. It supports optional compression settings (snappy, gzip, lz4, zstd) via additional parameters.
- write_csv() writes a text CSV file. Column names are written as the header row by default. Delimiter, quote character, and other formatting options are available via additional parameters. (Both write methods' options are illustrated in the sketch after these notes.)
- When using lazy mode with chained operations, the optimizer can apply cross-boundary optimizations (e.g., pushing a native filter into the SQL scan plan).
- Eager mode is equivalent to calling execute() in lazy mode followed by an immediate collect(). The optimization and execution happen in the same call.
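A minimal sketch of the output options mentioned above; the keyword names (compression for write_parquet(), separator for write_csv()) are as found in recent Polars releases and may differ in older versions:
import polars as pl

df = pl.DataFrame({"id": [2, 3, 5], "value": [150, 200, 300]})

# Parquet with an explicit compression codec (assumed keyword: compression)
df.write_parquet("sql_results_zstd.parquet", compression="zstd")

# CSV with a non-default delimiter (assumed keyword: separator);
# the header row is still written by default
df.write_csv("sql_results_semicolon.csv", separator=";")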
Relationships
See Also
- Implementation:Pola_rs_Polars_SQLContext_Constructor — Creating the SQLContext
- Implementation:Pola_rs_Polars_SQLContext_Register — Registering data sources
- Implementation:Pola_rs_Polars_SQLContext_Execute — Executing SQL queries
- Implementation:Pola_rs_Polars_SQLContext_CTE_and_DDL — Advanced SQL features