
Implementation:Pola_rs_Polars SQL Collect and Output

From Leeroopedia


Overview

Concrete API usage for materializing SQL query results and persisting them to files. This implementation covers the collect() method for LazyFrame materialization, eager mode for immediate results, and write methods for exporting DataFrames to Parquet, CSV, and other formats.

Metadata

Field Value
Namespace Pola_rs_Polars
Workflow SQL_Query_Interface
Implementation_ID Pola_rs_Polars_SQL_Collect_and_Output
Type Implementation
Category Data Access / Query Interface
Stage Result Collection and Output
last_updated 2026-02-09 10:00 GMT
Source_Repository https://github.com/pola-rs/polars
Source_File docs/source/src/python/user-guide/sql/intro.py:L36-37
Documentation https://docs.pola.rs

API Signatures

Collect LazyFrame

LazyFrame.collect() -> DataFrame

Write to Parquet

DataFrame.write_parquet(file) -> None

Write to CSV

DataFrame.write_csv(file) -> None

Key Parameters

Parameter Type Description
file str or Path Output file path for write_parquet() and write_csv(). The file is created or overwritten at this location.

I/O Contract

Direction Type Description
Input LazyFrame Unevaluated query result from execute() in lazy mode
Output DataFrame Materialized result from .collect()
Output file (Parquet) Binary columnar file written via .write_parquet()
Output file (CSV) Text file written via .write_csv()

Implementation Examples

Lazy Execute Followed by Collect

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

# Default lazy mode: execute returns LazyFrame
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Collect to materialize the query plan into a DataFrame
df = result_lf.collect()

Eager Mode for Immediate Results

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

# Eager mode: execute returns DataFrame directly (no collect needed)
ctx_eager = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx_eager.execute("SELECT * FROM my_data WHERE value > 100")

Write Results to Parquet

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Write to Parquet format
df.write_parquet("sql_results.parquet")

Write Results to CSV

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "value": [50, 150, 200, 75, 300],
})

ctx = pl.SQLContext(my_data=my_lf, eager=True)
df = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Write to CSV format
df.write_csv("sql_results.csv")

Hybrid SQL-Native Workflow (Lazy Chaining)

import polars as pl

my_lf = pl.LazyFrame({
    "id": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "B", "A"],
    "value": [50, 150, 200, 75, 300],
})

# Execute SQL in lazy mode
ctx = pl.SQLContext(my_data=my_lf)
result_lf = ctx.execute("SELECT * FROM my_data WHERE value > 100")

# Chain native Polars operations on the lazy result
final_lf = result_lf.with_columns(
    (pl.col("value") * 1.1).alias("adjusted_value")
).sort("value", descending=True)

# Collect the entire combined plan (SQL + native) at once
df = final_lf.collect()

Full Pipeline: SQL Query to File Output

import polars as pl

# Register source data
sales_lf = pl.LazyFrame({
    "product": ["Widget", "Gadget", "Widget", "Gadget", "Widget"],
    "quarter": ["Q1", "Q1", "Q2", "Q2", "Q3"],
    "revenue": [1000, 1500, 1200, 1800, 1100],
})

ctx = pl.SQLContext(sales=sales_lf, eager=True)

# Run aggregation query
summary = ctx.execute("""
    SELECT product, SUM(revenue) as total_revenue
    FROM sales
    GROUP BY product
    ORDER BY total_revenue DESC
""")

# Persist results in both formats
summary.write_parquet("sales_summary.parquet")
summary.write_csv("sales_summary.csv")

Behavioral Notes

  • collect() triggers the Polars query optimizer and executor. The entire logical plan (including any SQL-compiled operations and native Polars operations) is optimized as a single unit before execution.
  • write_parquet() writes a binary Parquet file with schema metadata. It supports optional compression settings (snappy, gzip, lz4, zstd) via additional parameters.
  • write_csv() writes a text CSV file. Column names are written as the header row by default. Delimiter, quote character, and other formatting options are available via additional parameters.
  • When using lazy mode with chained operations, the optimizer can apply cross-boundary optimizations (e.g., pushing a native filter into the SQL scan plan).
  • Eager mode is equivalent to calling execute() in lazy mode followed by an immediate collect(). The optimization and execution happen in the same call.
