
Implementation:Polars DataFrame Write Multi Format

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Data_Serialization, Storage_Optimization
Last Updated 2026-02-09 10:00 GMT

Overview

Concrete APIs for writing Polars DataFrames to CSV, Parquet, JSON, Excel, IPC, and database targets, including partitioned writes and streaming sinks.

Description

The DataFrame Write Multi Format APIs serialize DataFrames to various output formats and destinations. Each write_* method on DataFrame handles eager serialization, while sink_* methods on LazyFrame provide streaming output for large datasets. Parquet writes support Hive-style partitioning for optimized downstream query performance.

Usage

Import polars and call the appropriate write method on a DataFrame after all transformations are complete. For database writes with the "adbc" engine, install the ADBC driver for the target database. For streaming writes of large datasets, use the LazyFrame sink_* methods instead of materializing with .collect() first.
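
A minimal sketch of the eager-versus-streaming distinction, assuming a hypothetical input file events.csv that is too large to materialize comfortably:

import polars as pl

# Eager: collect() materializes the full result in memory, then writes it out
df = pl.scan_csv("events.csv").filter(pl.col("status") == "ok").collect()
df.write_parquet("events_ok.parquet")

# Streaming: sink_parquet writes batches as the engine produces them,
# so the full result never needs to fit in memory
pl.scan_csv("events.csv").filter(pl.col("status") == "ok").sink_parquet("events_ok.parquet")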

Code Reference

Source Location

  • Repository: polars
  • Files:
    • docs/source/src/python/user-guide/io/csv.py (Lines: 13-14)
    • docs/source/src/python/user-guide/io/parquet.py (Lines: 13-14)

Signature

# CSV write
DataFrame.write_csv(file: str | Path | IOBase | None = None) -> str | None

# Parquet write (with optional partitioning)
DataFrame.write_parquet(
    file: str | Path,
    partition_by: str | list[str] | None = None,
    compression: str = "zstd",
) -> None

# JSON write
DataFrame.write_json(file: str | Path | IOBase | None = None) -> str | None

# NDJSON write
DataFrame.write_ndjson(file: str | Path | IOBase | None = None) -> str | None

# Excel write
DataFrame.write_excel(
    workbook: str | Path | Workbook | None = None,
    worksheet: str | None = None,
) -> Workbook

# IPC/Arrow write
DataFrame.write_ipc(file: str | Path) -> None

# Database write
DataFrame.write_database(
    table_name: str,
    connection: str,
    engine: str = "adbc",
    if_table_exists: str = "fail",
) -> int

# Streaming sinks (LazyFrame)
LazyFrame.sink_parquet(path: str | Path) -> None
LazyFrame.sink_ipc(path: str | Path) -> None
LazyFrame.sink_csv(path: str | Path) -> None

Import

import polars as pl

I/O Contract

Inputs

Name Type Required Description
file str or Path Yes Output file path for write operations
partition_by str or list[str] No Column name(s) for Hive-style partitioned writes (Parquet only)
compression str No Compression codec: "zstd", "snappy", "lz4", "gzip", or "uncompressed" (Parquet)
worksheet str No Name of the Excel worksheet to write to
table_name str Yes (database) Target database table name
connection str Yes (database) Database connection URI (e.g., "postgresql://user:pass@host/db")
engine str No Database engine: "sqlalchemy" (default) or "adbc" (Arrow Database Connectivity)
if_table_exists str No Behavior when table exists: "fail" (default), "append", or "replace"
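
The if_table_exists modes control what happens when the target table already exists. A brief sketch; the connection URI below is a placeholder, not a working endpoint:

import polars as pl

df = pl.DataFrame({"id": [1, 2], "name": ["a", "b"]})
uri = "postgresql://user:pass@host/db"  # placeholder credentials/host

df.write_database("my_table", uri, if_table_exists="fail")     # error if the table exists
df.write_database("my_table", uri, if_table_exists="append")   # insert rows into the existing table
df.write_database("my_table", uri, if_table_exists="replace")  # drop, recreate, then insert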

Outputs

Name Type Description
None None Most write methods return None on success (file written to disk)
str str write_csv, write_json, and write_ndjson return the serialized string if no file target is provided
Workbook xlsxwriter.Workbook write_excel returns the Workbook object for further customization
int int write_database returns the number of rows affected (or -1 if the driver does not report it)
None None sink_* methods return None; output is streamed to the target file without materializing the full result in memory
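
When no file target is passed, the text writers return the serialized payload as a string rather than writing to disk; a short sketch:

import polars as pl

df = pl.DataFrame({"foo": [1, 2], "bar": ["a", "b"]})

csv_text = df.write_csv()    # "foo,bar\n1,a\n2,b\n"
json_text = df.write_json()  # JSON string representation of the frame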

Usage Examples

import polars as pl

df = pl.DataFrame({
    "foo": [1, 2, 3],
    "bar": ["a", "b", "c"],
    "category": ["x", "x", "y"],
})

# Write to CSV
df.write_csv("output.csv")

# Write to Parquet with default Zstd compression
df.write_parquet("output.parquet")

# Hive-partitioned Parquet write
df.write_parquet("output/", partition_by=["category"])
# Creates one directory per partition key, e.g.:
#   output/category=x/...
#   output/category=y/...

# Write to JSON
df.write_json("output.json")

# Write to Excel with named worksheet
df.write_excel("output.xlsx", worksheet="Sheet1")

# Write to database via ADBC
df.write_database(
    table_name="my_table",
    connection="postgresql://user:pass@host/db",
    engine="adbc",
)

# Streaming sink for large datasets (LazyFrame)
lf = pl.scan_csv("large_file.csv")
lf.filter(pl.col("value") > 0).sink_parquet("filtered_output.parquet")
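
The remaining writers from the Signature section follow the same pattern; a brief sketch with illustrative output filenames:

# Write newline-delimited JSON (one object per line)
df.write_ndjson("output.ndjson")

# Write Arrow IPC (Feather V2)
df.write_ipc("output.arrow")

# Parquet with an explicit compression codec
df.write_parquet("output_snappy.parquet", compression="snappy")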

Related Pages

Implements Principle
