Implementation: Polars DataFrame Write Multi Format
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Serialization, Storage_Optimization |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
Concrete APIs for writing Polars DataFrames to CSV, Parquet, JSON, Excel, IPC, and database targets, including partitioned writes and streaming sinks.
Description
The DataFrame Write Multi Format APIs serialize DataFrames to various output formats and destinations. Each write_* method on DataFrame handles eager serialization, while sink_* methods on LazyFrame provide streaming output for large datasets. Parquet writes support Hive-style partitioning for optimized downstream query performance.
Usage
Import polars and call the appropriate write method on a DataFrame after all transformations are complete. For database writes, install the ADBC driver for the target database. For streaming writes of large datasets, use LazyFrame sink_* methods instead of materializing with .collect() first.
Code Reference
Source Location
- Repository: polars
- Files:
- docs/source/src/python/user-guide/io/csv.py (Lines: 13-14)
- docs/source/src/python/user-guide/io/parquet.py (Lines: 13-14)
Signature
# CSV write
DataFrame.write_csv(file: str | Path | IOBase | None = None) -> str | None
# Parquet write (with optional partitioning)
DataFrame.write_parquet(
    file: str | Path,
    partition_by: list[str] | None = None,
    compression: str = "zstd",
) -> None
# JSON write
DataFrame.write_json(file: str | Path | IOBase | None = None) -> str | None
# NDJSON write
DataFrame.write_ndjson(file: str | Path | IOBase | None = None) -> str | None
# Excel write
DataFrame.write_excel(
    workbook: str | Path | Workbook | None = None,
    worksheet: str | None = None,
) -> Workbook
# IPC/Arrow write
DataFrame.write_ipc(file: str | Path) -> None
# Database write
DataFrame.write_database(
    table_name: str,
    connection: str,
    if_table_exists: str = "fail",
    engine: str = "sqlalchemy",
) -> int
# Streaming sinks (LazyFrame): execute the plan and write incrementally
LazyFrame.sink_parquet(path: str | Path) -> None
LazyFrame.sink_ipc(path: str | Path) -> None
LazyFrame.sink_csv(path: str | Path) -> None
Import
import polars as pl
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file | str, Path, or IOBase | Varies | Output target; optional for write_csv, write_json, and write_ndjson, which return the serialized string when it is omitted |
| partition_by | list[str] | No | Column names for Hive-style partitioned writes (Parquet only) |
| compression | str | No | Compression codec: "zstd", "snappy", "lz4", "gzip", or "uncompressed" (Parquet) |
| worksheet | str | No | Name of the Excel worksheet to write to |
| table_name | str | Yes (database) | Target database table name |
| connection | str | Yes (database) | Database connection URI (e.g., "postgresql://user:pass@host/db") |
| engine | str | No | Database engine: "sqlalchemy" (default) or "adbc" (Arrow Database Connectivity) |
| if_table_exists | str | No | Behavior when table exists: "fail" (default), "append", or "replace" |
Outputs
| Name | Type | Description |
|---|---|---|
| None | None | Most write methods return None on success (file written to disk) |
| str | str | write_csv, write_json, and write_ndjson return the serialized string if no file path is provided |
| Workbook | xlsxwriter.Workbook | write_excel returns the Workbook object for further customization |
| int | int | write_database returns the number of rows written |
| None | None | sink_* methods execute the lazy query, write the output to disk, and return None |
Usage Examples
import polars as pl
df = pl.DataFrame({
"foo": [1, 2, 3],
"bar": ["a", "b", "c"],
"category": ["x", "x", "y"],
})
# Write to CSV
df.write_csv("output.csv")
# Write to Parquet with default Zstd compression
df.write_parquet("output.parquet")
# Hive-partitioned Parquet write
df.write_parquet("output/", partition_by=["category"])
# Creates: output/category=x/part-0.parquet
# output/category=y/part-0.parquet
# Write to JSON
df.write_json("output.json")
# Write to Excel with named worksheet
df.write_excel("output.xlsx", worksheet="Sheet1")
# Write to database via ADBC
df.write_database(
table_name="my_table",
connection="postgresql://user:pass@host/db",
engine="adbc",
)
# Streaming sink for large datasets (LazyFrame)
lf = pl.scan_csv("large_file.csv")
lf.filter(pl.col("value") > 0).sink_parquet("filtered_output.parquet")