Implementation: Pola_rs_Polars DataFrame Write and Convert
Overview
This implementation covers the concrete APIs for writing a DataFrame to persistent storage formats and converting it to interoperable data structures. These are the final-stage operations in the Polars lazy query pipeline, taking materialized results and outputting them as files on disk or as objects consumable by other Python libraries.
APIs
- `DataFrame.write_parquet(file) -> None` — Write DataFrame to a Parquet file
- `DataFrame.write_csv(file) -> None` — Write DataFrame to a CSV file
- `DataFrame.write_json(file) -> None` — Write DataFrame to a JSON file
- `DataFrame.write_ndjson(file) -> None` — Write DataFrame to a newline-delimited JSON file
- `DataFrame.write_ipc(file) -> None` — Write DataFrame to an IPC/Arrow file
- `DataFrame.to_pandas() -> pd.DataFrame` — Convert to a pandas DataFrame
- `DataFrame.to_arrow() -> pa.Table` — Convert to a PyArrow Table
Source Reference
- File: `docs/source/src/python/user-guide/io/csv.py` (Lines 13-14)
- File: `docs/source/src/python/user-guide/io/parquet.py` (Lines 13-14)
- Repository: Pola_rs_Polars
I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | DataFrame | A materialized in-memory DataFrame (result of `collect()` or eager construction) |
| Output (write_*) | File on disk | A file in the specified format written to the given path; returns `None` |
| Output (to_pandas) | pd.DataFrame | A pandas DataFrame containing the same data |
| Output (to_arrow) | pa.Table | A PyArrow Table sharing the underlying Arrow memory buffers |
Key Parameters
| Parameter | Type | Description |
|---|---|---|
| file | str or Path | Output file path for write operations |
| compression (Parquet) | str | Compression codec: "snappy" (default), "gzip", "lz4", "zstd", "uncompressed" |
| separator (CSV) | str | Field separator character (default ",") |
| include_header (CSV) | bool | Whether to write column names as the first row (default True) |
| use_pyarrow_extension_array (to_pandas) | bool | Use Arrow-backed pandas extension arrays for more efficient conversion |
Example Code
Write to File Formats
```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write to Parquet (columnar, compressed)
df.write_parquet("output.parquet")

# Write to CSV (row-based, human-readable)
df.write_csv("output.csv")

# Write to JSON
df.write_json("output.json")

# Write to NDJSON (one JSON object per line)
df.write_ndjson("output.ndjson")

# Write to IPC/Arrow (columnar, fast inter-process exchange)
df.write_ipc("output.ipc")
```
Convert to Other Libraries
```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Convert to pandas DataFrame
pandas_df = df.to_pandas()
print(type(pandas_df))  # <class 'pandas.core.frame.DataFrame'>

# Convert to PyArrow Table (zero-copy)
arrow_table = df.to_arrow()
print(type(arrow_table))  # <class 'pyarrow.lib.Table'>
```
End-to-End Pipeline: Scan, Transform, Write
```python
import polars as pl

# Full lazy pipeline from scan to output
(
    pl.scan_csv("raw_data.csv")
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(
        pl.col("revenue").sum().alias("total_revenue"),
        pl.len().alias("count"),
    )
    .sort("total_revenue", descending=True)
    .collect()
    .write_parquet("summary.parquet")
)
```
Parquet with Compression Options
```python
import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write with zstd compression for a better compression ratio
df.write_parquet("output_zstd.parquet", compression="zstd")

# Write uncompressed for maximum read speed
df.write_parquet("output_raw.parquet", compression="uncompressed")
```
Import
```python
import polars as pl
```
Behavior Notes
- write_parquet() uses Snappy compression by default: This provides a good balance between compression ratio and read/write speed. Alternative codecs (gzip, lz4, zstd) are available via the `compression` parameter.
- to_arrow() is zero-copy: Because Polars uses Apache Arrow as its internal memory format, converting to a PyArrow Table shares the underlying buffers without copying data.
- to_pandas() may copy data: Depending on the data types involved, conversion to pandas may require copying data into pandas' native representation. Using `use_pyarrow_extension_array=True` can reduce copying by using Arrow-backed pandas columns.
- write_csv() produces text output: Numeric precision may be affected by string conversion. For lossless round-tripping, prefer Parquet or IPC.
- File paths are overwritten: All `write_*` methods overwrite existing files at the specified path without warning.
- None return value: All `write_*` methods return `None`. The operation's success is indicated by the absence of an exception.
Related Pages
- Principle:Pola_rs_Polars_DataFrame_Output_Conversion
- Implementation:Pola_rs_Polars_LazyFrame_Collect
- Implementation:Pola_rs_Polars_Scan_LazyFrame_Creation
Metadata
| Field | Value |
|---|---|
| Source Repository | Pola_rs_Polars |
| Source File | docs/source/src/python/user-guide/io/csv.py:L13-14, docs/source/src/python/user-guide/io/parquet.py:L13-14 |
| Domain | Data Engineering, Data Serialization, Interoperability |
| Last Updated | 2026-02-09 10:00 GMT |