
Implementation: Pola_rs_Polars DataFrame Write and Convert

From Leeroopedia


Overview

This implementation covers the concrete APIs for writing a DataFrame to persistent storage formats and converting it to interoperable data structures. These are the final-stage operations in the Polars lazy query pipeline, taking materialized results and outputting them as files on disk or as objects consumable by other Python libraries.

APIs

  • DataFrame.write_parquet(file) -> None — Write DataFrame to a Parquet file
  • DataFrame.write_csv(file) -> None — Write DataFrame to a CSV file
  • DataFrame.write_json(file) -> None — Write DataFrame to a JSON file
  • DataFrame.write_ndjson(file) -> None — Write DataFrame to a newline-delimited JSON file
  • DataFrame.write_ipc(file) -> None — Write DataFrame to an IPC/Arrow file
  • DataFrame.to_pandas() -> pd.DataFrame — Convert to a pandas DataFrame
  • DataFrame.to_arrow() -> pa.Table — Convert to a PyArrow Table

Source Reference

  • File: docs/source/src/python/user-guide/io/csv.py (Lines 13-14)
  • File: docs/source/src/python/user-guide/io/parquet.py (Lines 13-14)
  • Repository: Pola_rs_Polars

I/O Contract

| Direction | Type | Description |
| --- | --- | --- |
| Input | DataFrame | A materialized in-memory DataFrame (result of collect() or eager construction) |
| Output (write_*) | File on disk | A file in the specified format written to the given path; returns None |
| Output (to_pandas) | pd.DataFrame | A pandas DataFrame containing the same data |
| Output (to_arrow) | pa.Table | A PyArrow Table sharing the underlying Arrow memory buffers |

Key Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| file | str or Path | Output file path for write operations |
| compression (Parquet) | str | Compression codec: "zstd" (default), "snappy", "gzip", "lz4", "uncompressed" |
| separator (CSV) | str | Field separator character (default ",") |
| include_header (CSV) | bool | Whether to write column names as the first row (default True) |
| use_pyarrow_extension_array (to_pandas) | bool | Use Arrow-backed pandas extension arrays for more efficient conversion |

Example Code

Write to File Formats

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write to Parquet (columnar, compressed)
df.write_parquet("output.parquet")

# Write to CSV (row-based, human-readable)
df.write_csv("output.csv")

# Write to JSON
df.write_json("output.json")

# Write to NDJSON (one JSON object per line)
df.write_ndjson("output.ndjson")

# Write to IPC/Arrow (columnar, fast inter-process exchange)
df.write_ipc("output.ipc")

Convert to Other Libraries

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Convert to pandas DataFrame
pandas_df = df.to_pandas()
print(type(pandas_df))  # <class 'pandas.core.frame.DataFrame'>

# Convert to PyArrow Table (zero-copy)
arrow_table = df.to_arrow()
print(type(arrow_table))  # <class 'pyarrow.lib.Table'>

End-to-End Pipeline: Scan, Transform, Write

import polars as pl

# Full lazy pipeline from scan to output
(
    pl.scan_csv("raw_data.csv")
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(
        pl.col("revenue").sum().alias("total_revenue"),
        pl.len().alias("count"),
    )
    .sort("total_revenue", descending=True)
    .collect()
    .write_parquet("summary.parquet")
)

Parquet with Compression Options

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write with zstd compression for better compression ratio
df.write_parquet("output_zstd.parquet", compression="zstd")

# Write uncompressed for maximum read speed
df.write_parquet("output_raw.parquet", compression="uncompressed")

Import

import polars as pl

Behavior Notes

  • write_parquet() uses zstd compression by default: This provides a good balance between compression ratio and read/write speed. Alternative codecs (snappy, gzip, lz4, uncompressed) are available via the compression parameter.
  • to_arrow() is zero-copy: Because Polars uses Apache Arrow as its internal memory format, converting to a PyArrow Table shares the underlying buffers without copying data.
  • to_pandas() may copy data: Depending on the data types involved, conversion to pandas may require copying data into pandas' native representation. Using use_pyarrow_extension_array=True can reduce copying by using Arrow-backed pandas columns.
  • write_csv() produces text output: Numeric precision may be affected by string conversion. For lossless round-tripping, prefer Parquet or IPC.
  • File paths are overwritten: All write_* methods overwrite existing files at the specified path without warning.
  • None return value: All write_* methods return None. The operation's success is indicated by the absence of an exception.

Metadata

| Field | Value |
| --- | --- |
| Source Repository | Pola_rs_Polars |
| Source File | docs/source/src/python/user-guide/io/csv.py:L13-14, docs/source/src/python/user-guide/io/parquet.py:L13-14 |
| Domain | Data Engineering, Data Serialization, Interoperability |
| Last Updated | 2026-02-09 10:00 GMT |
