
Implementation: Pola_rs_Polars DataFrame Write and Convert

From Leeroopedia


Overview

This implementation covers the concrete APIs for writing a DataFrame to persistent storage formats and converting it to interoperable data structures. These are the final-stage operations in the Polars lazy query pipeline, taking materialized results and outputting them as files on disk or as objects consumable by other Python libraries.

APIs

  • DataFrame.write_parquet(file) -> None — Write DataFrame to a Parquet file
  • DataFrame.write_csv(file) -> None — Write DataFrame to a CSV file
  • DataFrame.write_json(file) -> None — Write DataFrame to a JSON file
  • DataFrame.write_ndjson(file) -> None — Write DataFrame to a newline-delimited JSON file
  • DataFrame.write_ipc(file) -> None — Write DataFrame to an IPC/Arrow file
  • DataFrame.to_pandas() -> pd.DataFrame — Convert to a pandas DataFrame
  • DataFrame.to_arrow() -> pa.Table — Convert to a PyArrow Table

Source Reference

  • File: docs/source/src/python/user-guide/io/csv.py (Lines 13-14)
  • File: docs/source/src/python/user-guide/io/parquet.py (Lines 13-14)
  • Repository: Pola_rs_Polars

I/O Contract

| Direction | Type | Description |
| --- | --- | --- |
| Input | DataFrame | A materialized in-memory DataFrame (result of collect() or eager construction) |
| Output (write_*) | File on disk | A file in the specified format written to the given path; returns None |
| Output (to_pandas) | pd.DataFrame | A pandas DataFrame containing the same data |
| Output (to_arrow) | pa.Table | A PyArrow Table sharing the underlying Arrow memory buffers |

Key Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| file | str or Path | Output file path for write operations |
| compression (Parquet) | str | Compression codec: "zstd" (default), "snappy", "gzip", "lz4", "uncompressed" |
| separator (CSV) | str | Field separator character (default ",") |
| include_header (CSV) | bool | Whether to write column names as the first row (default True) |
| use_pyarrow_extension_array (to_pandas) | bool | Use Arrow-backed pandas extension arrays for more efficient conversion |

Example Code

Write to File Formats

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write to Parquet (columnar, compressed)
df.write_parquet("output.parquet")

# Write to CSV (row-based, human-readable)
df.write_csv("output.csv")

# Write to JSON
df.write_json("output.json")

# Write to NDJSON (one JSON object per line)
df.write_ndjson("output.ndjson")

# Write to IPC/Arrow (columnar, fast inter-process exchange)
df.write_ipc("output.ipc")

Convert to Other Libraries

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Convert to pandas DataFrame
pandas_df = df.to_pandas()
print(type(pandas_df))  # <class 'pandas.core.frame.DataFrame'>

# Convert to PyArrow Table (zero-copy)
arrow_table = df.to_arrow()
print(type(arrow_table))  # <class 'pyarrow.lib.Table'>

End-to-End Pipeline: Scan, Transform, Write

import polars as pl

# Full lazy pipeline from scan to output
(
    pl.scan_csv("raw_data.csv")
    .filter(pl.col("status") == "active")
    .group_by("region")
    .agg(
        pl.col("revenue").sum().alias("total_revenue"),
        pl.len().alias("count"),
    )
    .sort("total_revenue", descending=True)
    .collect()
    .write_parquet("summary.parquet")
)

Parquet with Compression Options

import polars as pl

df = pl.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Write with zstd compression for better compression ratio
df.write_parquet("output_zstd.parquet", compression="zstd")

# Write uncompressed for maximum read speed
df.write_parquet("output_raw.parquet", compression="uncompressed")

Import

import polars as pl

Behavior Notes

  • write_parquet() uses zstd compression by default: This provides a good balance between compression ratio and read/write speed. Alternative codecs (snappy, gzip, lz4, uncompressed) are available via the compression parameter.
  • to_arrow() is zero-copy: Because Polars uses Apache Arrow as its internal memory format, converting to a PyArrow Table shares the underlying buffers without copying data.
  • to_pandas() may copy data: Depending on the data types involved, conversion to pandas may require copying data into pandas' native representation. Using use_pyarrow_extension_array=True can reduce copying by using Arrow-backed pandas columns.
  • write_csv() produces text output: Numeric precision may be affected by string conversion. For lossless round-tripping, prefer Parquet or IPC.
  • File paths are overwritten: All write_* methods overwrite existing files at the specified path without warning.
  • None return value: All write_* methods return None. The operation's success is indicated by the absence of an exception.

Metadata

| Field | Value |
| --- | --- |
| Source Repository | Pola_rs_Polars |
| Source File | docs/source/src/python/user-guide/io/csv.py:L13-14, docs/source/src/python/user-guide/io/parquet.py:L13-14 |
| Domain | Data Engineering, Data Serialization, Interoperability |
| Last Updated | 2026-02-09 10:00 GMT |
