Implementation:Eventual Inc Daft DataFrame Write Parquet
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Storage |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the Daft library, for writing DataFrame contents to Apache Parquet columnar files.
Description
The write_parquet method on Daft's DataFrame class writes the DataFrame as Parquet files to a specified root directory. Files are written with randomly generated UUID filenames. It supports configurable compression codecs (snappy, gzip, zstd, lz4), three write modes (append, overwrite, overwrite-partitions), and Hive-style partitioning. The method is a blocking call that triggers full execution of the DataFrame query plan. It returns a new DataFrame containing the paths of the written files.
Usage
Use df.write_parquet() to persist DataFrame results to Parquet files. This is a method on DataFrame instances. An optional IOConfig can be provided for remote storage credentials.
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L786-851
Signature
def write_parquet(
self,
root_dir: str | pathlib.Path,
compression: str = "snappy",
write_mode: Literal["append", "overwrite", "overwrite-partitions"] = "append",
partition_cols: list[ColumnInputType] | None = None,
io_config: IOConfig | None = None,
) -> DataFrame
Import
import daft
# Method on DataFrame - no separate import needed
df.write_parquet("output/")
df.write_parquet("s3://bucket/path/", write_mode="overwrite")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| root_dir | str or pathlib.Path | Yes | Root directory path to write the Parquet files to |
| compression | str | No | Compression algorithm. Defaults to "snappy". Options: "snappy", "gzip", "zstd", "lz4". |
| write_mode | Literal["append", "overwrite", "overwrite-partitions"] | No | Operation mode. "append" adds new data, "overwrite" replaces all data, "overwrite-partitions" replaces only affected partitions. Defaults to "append". |
| partition_cols | list[ColumnInputType] or None | No | Columns to use for Hive-style directory partitioning. Required for "overwrite-partitions" mode. |
| io_config | IOConfig or None | No | Configuration for remote storage (e.g., S3, GCS credentials). Defaults to context configuration. |
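For remote storage, the `io_config` parameter carries credentials explicitly rather than relying on the context default. A minimal sketch, assuming Daft's `daft.io.IOConfig` and `daft.io.S3Config` classes; the exact parameter names may vary by Daft version, and the credential values below are hypothetical placeholders:

```python
import daft
from daft.io import IOConfig, S3Config

# Build an IOConfig carrying S3 credentials (placeholder values; the
# S3Config parameter names should be checked against your Daft version)
io_config = IOConfig(
    s3=S3Config(
        region_name="us-west-2",
        key_id="MY_ACCESS_KEY_ID",          # hypothetical placeholder
        access_key="MY_SECRET_ACCESS_KEY",  # hypothetical placeholder
    )
)

df = daft.from_pydict({"x": [1, 2, 3]})
# Pass the config explicitly when writing to remote storage:
# df.write_parquet("s3://my-bucket/path/", io_config=io_config)
```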
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | A new DataFrame containing the file paths of the written Parquet files |
Usage Examples
Basic Usage
import daft
df = daft.from_pydict({"x": [1, 2, 3], "y": ["a", "b", "c"]})
# Write to local directory
result = df.write_parquet("output_dir", write_mode="overwrite")
result.show()
# Shows paths of written files
Partitioned Write
import daft
df = daft.from_pydict({
"date": ["2024-01", "2024-01", "2024-02"],
"value": [1, 2, 3]
})
# Write with Hive-style partitioning by date
result = df.write_parquet(
"output_dir",
partition_cols=["date"],
write_mode="overwrite"
)
# Creates: output_dir/date=2024-01/..., output_dir/date=2024-02/...
Related Pages
Implements Principle
Requires Environment
- Environment:Eventual_Inc_Daft_Python_PyArrow_Core
- Environment:Eventual_Inc_Daft_Cloud_Storage_Credentials