Implementation:Eventual Inc Daft DataFrame Write Iceberg
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Lakehouse |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for writing DataFrame contents to an Apache Iceberg table with transactional guarantees.
Description
The write_iceberg method on a Daft DataFrame writes data to an Iceberg table in either append or overwrite mode. The operation is blocking: it executes the DataFrame, writes the data files, and atomically commits them to the Iceberg table metadata in a single transaction. In overwrite mode, existing files are marked for deletion before the new files are appended, all within that same transaction. The method supports partitioned tables and manifest merging for append operations. It returns a metadata DataFrame describing the operation: ADD/DELETE actions, row counts, file sizes, and partition values. Requires pyiceberg >= 0.6.0 (>= 0.7.0 for partitioned tables) and pyarrow >= 12.0.1.
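The version requirements above can be checked before attempting a write. The following is a minimal sketch, not part of Daft: the helper names parse_version and check_requirements are hypothetical, and pre-release suffixes such as "rc1" are simply ignored, which is an assumption of this sketch.

```python
import re


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '0.7.0' or '0.7.0rc1' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_requirements(pyiceberg_version: str, pyarrow_version: str, partitioned: bool) -> list[str]:
    """Return a list of unmet requirements; an empty list means none were found."""
    problems = []
    # Partitioned tables raise the pyiceberg floor from 0.6.0 to 0.7.0.
    min_pyiceberg = (0, 7, 0) if partitioned else (0, 6, 0)
    if parse_version(pyiceberg_version) < min_pyiceberg:
        problems.append("pyiceberg >= " + ".".join(map(str, min_pyiceberg)) + " is required")
    if parse_version(pyarrow_version) < (12, 0, 1):
        problems.append("pyarrow >= 12.0.1 is required")
    return problems
```

In a real environment the installed versions could be read with importlib.metadata.version("pyiceberg") and importlib.metadata.version("pyarrow") and passed to check_requirements.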
Usage
Use this method on a DataFrame when you need to persist processed data to an Iceberg table with ACID guarantees. This call is blocking and will execute the DataFrame immediately.
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L1035-1195
Signature
def write_iceberg(
    self,
    table: "pyiceberg.table.Table",
    mode: str = "append",
    io_config: IOConfig | None = None,
) -> "DataFrame"
Import
# Method on DataFrame, no separate import needed
df.write_iceberg(iceberg_table, mode="append")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| table | pyiceberg.table.Table | Yes | Destination PyIceberg Table to write data to |
| mode | str | No | Operation mode: "append" to add rows or "overwrite" to replace existing data; defaults to "append" |
| io_config | IOConfig or None | No | Custom IO configuration; defaults to None, in which case the table's file IO properties are used |
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | A metadata DataFrame with columns: operation (ADD/DELETE), rows (int), file_size (int), file_name (str), and optionally partitioning (struct) |
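The returned metadata can be condensed into per-operation totals. The sketch below assumes the columns have been pulled into plain Python lists, for example with result.to_pydict(); the helper name summarize_write_result and the sample values are illustrative only and do not come from a real write.

```python
from collections import defaultdict


def summarize_write_result(columns: dict[str, list]) -> dict[str, dict[str, int]]:
    """Aggregate file count, rows, and bytes per operation (ADD or DELETE)."""
    summary = defaultdict(lambda: {"files": 0, "rows": 0, "bytes": 0})
    for operation, rows, size in zip(columns["operation"], columns["rows"], columns["file_size"]):
        entry = summary[operation]
        entry["files"] += 1
        entry["rows"] += rows
        entry["bytes"] += size
    return dict(summary)


# Illustrative values only, shaped like the metadata DataFrame of an overwrite.
example = {
    "operation": ["DELETE", "ADD", "ADD"],
    "rows": [3, 2, 1],
    "file_size": [900, 640, 410],
    "file_name": ["old-0.parquet", "new-0.parquet", "new-1.parquet"],
}
summary = summarize_write_result(example)
print(summary["ADD"])  # {'files': 2, 'rows': 3, 'bytes': 1050}
```

Working on plain lists keeps the helper independent of Daft itself, so it can be reused in logging or monitoring code that only receives the collected metadata.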
Usage Examples
Basic Usage
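import daft
from pyiceberg.catalog import load_catalog

# Load the destination table; the catalog name "default" and the table
# identifier "db.users" are placeholders for your own configuration.
catalog = load_catalog("default")
iceberg_table = catalog.load_table("db.users")

# Write data to an Iceberg table (append mode)
df = daft.from_pydict({"user_id": [1, 2, 3], "name": ["Alice", "Bob", "Charlie"]})
result = df.write_iceberg(iceberg_table, mode="append")
result.show()  # Shows ADD operations with row counts and file sizes

# Overwrite existing data in an Iceberg table
result = df.write_iceberg(iceberg_table, mode="overwrite")
result.show()  # Shows DELETE operations for old files and ADD for new files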
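Which operations appear in the result for each mode can be modelled with a toy function. This is a simplified illustration of the behaviour described on this page, not Daft's implementation: plan_operations is a hypothetical name, the file names are made up, and raising ValueError for an unknown mode is a choice of this sketch.

```python
def plan_operations(mode: str, existing_files: list[str], new_files: list[str]) -> list[tuple[str, str]]:
    """Return the (operation, file_name) pairs a single commit would record."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unsupported mode: {mode!r}")
    operations = []
    if mode == "overwrite":
        # Existing files are marked for deletion first ...
        operations.extend(("DELETE", name) for name in existing_files)
    # ... then the newly written files are added, all in the same transaction.
    operations.extend(("ADD", name) for name in new_files)
    return operations


append_plan = plan_operations("append", ["old.parquet"], ["new.parquet"])
overwrite_plan = plan_operations("overwrite", ["old.parquet"], ["new.parquet"])
```

Here append_plan contains only an ADD entry, while overwrite_plan lists the DELETE for the old file followed by the ADD for the new one.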
Related Pages
Implements Principle
Requires Environment
- Environment:Eventual_Inc_Daft_Python_PyArrow_Core
- Environment:Eventual_Inc_Daft_Cloud_Storage_Credentials