Principle:Eventual Inc Daft Iceberg Writing

Knowledge Sources	Daft, Daft Docs
Domains	Data_Engineering, Data_Lakehouse
Last Updated	2026-02-08 00:00 GMT

Overview

Iceberg writing persists the contents of a DataFrame to an Apache Iceberg table with transactional guarantees and snapshot isolation.

Description

Iceberg writing appends data to an Iceberg table or overwrites its contents while preserving snapshot isolation and respecting the table's schema and partitioning. Each write operation creates a new snapshot in the table's metadata, which enables time travel to any earlier snapshot that is still retained. The write runs in two phases: it first executes the DataFrame to produce data files in the table's storage location, then atomically commits those files to the table's metadata through a transaction. In overwrite mode, existing data is logically deleted and the new data is appended within the same atomic transaction.

Usage

Use Iceberg writing when you need to persist processed data to an Iceberg table with ACID guarantees. This is appropriate for ETL pipelines, data lake ingestion, and any workflow that requires transactional writes with schema enforcement and partition management.
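As a minimal sketch of how such a write might be invoked, the helper below wraps a call of the form `df.write_iceberg(table, mode=...)`. It assumes `df` is a Daft DataFrame and `table` is a PyIceberg table object; the helper name `write_to_iceberg`, the catalog name `"default"`, and the table name `"db.events"` are hypothetical and used only for illustration.

```python
VALID_MODES = ("append", "overwrite")


def write_to_iceberg(df, table, mode="append"):
    """Write `df` to `table` and return the metadata describing the write.

    Assumption: `df` exposes `write_iceberg(table, mode=...)`, as a Daft
    DataFrame does, and `table` is a table object loaded from a catalog.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    # The write call produces the data files and commits them in one transaction.
    return df.write_iceberg(table, mode=mode)


# Hypothetical usage (requires daft and pyiceberg plus a configured catalog):
#
#   import daft
#   from pyiceberg.catalog import load_catalog
#
#   table = load_catalog("default").load_table("db.events")
#   df = daft.from_pydict({"id": [1, 2, 3]})
#   result = write_to_iceberg(df, table, mode="append")
#   result.show()
```

Validating the mode before the call keeps a typo such as "upsert" from reaching the write path; whether the underlying library performs the same check is not assumed here.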

Theoretical Basis

Iceberg writing follows a snapshot-based write protocol for atomic commits:

1. Execute the DataFrame to produce data files (Parquet) in the table's storage location
2. Collect metadata about written files (paths, row counts, sizes, partition values)
3. Begin a transaction on the Iceberg table
4. For overwrite mode: mark all existing data for deletion
5. Create a new snapshot with:
   - Manifest list pointing to new manifest files
   - Manifest files referencing the new data files
   - Partition statistics and column-level metrics
6. Commit the transaction atomically
7. Return a metadata DataFrame with operation details (ADD/DELETE, rows, file sizes)

The atomic commit ensures that readers either see the complete new snapshot or the previous one, never a partial state. Manifest merging can be enabled for append operations to consolidate manifest files and improve read performance.
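The protocol above can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not the Daft or Iceberg implementation: `DataFile`, `Snapshot`, and `ToyTable` are hypothetical names, manifests and partition statistics are omitted, and the atomic commit is reduced to appending one immutable snapshot to a list.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    rows: int
    size_bytes: int


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: Tuple[DataFile, ...]


class ToyTable:
    """In-memory stand-in for a table: an ordered list of immutable snapshots."""

    def __init__(self) -> None:
        self.snapshots: List[Snapshot] = [Snapshot(0, ())]

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, new_files: List[DataFile], mode: str = "append"):
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode: {mode!r}")
        base = self.current()
        operations = []
        kept = base.files
        if mode == "overwrite":
            # Logically delete every file visible in the current snapshot.
            operations += [("DELETE", f) for f in base.files]
            kept = ()
        operations += [("ADD", f) for f in new_files]
        # Build the complete new snapshot first, then publish it in one step,
        # so a reader sees either the old snapshot or the new one.
        new_snapshot = Snapshot(base.snapshot_id + 1, kept + tuple(new_files))
        self.snapshots.append(new_snapshot)
        # Report what the commit did, one row per affected file.
        return [(op, f.path, f.rows, f.size_bytes) for op, f in operations]


table = ToyTable()
first = table.commit([DataFile("a.parquet", 10, 100)])
second = table.commit([DataFile("b.parquet", 5, 50)], mode="overwrite")
```

After the overwrite, the current snapshot references only `b.parquet`, while the earlier snapshot in `table.snapshots[1]` still lists `a.parquet`, which is the property that makes time travel possible.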

Related Pages

Implemented By

Implementation:Eventual_Inc_Daft_DataFrame_Write_Iceberg

Uses Heuristic

Heuristic:Eventual_Inc_Daft_Execution_Config_Tuning
