Principle:Eventual Inc Daft Iceberg Writing
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Lakehouse |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Iceberg writing is the technique for persisting DataFrame contents to an Apache Iceberg table with transactional guarantees and snapshot isolation.
Description
Iceberg writing appends data to, or overwrites data in, an Iceberg table while preserving snapshot isolation and enforcing the table's schema and current partition spec (Iceberg lets the partition spec evolve without rewriting existing data). Each write creates a new snapshot in the table's metadata, so readers can time travel to any earlier snapshot that has not yet been expired. The write first executes the DataFrame to produce data files in the table's storage location, then atomically commits those files to the table's metadata in a single transaction. In overwrite mode, the existing data files are logically deleted (dropped from the new snapshot but not physically erased) and the new files are added within that same atomic transaction, so no reader ever observes an empty intermediate table.
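The snapshot behavior described above can be illustrated with a minimal in-memory toy model. It does not call Daft or PyIceberg; `ToyTable`, `Snapshot`, `write`, and `files_as_of` are hypothetical names chosen for illustration, and a "snapshot" here is reduced to the list of data files it makes visible. The point is that overwrite only changes which files the *new* snapshot lists, so older snapshots remain readable.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    data_files: Tuple[str, ...]  # data files visible in this snapshot


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for an Iceberg table's snapshot log."""

    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Optional[Snapshot]:
        return self.snapshots[-1] if self.snapshots else None

    def write(self, new_files: List[str], mode: str = "append") -> Snapshot:
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode: {mode!r}")
        base = self.current()
        # Overwrite drops previously visible files from the new snapshot only;
        # older snapshots still list those files, which is what makes time travel work.
        carried = () if mode == "overwrite" or base is None else base.data_files
        snap = Snapshot(
            snapshot_id=len(self.snapshots) + 1,
            parent_id=base.snapshot_id if base else None,
            data_files=carried + tuple(new_files),
        )
        self.snapshots.append(snap)  # the "commit": one step that publishes the snapshot
        return snap

    def files_as_of(self, snapshot_id: int) -> Tuple[str, ...]:
        """Time travel: the files that were visible in an earlier snapshot."""
        for snap in self.snapshots:
            if snap.snapshot_id == snapshot_id:
                return snap.data_files
        raise KeyError(snapshot_id)


if __name__ == "__main__":
    table = ToyTable()
    table.write(["a.parquet"])
    table.write(["b.parquet"])
    table.write(["c.parquet"], mode="overwrite")
    print(table.current().data_files)  # ('c.parquet',)
    print(table.files_as_of(2))        # ('a.parquet', 'b.parquet')
```

After the overwrite, the current snapshot sees only `c.parquet`, while snapshot 2 still resolves to the two appended files.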
Usage
Use Iceberg writing when you need to persist processed data to an Iceberg table with ACID guarantees. This is appropriate for ETL pipelines, data lake ingestion, and any workflow that requires transactional writes with schema enforcement and partition management.
Theoretical Basis
Iceberg writing follows a snapshot-based write protocol for atomic commits:
1. Execute the DataFrame to produce data files (Parquet) in the table's storage location
2. Collect metadata about written files (paths, row counts, sizes, partition values)
3. Begin a transaction on the Iceberg table
4. For overwrite mode: mark all existing data for deletion
5. Create a new snapshot with:
   - A manifest list that references the new manifest files (for appends, it also carries over the existing manifests from the previous snapshot)
   - Manifest files referencing the new data files
   - Partition statistics and column-level metrics
6. Commit the transaction atomically
7. Return a metadata DataFrame with operation details (ADD/DELETE, rows, file sizes)
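The seven steps above can be sketched end to end as a toy, pure-Python model. This is not Daft's or PyIceberg's implementation: `DataFile`, `Manifest`, `TableMetadata`, `write_data_files`, and `commit` are hypothetical, no Parquet is actually written, and file sizes are made-up placeholders. The returned rows loosely follow the text's description (operation, row count, file size, file name); the exact columns Daft returns may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataFile:
    path: str
    record_count: int
    file_size_in_bytes: int
    partition: Dict[str, str]


@dataclass
class Manifest:
    entries: List[DataFile]


@dataclass
class Snapshot:
    snapshot_id: int
    manifest_list: List[Manifest]  # manifest list -> manifests -> data files
    summary: Dict[str, int]


@dataclass
class TableMetadata:
    snapshots: List[Snapshot] = field(default_factory=list)

    def live_files(self) -> List[DataFile]:
        if not self.snapshots:
            return []
        return [f for m in self.snapshots[-1].manifest_list for f in m.entries]


def write_data_files(rows_by_partition: Dict[str, List[dict]], location: str) -> List[DataFile]:
    """Steps 1-2: produce one (pretend) file per partition and record its metadata."""
    files = []
    for i, (part, rows) in enumerate(sorted(rows_by_partition.items())):
        files.append(DataFile(
            path=f"{location}/data/{part}/part-{i}.parquet",
            record_count=len(rows),
            file_size_in_bytes=100 * len(rows),  # made-up size; nothing is written to disk
            partition={"part": part},
        ))
    return files


def commit(meta: TableMetadata, new_files: List[DataFile], mode: str = "append") -> List[dict]:
    """Steps 3-7: build one new snapshot and report what changed."""
    if mode not in ("append", "overwrite"):
        raise ValueError(f"unsupported mode: {mode!r}")
    base = meta.snapshots[-1] if meta.snapshots else None  # step 3: read the base snapshot
    deleted = meta.live_files() if mode == "overwrite" else []  # step 4
    carried = base.manifest_list if (base and mode == "append") else []
    snapshot = Snapshot(  # step 5: new manifest plus a manifest list that references it
        snapshot_id=len(meta.snapshots) + 1,
        manifest_list=carried + [Manifest(entries=list(new_files))],
        summary={
            "added-records": sum(f.record_count for f in new_files),
            "deleted-records": sum(f.record_count for f in deleted),
        },
    )
    meta.snapshots.append(snapshot)  # step 6: publish in a single step
    # Step 7: one row per file touched, DELETE rows first.
    return [
        {"operation": op, "rows": f.record_count,
         "file_size": f.file_size_in_bytes, "file_name": f.path}
        for op, files in (("DELETE", deleted), ("ADD", new_files))
        for f in files
    ]
```

An append followed by an overwrite yields one ADD row, then a DELETE row for the old file plus an ADD row for the new one; the earlier snapshot still lists the original file.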
The atomic commit ensures that readers see either the complete new snapshot or the previous one, never a partial state. For append operations, manifest merging (controlled in Iceberg by the `commit.manifest-merge.enabled` table property) can be enabled to consolidate small manifest files and improve read performance.
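The all-or-nothing commit and the optional manifest merge can be sketched with a toy catalog that holds a single pointer to the current manifest list and replaces it by compare-and-swap. This assumes the optimistic-concurrency style Iceberg catalogs use (a commit succeeds only if the base snapshot is still current; otherwise the writer rebuilds and retries). `ToyCatalog`, `append`, and `CommitConflict` are hypothetical; a lock stands in for the catalog's atomic swap, and `min_count_to_merge` is only loosely analogous to Iceberg's `commit.manifest.min-count-to-merge` property. Real Iceberg merges manifests into size-based bins rather than folding them all into one.

```python
import threading
from typing import List, Tuple

Manifest = Tuple[str, ...]           # data file paths listed in one manifest
ManifestList = Tuple[Manifest, ...]  # manifests referenced by one snapshot


class CommitConflict(Exception):
    """Another writer committed first, so our base snapshot is stale."""


class ToyCatalog:
    """Hypothetical catalog holding one pointer to the current snapshot.

    A lock stands in for the catalog's atomic swap; the pointer is replaced
    whole, so a reader sees either the old manifest list or the new one.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._version = 0
        self._manifests: ManifestList = ()

    def load(self) -> Tuple[int, ManifestList]:
        with self._lock:
            return self._version, self._manifests

    def swap(self, expected_version: int, manifests: ManifestList) -> int:
        with self._lock:
            if self._version != expected_version:
                raise CommitConflict(f"expected v{expected_version}, found v{self._version}")
            self._manifests = manifests
            self._version += 1
            return self._version


def append(catalog: ToyCatalog, new_files: List[str], merge_enabled: bool = False,
           min_count_to_merge: int = 3, max_attempts: int = 3) -> int:
    """Optimistic append: rebuild on the latest snapshot and retry on conflict."""
    for _ in range(max_attempts):
        version, base = catalog.load()
        manifests = base + (tuple(new_files),)  # one new manifest per append
        if merge_enabled and len(manifests) >= min_count_to_merge:
            # Toy merge: fold every manifest into one. Real Iceberg bins by size.
            manifests = (tuple(path for m in manifests for path in m),)
        try:
            return catalog.swap(version, manifests)
        except CommitConflict:
            continue  # another writer won; rebuild on top of its snapshot
    raise CommitConflict(f"gave up after {max_attempts} attempts")
```

Because `swap` either replaces the whole manifest list or raises, a concurrent reader calling `load` never sees a half-built snapshot, and a writer that loses the race simply re-reads and re-applies its files.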