Implementation:Eventual Inc Daft Read Iceberg

Knowledge Sources	Daft, Daft Docs
Domains	Data_Engineering, Data_Lakehouse
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for reading Apache Iceberg tables into a lazy, distributed DataFrame.

Description

The read_iceberg function creates a lazy DataFrame scan of an Apache Iceberg table. It accepts either a string path to an Iceberg metadata file or a PyIceberg Table object. When given a string, it uses StaticTable.from_metadata() to load the table. IO configuration is resolved from the table's file IO properties if not explicitly provided. The function creates an IcebergScanOperator that handles partition pruning and predicate pushdown through the Iceberg metadata layer. Multithreaded IO is automatically disabled when running on the Ray runner to limit resource contention.
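The pruning step can be illustrated with a minimal, self-contained sketch. It assumes a hypothetical `DataFileStats` record standing in for the partition values and per-column lower/upper bounds that Iceberg manifests keep for each data file. `prune_files` is also hypothetical and is not Daft's or PyIceberg's code. It only shows the idea: a file is skipped when its metadata proves it cannot contain matching rows, so it is never opened.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class DataFileStats:
    """Hypothetical stand-in for per-file metadata recorded in Iceberg manifests."""
    path: str
    partition: Dict[str, str] = field(default_factory=dict)  # identity-partition values
    lower: Dict[str, int] = field(default_factory=dict)      # per-column lower bounds
    upper: Dict[str, int] = field(default_factory=dict)      # per-column upper bounds

def prune_files(
    files: List[DataFileStats],
    eq_partition: Optional[Tuple[str, str]] = None,
    range_filter: Optional[Tuple[str, int, int]] = None,
) -> List[DataFileStats]:
    """Keep only files whose metadata does not rule out matching rows.

    eq_partition: (column, value) equality on an identity partition column.
    range_filter: (column, lo, hi) inclusive range on a column with bounds.
    """
    kept = []
    for f in files:
        if eq_partition is not None:
            col, value = eq_partition
            if col in f.partition and f.partition[col] != value:
                continue  # partition pruning: wrong partition, skip the file
        if range_filter is not None:
            col, lo, hi = range_filter
            if col in f.lower and col in f.upper and (f.upper[col] < lo or f.lower[col] > hi):
                continue  # stats pruning: bounds do not overlap the requested range
        kept.append(f)  # metadata cannot rule the file out, so it must be read
    return kept

files = [
    DataFileStats("a.parquet", {"category": "electronics"}, {"price": 10}, {"price": 50}),
    DataFileStats("b.parquet", {"category": "books"}, {"price": 5}, {"price": 20}),
    DataFileStats("c.parquet", {"category": "electronics"}, {"price": 100}, {"price": 300}),
]
kept = prune_files(files, eq_partition=("category", "electronics"), range_filter=("price", 0, 60))
```

Only `a.parquet` survives: `b.parquet` belongs to another partition, and the price bounds of `c.parquet` start above 60. Files with missing metadata are conservatively kept.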

Usage

Import and use this function when you need to read data from an Apache Iceberg table with snapshot isolation and optional time travel via snapshot IDs.
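Time travel and snapshot isolation both come down to which snapshot a read is pinned to. The sketch below is a simplified model under stated assumptions: `Snapshot` and `resolve_snapshot` are hypothetical names, and a snapshot is reduced to an ID plus the data files it references. With no `snapshot_id` the current snapshot is used; otherwise the requested one is looked up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

@dataclass(frozen=True)
class Snapshot:
    """Hypothetical, simplified snapshot: an ID plus the data files it references."""
    snapshot_id: int
    data_files: Tuple[str, ...]

def resolve_snapshot(
    snapshots: Sequence[Snapshot],
    current_snapshot_id: int,
    snapshot_id: Optional[int] = None,
) -> Snapshot:
    """Return the snapshot a read should be pinned to.

    snapshot_id=None selects the current (latest) snapshot; otherwise the
    requested snapshot is returned, or ValueError is raised if it is unknown.
    """
    target = current_snapshot_id if snapshot_id is None else snapshot_id
    for snap in snapshots:
        if snap.snapshot_id == target:
            return snap
    raise ValueError(f"snapshot {target} not found")

# Two commits: snapshot 1 wrote a.parquet, snapshot 2 added b.parquet.
history = [
    Snapshot(1, ("a.parquet",)),
    Snapshot(2, ("a.parquet", "b.parquet")),
]
latest = resolve_snapshot(history, current_snapshot_id=2)
as_of_first = resolve_snapshot(history, current_snapshot_id=2, snapshot_id=1)
```

Because each snapshot is immutable, a read pinned to snapshot 2 keeps seeing the same files even if a later commit creates snapshot 3.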

Code Reference

Source Location

Repository: Daft
File: daft/io/iceberg/_iceberg.py
Lines: L56-114

Signature

def read_iceberg(
    table: Union[str, "PyIcebergTable"],
    snapshot_id: int | None = None,
    io_config: IOConfig | None = None,
) -> DataFrame
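The `table` parameter accepts either a string or a table object. A minimal sketch of how such an argument could be normalized, together with the Ray-runner rule from the description, is shown below. `prepare_scan`, `ScanSettings`, and the `runner_name` values are hypothetical, and `fake_loader` stands in for `StaticTable.from_metadata()`.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

@dataclass
class ScanSettings:
    """Hypothetical bundle of the values a scan operator would need."""
    table: Any
    snapshot_id: Optional[int]
    multithreaded_io: bool

def prepare_scan(
    table: Union[str, Any],
    load_from_metadata: Callable[[str], Any],
    runner_name: str,
    snapshot_id: Optional[int] = None,
) -> ScanSettings:
    if isinstance(table, str):
        # A string is treated as the location of an Iceberg metadata file.
        table = load_from_metadata(table)
    # Multithreaded IO is switched off when running on the Ray runner.
    return ScanSettings(
        table=table,
        snapshot_id=snapshot_id,
        multithreaded_io=runner_name != "ray",
    )

def fake_loader(path: str) -> dict:
    # Hypothetical loader; a real one would parse the metadata file.
    return {"metadata_location": path}

settings = prepare_scan("s3://bucket/path/to/metadata.json", fake_loader, runner_name="native")
existing_table = object()  # stands in for an already-loaded table object
ray_settings = prepare_scan(existing_table, fake_loader, runner_name="ray")
```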

Import

from daft import read_iceberg

# or
import daft
df = daft.read_iceberg(table)

I/O Contract

Inputs

Name	Type	Required	Description
table	str | PyIcebergTable	Yes	Path to an Iceberg metadata file (supports `s3://`, `gs://`) or a PyIceberg Table instance
snapshot_id	int | None	No	Snapshot ID to read, for time travel; defaults to the latest snapshot
io_config	IOConfig | None	No	Custom IO configuration for accessing object storage; defaults to a configuration derived from the table's file IO properties
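When `io_config` is omitted, storage settings come from the table's file IO properties. The sketch below assumes PyIceberg-style S3 property keys (`s3.region`, `s3.endpoint`, `s3.access-key-id`, and so on) and maps them into a plain dict. The target key names and `resolve_io_settings` are illustrative assumptions, not Daft's actual conversion code.

```python
from typing import Dict, Optional

# Assumed PyIceberg-style FileIO property keys, mapped to illustrative target names.
S3_PROPERTY_MAP = {
    "s3.region": "region_name",
    "s3.endpoint": "endpoint_url",
    "s3.access-key-id": "key_id",
    "s3.secret-access-key": "access_key",
    "s3.session-token": "session_token",
}

def resolve_io_settings(
    explicit: Optional[Dict[str, object]],
    file_io_properties: Dict[str, str],
) -> Dict[str, object]:
    """Hypothetical helper: an explicit config wins; otherwise derive one from table properties."""
    if explicit is not None:
        return explicit
    # Copy only the recognized keys; unrelated properties are ignored.
    return {
        target: file_io_properties[source]
        for source, target in S3_PROPERTY_MAP.items()
        if source in file_io_properties
    }

props = {
    "s3.region": "us-west-2",
    "s3.endpoint": "http://localhost:9000",
    "unrelated.key": "ignored",
}
derived = resolve_io_settings(None, props)
explicit = resolve_io_settings({"anonymous": True}, props)
```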

Outputs

Name	Type	Description
return	DataFrame	A lazy DataFrame with the schema converted from the Iceberg table, supporting predicate pushdown and partition pruning
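The returned DataFrame is lazy: building a query only records operations, and filters can be folded into the scan itself. The toy model below illustrates that idea with a hypothetical `LazyScan` class. It is not Daft's query planner; in a real engine, pushed predicates would also be used to skip files or row groups rather than only filtering rows as they are read.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Row = dict
Predicate = Callable[[Row], bool]

@dataclass(frozen=True)
class LazyScan:
    """Hypothetical lazy scan node: a reader plus predicates pushed into it."""
    read_rows: Callable[[], List[Row]]
    pushed: Tuple[Predicate, ...] = ()

    def where(self, predicate: Predicate) -> "LazyScan":
        # No data is read here; the filter is recorded on the scan node itself.
        return LazyScan(self.read_rows, self.pushed + (predicate,))

    def collect(self) -> List[Row]:
        # Only now is the source read; every pushed predicate is applied per row.
        return [row for row in self.read_rows() if all(p(row) for p in self.pushed)]

calls: List[int] = []

def reader() -> List[Row]:
    calls.append(1)  # record each time the "storage" is actually read
    return [
        {"category": "electronics", "price": 10},
        {"category": "books", "price": 5},
    ]

frame = LazyScan(reader).where(lambda r: r["category"] == "electronics")
assert calls == []  # building the query read nothing
result = frame.collect()
```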

Usage Examples

Basic Usage

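import daft
from daft.io import IOConfig, S3Config
from pyiceberg.catalog import load_catalog

# Load a PyIceberg table object from a configured catalog
# (catalog and table names here are placeholders)
catalog = load_catalog("default")
pyiceberg_table = catalog.load_table("my_namespace.my_table")

# Read from a PyIceberg table object (lazy: nothing is read yet)
df = daft.read_iceberg(pyiceberg_table)

# Apply filters (pushed down to the Iceberg metadata layer)
df = df.where(df["category"] == "electronics")
df.show()

# Read with time travel to a specific snapshot (placeholder ID)
df = daft.read_iceberg(pyiceberg_table, snapshot_id=123456789)

# Read from a metadata file path with a custom IO config
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_iceberg("s3://bucket/path/to/metadata.json", io_config=io_config)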

Related Pages

Implements Principle

Principle:Eventual_Inc_Daft_Iceberg_Reading

Requires Environment
