Implementation:Eventual Inc Daft Read Iceberg

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Data_Lakehouse
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for reading Apache Iceberg tables into a lazy, distributed DataFrame.

Description

The read_iceberg function creates a lazy DataFrame scan of an Apache Iceberg table. It accepts either a string path to an Iceberg metadata file or a PyIceberg Table object. When given a string, it uses StaticTable.from_metadata() to load the table. IO configuration is resolved from the table's file IO properties if not explicitly provided. The function creates an IcebergScanOperator that handles partition pruning and predicate pushdown through the Iceberg metadata layer. Multithreaded IO is automatically disabled when running on the Ray runner to limit resource contention.
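The resolution steps described above can be sketched in plain Python. This is a simplified model under stated assumptions, not Daft's source: `resolve_table`, `resolve_io_config`, `use_multithreaded_io`, and the `load_from_metadata` callback are hypothetical names, with the callback standing in for StaticTable.from_metadata() and plain dicts standing in for IO configuration objects.

```python
from typing import Any, Callable, Optional


def resolve_table(table: Any, load_from_metadata: Callable[[str], Any]) -> Any:
    """Return a table object, loading it first when given a metadata path."""
    if isinstance(table, str):
        # Hypothetical stand-in for StaticTable.from_metadata(path).
        return load_from_metadata(table)
    return table


def resolve_io_config(explicit: Optional[dict], table_io_properties: dict) -> dict:
    """Prefer an explicit config; otherwise fall back to the table's IO properties."""
    if explicit is not None:
        return explicit
    return dict(table_io_properties)


def use_multithreaded_io(runner_name: str) -> bool:
    """Model of the rule that multithreaded IO is switched off on the Ray runner."""
    return runner_name != "ray"


# A string argument is loaded; any other object is passed through unchanged.
loaded = resolve_table(
    "s3://bucket/path/to/metadata.json",
    lambda path: {"metadata_location": path},
)
io_settings = resolve_io_config(None, {"s3.region": "us-west-2"})
```

Keeping the three decisions in separate functions mirrors the order in which the description lists them: table first, then IO configuration, then the threading choice.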

Usage

Import and use this function when you need to read data from an Apache Iceberg table with snapshot isolation and optional time travel via snapshot IDs.
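Time travel needs a snapshot ID, which usually has to be looked up first. The sketch below shows one way to pick the snapshot that was current at a given time. `SnapshotEntry` and `snapshot_id_as_of` are hypothetical helpers written for illustration, and the IDs and timestamps are made-up sample values; the assumption is that the table's snapshot history can be reduced to pairs of snapshot ID and commit timestamp in milliseconds.

```python
from typing import Iterable, NamedTuple, Optional


class SnapshotEntry(NamedTuple):
    """Hypothetical stand-in for a snapshot log entry (ID plus commit time)."""
    snapshot_id: int
    timestamp_ms: int


def snapshot_id_as_of(history: Iterable[SnapshotEntry], as_of_ms: int) -> Optional[int]:
    """Return the ID of the most recent snapshot committed at or before as_of_ms."""
    candidates = [entry for entry in history if entry.timestamp_ms <= as_of_ms]
    if not candidates:
        return None  # no snapshot existed yet at that time
    return max(candidates, key=lambda entry: entry.timestamp_ms).snapshot_id


# Made-up sample history, oldest first.
history = [
    SnapshotEntry(snapshot_id=101, timestamp_ms=1_700_000_000_000),
    SnapshotEntry(snapshot_id=102, timestamp_ms=1_700_000_600_000),
    SnapshotEntry(snapshot_id=103, timestamp_ms=1_700_001_200_000),
]
chosen = snapshot_id_as_of(history, as_of_ms=1_700_000_900_000)
```

The chosen ID can then be passed as the snapshot_id argument of read_iceberg. When no snapshot predates the requested time the helper returns None, which corresponds to the default of reading the latest snapshot, so callers may want to treat that case explicitly.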

Code Reference

Source Location

  • Repository: Daft
  • File: daft/io/iceberg/_iceberg.py
  • Lines: L56-114

Signature

def read_iceberg(
    table: Union[str, "PyIcebergTable"],
    snapshot_id: int | None = None,
    io_config: IOConfig | None = None,
) -> DataFrame

Import

from daft import read_iceberg

# or
import daft
df = daft.read_iceberg(table)

I/O Contract

Inputs

Name Type Required Description
table str | PyIcebergTable Yes Path to an Iceberg metadata file (supports s3://, gs://) or a PyIceberg Table instance
snapshot_id int | None No Specific snapshot ID to query for time travel; defaults to the latest snapshot
io_config IOConfig | None No Custom IO configuration for accessing object storage; defaults to the table's file IO properties

Outputs

Name Type Description
return DataFrame A lazy DataFrame with the schema converted from the Iceberg table, supporting predicate pushdown and partition pruning
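Partition pruning means that files which cannot match a filter are never opened. The toy model below illustrates the idea only; it is not how Daft or Iceberg implement it. `DataFile`, `prune_by_partition`, and the file paths are hypothetical, and each file is assumed to carry a single partition value recorded in table metadata.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DataFile:
    """Toy manifest entry: a file path plus the partition value it holds."""
    path: str
    category: str  # partition value recorded in table metadata


def prune_by_partition(files: List[DataFile], wanted_category: str) -> List[DataFile]:
    """Keep only files whose partition value can satisfy category == wanted_category."""
    return [f for f in files if f.category == wanted_category]


# Hypothetical manifest with three data files across two partitions.
manifest = [
    DataFile("data/category=books/part-0.parquet", "books"),
    DataFile("data/category=electronics/part-0.parquet", "electronics"),
    DataFile("data/category=electronics/part-1.parquet", "electronics"),
]
to_scan = prune_by_partition(manifest, "electronics")
```

Here only the two electronics files remain in `to_scan`; the books file is excluded using metadata alone, before any data is read.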

Usage Examples

Basic Usage

import daft
from daft.io import IOConfig, S3Config
from pyiceberg.catalog import load_catalog

# Load a PyIceberg table (the catalog and table names are placeholders)
catalog = load_catalog("my_catalog")
pyiceberg_table = catalog.load_table("my_namespace.my_table")

# Read from a PyIceberg table object
df = daft.read_iceberg(pyiceberg_table)

# Apply filters (pushed down to the Iceberg metadata layer)
df = df.where(df["category"] == "electronics")
df.show()

# Read with time travel to a specific snapshot
df = daft.read_iceberg(pyiceberg_table, snapshot_id=123456789)

# Read from a metadata file path with a custom IO config
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_iceberg("s3://bucket/path/to/metadata.json", io_config=io_config)

Related Pages

Implements Principle

Requires Environment
