Implementation:Eventual Inc Daft Read Deltalake


Knowledge Sources

  • Domains: Data_Engineering, Data_Lakehouse
  • Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for reading Delta Lake tables into a lazy, distributed DataFrame.

Description

The read_deltalake function creates a lazy DataFrame scan of a Delta Lake table. It accepts table URIs (including remote object stores like s3:// and gs://), DataCatalogTable references (e.g., AWS Glue), and UnityCatalogTable instances. Time travel is supported via version numbers (int), RFC 3339 timestamps (string), or datetime objects. The function creates a DeltaLakeScanOperator that handles the transaction log parsing, deletion vector processing, and predicate pushdown. Multithreaded IO is automatically disabled when running on the Ray runner to reduce system resource contention.
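
The three time-travel forms described above can be sketched as follows. This is a minimal illustration, not Daft's own code: the helper names rfc3339 and read_as_of and the table URI "s3://bucket/table" are hypothetical, and Daft is imported lazily so the timestamp helper can run on its own.

```python
from datetime import datetime, timezone


def rfc3339(moment: datetime) -> str:
    """Format a timezone-aware datetime as an RFC 3339 timestamp string."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return moment.isoformat()


def read_as_of(table_uri: str, version):
    """Lazily scan table_uri at version (int, RFC 3339 string, or datetime)."""
    import daft  # imported here so rfc3339() works without Daft installed

    return daft.read_deltalake(table_uri, version=version)


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(rfc3339(cutoff))  # 2024-01-01T00:00:00+00:00

# Hypothetical table URI; each call selects a snapshot in a different way.
# df_v5 = read_as_of("s3://bucket/table", 5)                # by version number
# df_ts = read_as_of("s3://bucket/table", rfc3339(cutoff))  # by timestamp string
# df_dt = read_as_of("s3://bucket/table", cutoff)           # by datetime object
```

Because the returned DataFrame is lazy, none of these calls would read table data until an action such as show() is invoked.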

Usage

Import and use this function when you need to read data from a Delta Lake table with versioned access and deletion vector support.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/io/delta_lake/_deltalake.py
  • Lines: L24-98

Signature

def read_deltalake(
    table: Union[str, DataCatalogTable, "UnityCatalogTable"],
    version: Union[int, str, "datetime"] | None = None,
    io_config: IOConfig | None = None,
    ignore_deletion_vectors: bool = False,
    _multithreaded_io: bool | None = None,
) -> DataFrame
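
The _multithreaded_io parameter defaults to None, and the Description states that multithreaded IO is disabled on the Ray runner. One way to picture that default is the sketch below. It illustrates the rule as described on this page and is not taken from the Daft source; the function resolve_multithreaded_io and the runner names "ray" and "native" are assumptions for illustration.

```python
from typing import Optional


def resolve_multithreaded_io(override: Optional[bool], runner_name: str) -> bool:
    """Return the effective multithreaded-IO setting.

    An explicit override wins; otherwise multithreading is enabled
    unless the (assumed) runner name is "ray".
    """
    if override is not None:
        return override
    return runner_name != "ray"


print(resolve_multithreaded_io(None, "ray"))     # False
print(resolve_multithreaded_io(None, "native"))  # True
print(resolve_multithreaded_io(True, "ray"))     # True
```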

Import

from daft import read_deltalake

# or
import daft
df = daft.read_deltalake("some-table-uri")  # placeholder URI

I/O Contract

Inputs

Name Type Required Description
table str | DataCatalogTable | UnityCatalogTable Yes URI to Delta Lake table or a catalog table reference
version int | str | datetime | None No Version number, RFC 3339 timestamp string, or datetime for time travel; defaults to latest version
io_config IOConfig | None No Custom IO configuration for accessing object storage; defaults to Daft context config
ignore_deletion_vectors bool No Whether to skip checking deletion vectors when reading; defaults to False
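
As a hedged illustration of the input contract, the sketch below checks the version and ignore_deletion_vectors arguments against the types in the signature before calling read_deltalake. The helpers build_read_kwargs and read_table are hypothetical and not part of Daft; Daft is imported lazily so the check can run without it.

```python
from datetime import datetime
from typing import Any, Dict, Optional, Union

Version = Union[int, str, datetime]


def build_read_kwargs(
    version: Optional[Version] = None,
    ignore_deletion_vectors: bool = False,
) -> Dict[str, Any]:
    """Validate argument types and return keyword arguments for read_deltalake."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(version, bool) or not isinstance(
        version, (int, str, datetime, type(None))
    ):
        raise TypeError("version must be an int, str, datetime, or None")
    if not isinstance(ignore_deletion_vectors, bool):
        raise TypeError("ignore_deletion_vectors must be a bool")
    return {
        "version": version,
        "ignore_deletion_vectors": ignore_deletion_vectors,
    }


def read_table(table_uri: str, **kwargs: Any):
    """Validate the arguments, then create the lazy scan."""
    import daft  # imported here so the validation helper runs without Daft

    return daft.read_deltalake(table_uri, **build_read_kwargs(**kwargs))


print(build_read_kwargs(version=5))
# {'version': 5, 'ignore_deletion_vectors': False}
```

A call such as read_table("some-table-uri", version=5, ignore_deletion_vectors=True) would then skip deletion-vector checks for that scan.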

Outputs

Name Type Description
return DataFrame A lazy DataFrame with the schema converted from the Delta Lake table

Usage Examples

Basic Usage

import daft

# Read a Delta Lake table from a local path or URI (placeholder shown)
df = daft.read_deltalake("some-table-uri")

# Apply filters (pushed down through Delta Lake metadata)
df = df.where(df["foo"] > 5)
df.show()

# Read from S3 with custom IO config
from daft.io import S3Config, IOConfig
io_config = IOConfig(s3=S3Config(region="us-west-2", anonymous=True))
df = daft.read_deltalake("s3://daft-public-data/test_fixtures/delta_table/", io_config=io_config)

# Read a specific version for time travel
df = daft.read_deltalake("s3://bucket/table", version=5)

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
