Implementation:Eventual Inc Daft Read Deltalake
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Lakehouse |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for reading Delta Lake tables into a lazy, distributed DataFrame.
Description
The read_deltalake function creates a lazy DataFrame scan of a Delta Lake table. It accepts table URIs (including remote object stores such as s3:// and gs://), DataCatalogTable references (e.g., AWS Glue), and UnityCatalogTable instances. Time travel is supported through the version argument, which takes a version number (int), an RFC 3339 timestamp (string), or a datetime object. Internally, the function creates a DeltaLakeScanOperator that handles transaction log parsing, deletion vector processing, and predicate pushdown. Multithreaded IO is automatically disabled when running on the Ray runner to reduce system resource contention.
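The three accepted forms of the version argument can be sketched as follows. This is a minimal illustration, not part of the Daft source: the concrete values and the `read_at` wrapper are hypothetical, and the wrapper assumes Daft is installed with Delta Lake support.

```python
from datetime import datetime, timezone

# Three ways to express the `version` argument (values are made up for illustration).
VERSION_AS_INT = 5
VERSION_AS_DATETIME = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
VERSION_AS_RFC3339 = VERSION_AS_DATETIME.isoformat()  # "2024-01-01T12:00:00+00:00"


def read_at(table_uri, version=None):
    """Hypothetical wrapper: lazily scan `table_uri` at `version` (None means latest)."""
    import daft  # imported lazily so the constants above work without Daft installed

    return daft.read_deltalake(table_uri, version=version)
```

Any of the three constants can be passed as `version`, for example `read_at("s3://bucket/table", VERSION_AS_RFC3339)`.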
Usage
Import and use this function when you need to read data from a Delta Lake table with versioned access and deletion vector support.
Code Reference
Source Location
- Repository: Daft
- File: daft/io/delta_lake/_deltalake.py
- Lines: L24-98
Signature
def read_deltalake(
    table: Union[str, DataCatalogTable, "UnityCatalogTable"],
    version: Union[int, str, "datetime"] | None = None,
    io_config: IOConfig | None = None,
    ignore_deletion_vectors: bool = False,
    _multithreaded_io: bool | None = None,
) -> DataFrame
Import
from daft import read_deltalake
# or
import daft
df = daft.read_deltalake(uri)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| table | Union[str, DataCatalogTable, UnityCatalogTable] | Yes | URI of the Delta Lake table, or a catalog table reference |
| version | Union[int, str, datetime] or None | No | Version number, RFC 3339 timestamp string, or datetime for time travel; defaults to the latest version |
| io_config | IOConfig or None | No | Custom IO configuration for accessing object storage; defaults to the Daft context config |
| ignore_deletion_vectors | bool | No | Whether to skip checking deletion vectors when reading; defaults to False |
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | A lazy DataFrame with the schema converted from the Delta Lake table |
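Because the returned DataFrame is lazy, work is deferred until a result is requested. The sketch below shows one way to inspect the schema and pull back a small sample; `preview_delta_table` is a hypothetical helper, not part of Daft, and it assumes Daft is installed with Delta Lake support.

```python
def preview_delta_table(table_uri, n=5):
    """Hypothetical helper: return the first `n` rows as a dict of column lists."""
    import daft  # lazy import; assumes Daft with Delta Lake support is installed

    df = daft.read_deltalake(table_uri)  # returns a lazy DataFrame
    print(df.schema())                   # inspect the schema converted from the Delta table
    return df.limit(n).to_pydict()       # materialize only the first n rows
```

Keeping the `limit` before materialization keeps the preview small even for large tables.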
Usage Examples
Basic Usage
import daft
# Read a Delta Lake table by URI (a local path also works)
df = daft.read_deltalake("some-table-uri")
# Apply filters (pushed down through Delta Lake metadata)
df = df.where(df["foo"] > 5)
df.show()
# Read from S3 with custom IO config
from daft.io import S3Config, IOConfig
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_deltalake("s3://daft-public-data/test_fixtures/delta_table/", io_config=io_config)
# Read a specific version for time travel
df = daft.read_deltalake("s3://bucket/table", version=5)
Related Pages
Implements Principle
Requires Environment
- Environment:Eventual_Inc_Daft_Python_PyArrow_Core
- Environment:Eventual_Inc_Daft_Cloud_Storage_Credentials