
Principle:Eventual Inc Daft Delta Lake Reading

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Data_Lakehouse
Last Updated: 2026-02-08 00:00 GMT

Overview

Delta Lake reading is the technique for creating a lazy DataFrame scan of a Delta Lake table with versioned access and deletion vector support.

Description

Reading a Delta Lake table in Daft produces a lazy DataFrame scan rather than loading data eagerly. The scan supports time travel via version numbers or RFC 3339 timestamps, allowing queries against historical states of the table, and it respects deletion vectors so that rows that have been logically deleted are excluded from results. Like other Daft read operations, the resulting DataFrame benefits from predicate pushdown and partition pruning during query optimization. The read function accepts table URIs (including remote object stores such as S3 and GCS), DataCatalogTable references, and UnityCatalogTable instances.
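The following is a minimal sketch of how such a read might look from Python. It assumes Daft's `daft.read_deltalake` entry point with a `version` keyword, which may differ between Daft releases. The helper names `check_version` and `scan_delta_table` are hypothetical and exist only for illustration. Daft is imported inside the function so that the validation helper can be used without Daft installed.

```python
from datetime import datetime
from typing import Optional, Union

Version = Optional[Union[int, str]]


def check_version(version: Version) -> Version:
    """Validate a time-travel target (hypothetical helper, not part of Daft).

    Accepts None (latest version), a non-negative integer version number,
    or an RFC 3339 timestamp string with an explicit UTC offset.
    """
    if version is None:
        return None
    if isinstance(version, bool):  # bool is a subclass of int; reject it
        raise TypeError("version must be an int or a string")
    if isinstance(version, int):
        if version < 0:
            raise ValueError("version number must be non-negative")
        return version
    if isinstance(version, str):
        # Older Python versions do not parse a trailing "Z", so normalize it.
        parsed = datetime.fromisoformat(version.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            raise ValueError("RFC 3339 timestamps need a UTC offset")
        return version
    raise TypeError("version must be an int or a string")


def scan_delta_table(table_uri: str, version: Version = None):
    """Return a lazy Daft DataFrame over a Delta Lake table.

    Assumes `daft.read_deltalake(table, version=...)`; check the signature
    against the installed Daft release.
    """
    import daft  # lazy import: needs the `daft` and `deltalake` packages

    return daft.read_deltalake(table_uri, version=check_version(version))
```

Calling `scan_delta_table` with a table URI and, for example, `version=3` or `version="2024-01-15T00:00:00Z"` would return a DataFrame without reading any data files. Filters and column selections applied afterwards are only recorded in the query plan until the DataFrame is collected or shown.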

Usage

Use Delta Lake reading to query a Delta Lake table, either at its latest version or at a historical one. It suits workloads on data lakes stored in the Delta Lake format, especially when you need time travel, ACID guarantees, or integration with Databricks Unity Catalog or AWS Glue Data Catalog.

Theoretical Basis

Delta Lake is a log-structured table format that provides ACID transactions on top of cloud object storage. Key concepts:

  • Transaction log: A sequence of JSON files recording every change to the table, enabling atomic commits and time travel.
  • Versioning: Each commit creates a new version, allowing queries against any historical state by version number or timestamp.
  • Deletion vectors: Lightweight markers that logically delete rows without rewriting data files, improving write performance for updates and deletes.
  • Schema enforcement: The transaction log enforces schema consistency, preventing writes with incompatible schemas.

Reading a table proceeds in the following steps:

1. Resolve the Delta Lake table (from URI, DataCatalogTable, or UnityCatalogTable)
2. Read the transaction log to determine the target version (latest or specified)
3. Compute the set of active data files from add/remove actions in the log
4. Apply deletion vectors to identify logically deleted rows
5. Create a lazy scan operator that reads only the required data files on execution
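
Steps 2 to 4 can be illustrated with a small in-memory model of the transaction log. This is a simplified sketch, not Daft's or Delta Lake's actual implementation. Commits are modelled as lists of `add` and `remove` actions, and the `deleted_rows` field is a hypothetical stand-in for a real deletion vector, which in the Delta protocol is a descriptor pointing at a bitmap rather than an inline list.

```python
from typing import Dict, List, Optional, Sequence, Set

# One commit is a list of actions; the log is a list of commits (index = version).
Commit = List[dict]


def active_files(log: List[Commit], version: Optional[int] = None) -> Dict[str, Set[int]]:
    """Replay commits 0..version and return {data file path: deleted row indexes}."""
    if version is None:
        version = len(log) - 1  # default to the latest version
    if not 0 <= version < len(log):
        raise ValueError(f"version {version} does not exist")

    files: Dict[str, Set[int]] = {}
    for commit in log[: version + 1]:
        # Apply removes first so that a commit which replaces a file's
        # deletion vector (remove + add of the same path) ends with the add.
        for action in commit:
            if "remove" in action:
                files.pop(action["remove"]["path"], None)
        for action in commit:
            if "add" in action:
                add = action["add"]
                # "deleted_rows" is an illustrative simplification.
                files[add["path"]] = set(add.get("deleted_rows", ()))
    return files


def visible_rows(rows: Sequence, deleted: Set[int]) -> list:
    """Drop rows whose index is marked in the deletion vector."""
    return [row for i, row in enumerate(rows) if i not in deleted]
```

With a three-commit log in which version 1 replaces one file and version 2 marks rows 0 and 2 of another file as deleted, `active_files(log, 0)` returns the original file set, while `active_files(log)` returns the latest set together with the deleted row indexes. A lazy scan would carry this file list in its plan and open the files only on execution.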
