Implementation:Eventual Inc Daft Read Parquet

Knowledge Sources

  • Domains: Data_Engineering, Analytics
  • Last Updated: 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for reading Parquet files into a DataFrame.

Description

The read_parquet function creates a lazy DataFrame from one or more Apache Parquet files. It supports local paths, S3, GCS, and Azure Blob Storage, with glob pattern matching. The function constructs a scan plan that defers actual data reading until an action is triggered, enabling predicate and projection pushdown optimizations. It also supports Hive-style partitioning, custom schemas, row group selection, and Int96 timestamp coercion.
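
For instance, a minimal sketch of the lazy behavior (the file path and column names here are hypothetical):

import daft

# Building the query only constructs a scan plan; no Parquet data is read yet.
df = daft.read_parquet("events.parquet")
df = df.where(df["user_id"] == 42).select("ts")

# collect() triggers execution; the filter and projection above are
# candidates for predicate and projection pushdown into the scan.
result = df.collect()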

Usage

Import and use this function when you need to read Parquet files from local or remote storage into a Daft DataFrame.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/io/_parquet.py
  • Lines: L18-96

Signature

def read_parquet(
    path: str | list[str],
    row_groups: list[list[int]] | None = None,
    infer_schema: bool = True,
    schema: dict[str, DataType] | None = None,
    io_config: IOConfig | None = None,
    file_path_column: str | None = None,
    hive_partitioning: bool = False,
    coerce_int96_timestamp_unit: str | TimeUnit | None = None,
) -> DataFrame
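
As an example of the schema parameters, a sketch of overriding inference with an explicit schema (the file path and column names are hypothetical):

import daft
from daft import DataType

# With infer_schema=False, the provided schema is used as the definitive schema.
df = daft.read_parquet(
    "data.parquet",
    infer_schema=False,
    schema={"id": DataType.int64(), "name": DataType.string()},
)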

Import

from daft import read_parquet

# or
import daft
daft.read_parquet(...)

I/O Contract

Inputs

  • path (str | list[str], required): Path to Parquet file(s). Supports wildcards and remote URLs (e.g., s3://, gs://).
  • row_groups (list[list[int]] | None, optional): List of row groups to read, one list per file. Defaults to None.
  • infer_schema (bool, optional): Whether to infer the schema from the Parquet metadata. Defaults to True.
  • schema (dict[str, DataType] | None, optional): Schema used as definitive (if infer_schema=False) or as a hint applied after inference.
  • io_config (IOConfig | None, optional): Configuration for the native downloader (S3, GCS, Azure credentials, etc.).
  • file_path_column (str | None, optional): If set, includes the source file path as a column with this name.
  • hive_partitioning (bool, optional): Whether to infer Hive-style partitions from file paths. Defaults to False.
  • coerce_int96_timestamp_unit (str | TimeUnit | None, optional): TimeUnit to coerce Int96 timestamps to (e.g., ns, us, ms). Defaults to None.
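
As a sketch of combining several of these inputs, assuming a Hive-partitioned layout such as /data/year=2024/month=01/part-0.parquet (paths and column names hypothetical):

import daft

df = daft.read_parquet(
    "/data/**/*.parquet",
    hive_partitioning=True,          # infers "year" and "month" partition columns
    file_path_column="source_file",  # records which file each row came from
)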

Outputs

  • return (DataFrame): A lazy DataFrame with a scan plan over the Parquet data. No data is read until an action is triggered.
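
Because the result is lazy, the scan plan can be inspected before any I/O happens, e.g. with DataFrame.explain() (file path hypothetical):

import daft

df = daft.read_parquet("/path/to/file.parquet")
df.explain()  # prints the logical plan; still no data read
df.show()     # triggers execution of the scan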

Usage Examples

Basic Usage

import daft

# Read a single Parquet file
df = daft.read_parquet("/path/to/file.parquet")

# Read all Parquet files in a directory
df = daft.read_parquet("/path/to/directory")

# Read with glob pattern
df = daft.read_parquet("/path/to/files-*.parquet")

Reading from S3

import daft
from daft.io import S3Config, IOConfig

io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_parquet("s3://bucket/path/*.parquet", io_config=io_config)
df.show()
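
A similar sketch works for Google Cloud Storage, assuming a public bucket readable anonymously (bucket and path hypothetical):

import daft
from daft.io import GCSConfig, IOConfig

# anonymous=True works only for publicly readable buckets.
io_config = IOConfig(gcs=GCSConfig(anonymous=True))
df = daft.read_parquet("gs://bucket/path/*.parquet", io_config=io_config)
df.show()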

Related Pages

  • Implements Principle
  • Requires Environment
