Implementation:Eventual Inc Daft Read Csv

Knowledge Sources	Daft Daft Docs
Domains	Data_Engineering, Analytics
Last Updated	2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for reading CSV files into a DataFrame.

Description

The read_csv function creates a lazy DataFrame from one or more CSV files. It supports local paths, S3, GCS, and Azure Blob Storage with glob pattern matching. The function provides extensive configuration for CSV parsing including custom delimiters, quote handling, escape characters, comment lines, and variable column support. Schema can be automatically inferred or explicitly provided.

Usage

Import and use this function when you need to read CSV files from local or remote storage into a Daft DataFrame.

Code Reference

Source Location

Repository: Daft
File: daft/io/_csv.py
Lines: L18-100

Signature

def read_csv(
    path: str | list[str],
    infer_schema: bool = True,
    schema: dict[str, DataType] | None = None,
    has_headers: bool = True,
    delimiter: str | None = None,
    double_quote: bool = True,
    quote: str | None = None,
    escape_char: str | None = None,
    comment: str | None = None,
    allow_variable_columns: bool = False,
    io_config: IOConfig | None = None,
    file_path_column: str | None = None,
    hive_partitioning: bool = False,
) -> DataFrame

Import

from daft import read_csv

# or
import daft
daft.read_csv(...)

I/O Contract

Inputs

Name	Type	Required	Description
path	str | list[str]	Yes	Path or list of paths to CSV file(s). Supports wildcards and remote URLs (e.g., `s3://`, `gs://`).
infer_schema	bool	No	Whether to infer the schema of the CSV. Defaults to `True`.
schema	dict[str, DataType] | None	No	Used as the definitive schema when `infer_schema=False`; otherwise applied as a hint after inference.
has_headers	bool	No	Whether the CSV has a header row. Defaults to `True`.
delimiter	str | None	No	Delimiter character used in the CSV. If `None`, a comma (`,`) is used.
double_quote	bool	No	Whether to support double quote escaping. Defaults to `True`.
quote	str | None	No	Character used for quoting fields.
escape_char	str | None	No	Character used to escape the quote character within a field.
comment	str | None	No	Character that marks the start of a comment line. Defaults to `None` (comments not supported).
allow_variable_columns	bool	No	Whether to allow a variable number of columns per row. Defaults to `False`. If `True`, short rows are padded with nulls and extra columns are ignored.
io_config	IOConfig | None	No	Configuration for the native downloader (S3, GCS, Azure credentials, etc.).
file_path_column	str | None	No	If set, includes the source file path as a column with this name.
hive_partitioning	bool	No	Whether to infer hive-style partitions from file paths. Defaults to `False`.

Outputs

Name	Type	Description
return	DataFrame	A lazy DataFrame with a scan plan over the CSV data. No data is read until an action is triggered.

Usage Examples

Basic Usage

import daft

# Read a single CSV file
df = daft.read_csv("/path/to/file.csv")

# Read all CSV files in a directory
df = daft.read_csv("/path/to/directory")

# Read with glob pattern
df = daft.read_csv("/path/to/files-*.csv")

Reading from S3 with Custom Delimiter

import daft
from daft.io import S3Config, IOConfig

# S3Config takes the region as `region_name`
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_csv(
    "s3://bucket/path/*.csv",
    delimiter="\t",
    io_config=io_config,
)
df.show()

Related Pages

Implements Principle

Principle:Eventual_Inc_Daft_Data_Ingestion_CSV
