
Implementation:Eventual Inc Daft Read Csv

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Analytics
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for reading CSV files into a DataFrame.

Description

The read_csv function creates a lazy DataFrame from one or more CSV files. It supports local paths, S3, GCS, and Azure Blob Storage, with glob pattern matching. It exposes extensive CSV parsing configuration: custom delimiters, quote handling, escape characters, comment lines, and rows with a variable number of columns. The schema can be inferred automatically or provided explicitly.

Usage

Import and use this function when you need to read CSV files from local or remote storage into a Daft DataFrame.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/io/_csv.py
  • Lines: L18-100

Signature

def read_csv(
    path: str | list[str],
    infer_schema: bool = True,
    schema: dict[str, DataType] | None = None,
    has_headers: bool = True,
    delimiter: str | None = None,
    double_quote: bool = True,
    quote: str | None = None,
    escape_char: str | None = None,
    comment: str | None = None,
    allow_variable_columns: bool = False,
    io_config: IOConfig | None = None,
    file_path_column: str | None = None,
    hive_partitioning: bool = False,
) -> DataFrame

Import

from daft import read_csv

# or
import daft
daft.read_csv(...)

I/O Contract

Inputs

Name Type Required Description
path str | list[str] Yes Path to CSV file(s). Supports wildcards and remote URLs (e.g., s3://, gs://).
infer_schema bool No Whether to infer the schema of the CSV. Defaults to True.
schema dict[str, DataType] | None No Used as the definitive schema if infer_schema=False; otherwise applied as a hint after inference.
has_headers bool No Whether the CSV has a header row. Defaults to True.
delimiter str | None No Delimiter character used in the CSV. Defaults to ",".
double_quote bool No Whether to support double quote escaping. Defaults to True.
quote str | None No Character used for quoting fields.
escape_char str | None No Character used to escape the quote character within a field.
comment str | None No Character that marks the start of a comment line. Defaults to None (comments not supported).
allow_variable_columns bool No Whether to allow a variable number of columns per row. Defaults to False. If True, short rows are padded with nulls and extra columns are ignored.
io_config IOConfig | None No Configuration for the native downloader (S3, GCS, Azure credentials, etc.).
file_path_column str | None No If set, includes the source file path as a column with this name.
hive_partitioning bool No Whether to infer hive-style partitions from file paths. Defaults to False.

Outputs

Name Type Description
return DataFrame A lazy DataFrame with a scan plan over the CSV data. No data is read until an action is triggered.

Usage Examples

Basic Usage

import daft

# Read a single CSV file
df = daft.read_csv("/path/to/file.csv")

# Read all CSV files in a directory
df = daft.read_csv("/path/to/directory")

# Read with glob pattern
df = daft.read_csv("/path/to/files-*.csv")

Reading from S3 with Custom Delimiter

import daft
from daft.io import S3Config, IOConfig

# S3Config takes the region as region_name; anonymous=True skips credentials
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_csv(
    "s3://bucket/path/*.csv",
    delimiter="\t",
    io_config=io_config,
)
df.show()

Related Pages

Implements Principle

Requires Environment
