Implementation:Eventual Inc Daft Read Csv
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Analytics |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for reading CSV files into a DataFrame.
Description
The read_csv function creates a lazy DataFrame from one or more CSV files. It supports local paths, S3, GCS, and Azure Blob Storage with glob pattern matching. The function provides extensive configuration for CSV parsing including custom delimiters, quote handling, escape characters, comment lines, and variable column support. Schema can be automatically inferred or explicitly provided.
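To make the parsing options above concrete, here is a minimal sketch using Python's standard-library csv module. It is not Daft's implementation. The helper name parse_csv_text and the sample data are hypothetical, and the mapping of read_csv option names onto csv.reader arguments is an assumption made for illustration only.

```python
import csv
import io


def parse_csv_text(text, delimiter=",", quote='"', escape_char=None,
                   double_quote=True, comment=None):
    """Parse CSV text with the stdlib csv module (illustration only, not Daft)."""
    lines = text.splitlines(keepends=True)
    if comment is not None:
        # The csv module has no comment option, so drop comment lines up front.
        # Simplification: this would also drop a line inside a quoted,
        # multi-line field if that line happened to start with the comment character.
        lines = [line for line in lines if not line.startswith(comment)]
    reader = csv.reader(
        io.StringIO("".join(lines)),
        delimiter=delimiter,       # field separator
        quotechar=quote,           # character that wraps quoted fields
        escapechar=escape_char,    # optional escape character
        doublequote=double_quote,  # treat "" inside a quoted field as one "
    )
    return list(reader)


sample = (
    "# exported by a hypothetical tool\n"
    "id|note\n"
    '1|"says ""hi"""\n'
    '2|"a|b"\n'
)
rows = parse_csv_text(sample, delimiter="|", comment="#")
print(rows)
```

In this sketch the comment line is skipped, the doubled quotes collapse to a single quote character, and the delimiter inside the quoted field is kept as data.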
Usage
Import and use this function when you need to read CSV files from local or remote storage into a Daft DataFrame.
Code Reference
Source Location
- Repository: Daft
- File: daft/io/_csv.py
- Lines: L18-100
Signature
def read_csv(
    path: str | list[str],
    infer_schema: bool = True,
    schema: dict[str, DataType] | None = None,
    has_headers: bool = True,
    delimiter: str | None = None,
    double_quote: bool = True,
    quote: str | None = None,
    escape_char: str | None = None,
    comment: str | None = None,
    allow_variable_columns: bool = False,
    io_config: IOConfig | None = None,
    file_path_column: str | None = None,
    hive_partitioning: bool = False,
) -> DataFrame
Import
from daft import read_csv
# or
import daft
daft.read_csv(...)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str or list[str] | Yes | Path to CSV file(s). Supports wildcards and remote URLs (e.g., s3://, gs://). |
| infer_schema | bool | No | Whether to infer the schema of the CSV. Defaults to True. |
| schema | dict[str, DataType] or None | No | Schema used as definitive (if infer_schema=False) or as a hint applied after inference. Defaults to None. |
| has_headers | bool | No | Whether the CSV has a header row. Defaults to True. |
| delimiter | str or None | No | Delimiter character used in the CSV. Defaults to ",". |
| double_quote | bool | No | Whether to support double quote escaping. Defaults to True. |
| quote | str or None | No | Character used for quoting fields. |
| escape_char | str or None | No | Character used to escape the quote character within a field. |
| comment | str or None | No | Character that marks the start of a comment line. Defaults to None (comments not supported). |
| allow_variable_columns | bool | No | Whether to allow a variable number of columns per row. Defaults to False. If True, pads short rows with nulls and ignores extra columns. |
| io_config | IOConfig or None | No | Configuration for the native downloader (S3, GCS, Azure credentials, etc.). |
| file_path_column | str or None | No | If set, includes the source file path as a column with this name. |
| hive_partitioning | bool | No | Whether to infer hive-style partitions from file paths. Defaults to False. |
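Two rows in the table describe behavior that is easier to see in code: allow_variable_columns (pad short rows with nulls, ignore extra columns) and hive_partitioning (partition values taken from key=value path segments). The pure-Python sketch below mirrors those descriptions. It is not Daft's implementation; the helper names normalize_row and hive_partitions and the sample paths are hypothetical. It also assumes partition values stay as strings, whereas a real reader may infer their types.

```python
def normalize_row(row, num_columns):
    """Pad short rows with None and drop fields beyond num_columns."""
    padded = list(row[:num_columns])                     # ignore extra columns
    padded.extend([None] * (num_columns - len(padded)))  # pad missing ones
    return padded


def hive_partitions(path):
    """Collect key=value directory segments from a file path."""
    partitions = {}
    for segment in path.split("/")[:-1]:  # skip the file name itself
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values are kept as plain strings here
    return partitions


header = ["id", "name", "city"]
raw_rows = [
    ["1", "Ada", "London"],
    ["2", "Linus"],                  # short row: one field missing
    ["3", "Grace", "NYC", "extra"],  # long row: one field too many
]
normalized = [normalize_row(row, len(header)) for row in raw_rows]
print(normalized)

parts = hive_partitions("s3://bucket/sales/year=2024/month=01/part-0.csv")
print(parts)
```

Here the short row gains a trailing None, the long row loses its fourth field, and the path yields the partition keys year and month.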
Outputs
| Name | Type | Description |
|---|---|---|
| return | DataFrame | A lazy DataFrame with a scan plan over the CSV data. No data is read until an action is triggered. |
Usage Examples
Basic Usage
import daft
# Read a single CSV file
df = daft.read_csv("/path/to/file.csv")
# Read all CSV files in a directory
df = daft.read_csv("/path/to/directory")
# Read with glob pattern
df = daft.read_csv("/path/to/files-*.csv")
Reading from S3 with Custom Delimiter
import daft
from daft.io import S3Config, IOConfig
io_config = IOConfig(s3=S3Config(region_name="us-west-2", anonymous=True))
df = daft.read_csv(
    "s3://bucket/path/*.csv",
    delimiter="\t",
    io_config=io_config,
)
df.show()