Implementation:Eventual Inc Daft Download
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Data_Preprocessing |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for downloading URL content as binary data.
Description
The download function treats each string in an expression as a URL, downloads the content at that URL as bytes, and returns a Binary expression. It uses a configurable number of concurrent connections per thread (default 32) and supports two error handling modes: "raise" and "null". The function automatically adapts its I/O runtime to the execution environment: multi-threaded for local execution, and single-threaded on Ray to avoid excessive connection counts.
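The two error modes and the connection cap can be illustrated with a small standard-library sketch. This is not how Daft implements download; `fetch_all` and `fetch_one` are hypothetical helpers, and a thread pool stands in for the concurrent connection limit. The `data:` and `bogus://` URLs are made up so the example runs without network access.

```python
import base64
from concurrent.futures import ThreadPoolExecutor
from typing import Literal, Optional
from urllib.request import urlopen


def fetch_all(
    urls: list[str],
    max_connections: int = 32,
    on_error: Literal["raise", "null"] = "raise",
) -> list[Optional[bytes]]:
    """Fetch every URL, mimicking the "raise" and "null" modes described above."""

    def fetch_one(url: str) -> Optional[bytes]:
        try:
            with urlopen(url, timeout=10) as response:
                return response.read()
        except Exception:
            if on_error == "raise":
                raise  # "raise" mode: the first failure aborts the whole call
            return None  # "null" mode: keep the row, but with no payload

    # The pool size caps how many requests are in flight at once.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_one, urls))


# data: URLs are resolved locally by urllib, so no network is needed.
good = "data:application/octet-stream;base64," + base64.b64encode(b"hello").decode()
bad = "bogus://unsupported-scheme"  # no handler exists for this scheme

results = fetch_all([good, bad], max_connections=4, on_error="null")
print(results)  # prints [b'hello', None]
```

With `on_error="raise"`, the same input would propagate the exception from the failing URL instead of returning a list.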
Usage
Import and use this function when you need to download binary content from URLs stored in a DataFrame column.
Code Reference
Source Location
- Repository: Daft
- File: daft/functions/url.py
- Lines: L44-94
Signature
def download(
    expr: Expression,
    max_connections: int = 32,
    on_error: Literal["raise", "null"] = "raise",
    io_config: IOConfig | None = None,
) -> Expression
Import
from daft.functions import download
# or use as an Expression method
import daft
daft.col("url").download()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| expr | Expression | Yes | A String expression containing URLs to download. |
| max_connections | int | No | Maximum number of concurrent connections per thread. Defaults to 32. |
| on_error | Literal["raise", "null"] | No | Error handling behavior. "raise" fails immediately on error; "null" returns null and logs a warning. Defaults to "raise". |
| io_config | IOConfig \| None | No | IO configuration for accessing remote storage. The S3Config's max_connections is overridden with the max_connections kwarg. Defaults to None. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | Expression (Binary) | A Binary expression containing the downloaded bytes content for each URL, or None if an error occurred and on_error="null". |
Usage Examples
Basic Usage
import daft

# Each string in "urls" is fetched; "content" holds the downloaded bytes.
df = daft.from_pydict({"urls": ["https://example.com/image1.png", "https://example.com/image2.png"]})
df = df.with_column("content", daft.col("urls").download())
df.show()
With Error Handling
import daft

# With on_error="null", a failed download yields null instead of raising.
df = daft.from_pydict({"urls": ["https://example.com/valid.png", "https://example.com/missing.png"]})
df = df.with_column("content", daft.col("urls").download(on_error="null"))
df.show()