Implementation:Eventual Inc Daft Download

From Leeroopedia


Knowledge Sources
Domains Data_Engineering, Data_Preprocessing
Last Updated 2026-02-08 00:00 GMT

Overview

A concrete tool, provided by the Daft library, for downloading URL content as binary data.

Description

The download function treats each string in the input expression as a URL, downloads its contents, and returns the bytes as a Binary expression. It opens up to max_connections concurrent connections per thread (default 32) and supports two error-handling modes, "raise" and "null". The function adapts its I/O runtime to the execution environment: multi-threaded for local execution, and single-threaded on Ray to avoid opening an excessive number of connections.
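The two error-handling modes can be illustrated without Daft. The following is a minimal standard-library sketch, not Daft's implementation: fetch_all, its fetch parameter, and fake_fetch are hypothetical names introduced for illustration, and a thread pool stands in for the connection limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Literal, Optional


def fetch_all(
    urls: list[str],
    fetch: Callable[[str], bytes],
    max_connections: int = 32,
    on_error: Literal["raise", "null"] = "raise",
) -> list[Optional[bytes]]:
    """Fetch every URL with at most `max_connections` requests in flight."""
    if on_error not in ("raise", "null"):
        raise ValueError(f"unsupported on_error: {on_error!r}")

    def fetch_one(url: str) -> Optional[bytes]:
        try:
            return fetch(url)
        except Exception:
            if on_error == "raise":
                raise  # "raise": the first failure aborts the whole call
            return None  # "null": the failed entry becomes None

    # The pool size bounds concurrency; map() preserves input order and
    # re-raises any exception from a worker when results are collected.
    with ThreadPoolExecutor(max_workers=max_connections) as pool:
        return list(pool.map(fetch_one, urls))


def fake_fetch(url: str) -> bytes:
    """Stand-in for a real HTTP client so the sketch needs no network."""
    if "missing" in url:
        raise FileNotFoundError(url)
    return b"data:" + url.encode()


urls = ["https://example.com/valid.png", "https://example.com/missing.png"]
results = fetch_all(urls, fake_fetch, on_error="null")
```

With on_error="null", results holds bytes for the first URL and None for the second. With on_error="raise", the same call would propagate the FileNotFoundError from fake_fetch.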

Usage

Import and use this function when you need to download binary content from URLs stored in a DataFrame column.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/functions/url.py
  • Lines: L44-94

Signature

def download(
    expr: Expression,
    max_connections: int = 32,
    on_error: Literal["raise", "null"] = "raise",
    io_config: IOConfig | None = None,
) -> Expression

Import

from daft.functions import download

# or use as an Expression method
import daft
daft.col("url").download()

I/O Contract

Inputs

Name Type Required Description
expr Expression Yes A String expression containing URLs to download.
max_connections int No Maximum number of concurrent connections per thread. Defaults to 32.
on_error Literal["raise", "null"] No Error handling behavior. "raise" fails immediately on error; "null" returns null and logs a warning. Defaults to "raise".
io_config IOConfig | None No IO configuration for accessing remote storage. The max_connections value in its S3Config is overridden by the max_connections argument.

Outputs

Name Type Description
return Expression (Binary) A Binary expression containing the downloaded bytes content for each URL, or None if an error occurred and on_error="null".

Usage Examples

Basic Usage

import daft

df = daft.from_pydict({"urls": ["https://example.com/image1.png", "https://example.com/image2.png"]})
df = df.with_column("content", daft.col("urls").download())
df.show()

With Error Handling

import daft

df = daft.from_pydict({"urls": ["https://example.com/valid.png", "https://example.com/missing.png"]})
df = df.with_column("content", daft.col("urls").download(on_error="null"))
df.show()

Related Pages

Implements Principle

Requires Environment
