
Environment:Eventual Inc Daft Python PyArrow Core

From Leeroopedia


Knowledge Sources
Domains: Runtime Environment, Data Processing, Python Extension
Last Updated: 2026-02-08 15:30 GMT

Overview

The Python_PyArrow_Core environment defines the base runtime requirements for the Daft distributed dataframe library, including the Python interpreter version, core dependencies, and the Rust-backed native extension module compiled via PyO3 and maturin.

Description

Daft is a distributed dataframe library designed for multimodal data processing. At its core, it relies on PyArrow for columnar data representation and interoperability, fsspec for filesystem abstraction across local and remote storage, and tqdm for progress bar rendering. The compute-intensive operations are implemented in Rust and compiled into a native Python extension module using PyO3 (Rust-to-Python bindings) and maturin (the build backend). This hybrid architecture delivers high performance while maintaining a Pythonic API surface.

The core environment also reads several environment variables that control execution behavior. They cover runner selection (native multithreaded vs. Ray distributed), progress bar display, telemetry opt-out, scan task parallelism, shuffle algorithm selection, and row-order preservation.

Usage

This environment is required by all Daft installations. Every optional feature (Ray distributed execution, cloud storage, AI integrations, etc.) builds on top of this base environment. Use this environment whenever you are:

  • Installing Daft for local or distributed data processing
  • Building Daft from source (requires Rust toolchain and maturin)
  • Configuring Daft execution behavior via environment variables
  • Developing or testing Daft extensions

System Requirements

Category | Requirement | Notes
Python | >= 3.10 | Specified in pyproject.toml line 23: requires-python = ">=3.10"
Rust toolchain | Stable channel (for building from source) | Required only when compiling the native extension module
Build backend | maturin >= 1.5.0, < 2.0.0 | Specified in pyproject.toml line 3: requires = ["maturin>=1.5.0,<2.0.0"]
Operating system | Linux, macOS, Windows | Windows has some limitations (e.g., Ray version constraints differ)
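A pre-flight check can fail fast with a clearer message than a later import or install error. The sketch below uses a hypothetical helper, check_python, which is not part of Daft. It compares an interpreter version tuple against the documented minimum of 3.10.

```python
import sys

# Minimum interpreter version, mirroring requires-python = ">=3.10" in pyproject.toml.
MIN_PYTHON = (3, 10)


def check_python(version_info=None) -> None:
    """Hypothetical pre-flight check: raise if the interpreter is older than MIN_PYTHON."""
    if version_info is None:
        version_info = sys.version_info
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:3])
        raise RuntimeError(f"Daft needs Python >= 3.10, found {found}")


# Explicit tuples keep the example deterministic; call check_python() to test the running interpreter.
check_python((3, 12, 1))  # passes silently
```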

Dependencies

System Packages

  • Rust compiler (stable channel) -- only required for building from source
  • C/C++ compiler -- required by some transitive native dependencies
  • maturin >= 1.5.0, < 2.0.0 -- Python build backend for Rust extensions

Python Packages

  • pyarrow >= 8.0.0, < 23.0.0 -- Apache Arrow columnar data format for Python; core data representation layer
  • fsspec < 2025.11.0 -- Filesystem specification for Python; provides unified filesystem abstraction
  • tqdm < 4.68.0 -- Progress bar library; used for displaying execution progress
  • packaging -- Version parsing and specifier utilities
  • typing-extensions >= 4.0.0 -- Required only for Python < 3.11; backports newer typing features
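A stdlib-only sketch like the following can help check whether an existing environment already satisfies these pins. The helper names (installed_versions, pyarrow_in_range, needs_typing_extensions) are hypothetical. pyarrow_in_range simplifies the pyarrow >= 8.0.0, < 23.0.0 specifier to a major-version comparison. For exact PEP 440 semantics, packaging.specifiers.SpecifierSet is the more precise tool, and packaging is itself a core dependency.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Core runtime distributions listed in pyproject.toml (typing-extensions is conditional).
CORE_PACKAGES = ("pyarrow", "fsspec", "tqdm", "packaging")


def installed_versions(packages=CORE_PACKAGES) -> dict:
    """Map each distribution name to its installed version string, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def pyarrow_in_range(ver: str) -> bool:
    """Approximate pyarrow >=8.0.0,<23.0.0 by comparing only the leading major number.

    Simplification: pre-release tags such as "8.0.0rc1" are not handled per PEP 440.
    """
    major = int(ver.split(".")[0])
    return 8 <= major < 23


def needs_typing_extensions(version_info=None) -> bool:
    """Mirror the environment marker python_version < '3.11' on typing-extensions."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) < (3, 11)


if __name__ == "__main__":
    for pkg, ver in installed_versions().items():
        print(f"{pkg}: {ver or 'not installed'}")
```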

Credentials

  • No credentials are required for the core environment itself.
  • Telemetry can be opted out of using environment variables (see below).

Environment Variables

Variable | Values | Default | Description
DAFT_RUNNER | "native" or "ray" | "native" | Selects the execution runner. The native runner executes locally with multiple threads; the Ray runner distributes execution across a Ray cluster.
DAFT_PROGRESS_BAR | "0" or "1" | "1" | Set to "0" to disable the progress bar. This is useful for benchmarking, since progress tracking can add overhead.
SCARF_NO_ANALYTICS | "true" or "1" | unset | Disables Scarf telemetry when set to one of the listed values.
DO_NOT_TRACK | "true" or "1" | unset | Standard opt-out signal; disables Daft telemetry when set to one of the listed values.
DAFT_ANALYTICS_ENABLED | "0" or "false" | unset | Disables Daft analytics/telemetry when set to "0" or "false".
DAFT_SCANTASK_MAX_PARALLEL | Integer or "auto" | 8 | Maximum parallelism for scan tasks. Set to "auto" to use all available CPUs (internally mapped to 0).
DAFT_SHUFFLE_ALGORITHM | String | "auto" | Selects the shuffle algorithm used during execution.
DAFT_MAINTAIN_ORDER | Boolean string | "true" | Controls whether execution preserves row ordering.

The three telemetry variables are compared as exact, case-sensitive strings (see the opted_out() excerpt under Code Evidence). A value such as "TRUE" or "yes" therefore does not opt out.
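The defaults and the "auto" mapping described above can be summarized in a small resolver. This is a hypothetical sketch: resolve_settings is not a Daft API. It covers three of the variables and only illustrates the documented semantics. It does not reproduce how Daft reads its configuration internally.

```python
import os

# Documented defaults for three of the variables (assumption: unset means default).
DEFAULTS = {
    "DAFT_RUNNER": "native",
    "DAFT_PROGRESS_BAR": "1",
    "DAFT_SCANTASK_MAX_PARALLEL": "8",
}


def resolve_settings(env=None) -> dict:
    """Hypothetical helper: read the documented variables and normalize their values."""
    env = os.environ if env is None else env
    runner = env.get("DAFT_RUNNER", DEFAULTS["DAFT_RUNNER"])
    # Only "0" disables the progress bar; anything else leaves it on.
    progress_bar = env.get("DAFT_PROGRESS_BAR", DEFAULTS["DAFT_PROGRESS_BAR"]) != "0"
    raw_parallel = env.get("DAFT_SCANTASK_MAX_PARALLEL", DEFAULTS["DAFT_SCANTASK_MAX_PARALLEL"])
    # "auto" maps to 0, which stands for "use all available CPUs".
    max_parallel = 0 if raw_parallel == "auto" else int(raw_parallel)
    return {"runner": runner, "progress_bar": progress_bar, "scantask_max_parallel": max_parallel}


print(resolve_settings({"DAFT_RUNNER": "ray", "DAFT_SCANTASK_MAX_PARALLEL": "auto"}))
# {'runner': 'ray', 'progress_bar': True, 'scantask_max_parallel': 0}
```

In practice, these variables are usually exported in the shell before Python starts. For example: DAFT_RUNNER=ray DAFT_PROGRESS_BAR=0 python my_job.py, where my_job.py is a placeholder script name.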

Quick Install

# Install from PyPI (pre-built wheel with native extension)
pip install daft

# Install from source (requires Rust toolchain)
git clone https://github.com/Eventual-Inc/Daft.git
cd Daft
make .venv
make build

Code Evidence

Core dependencies from pyproject.toml lines 7-13:

dependencies = [
  "pyarrow >= 8.0.0,<23.0.0",
  "fsspec<2025.11.0",
  "tqdm<4.68.0",
  "typing-extensions >= 4.0.0; python_version < '3.11'",
  "packaging"
]

Python version requirement from pyproject.toml line 23:

requires-python = ">=3.10"

Telemetry opt-out from daft/scarf_telemetry.py lines 10-15:

def opted_out() -> bool:
    return (
        os.getenv("SCARF_NO_ANALYTICS") in ("true", "1")
        or os.getenv("DO_NOT_TRACK") in ("true", "1")
        or os.getenv("DAFT_ANALYTICS_ENABLED") in ("0", "false")
    )
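The checks use tuple membership, so matching is exact and case-sensitive. The version below copies the same three conditions but takes an injectable mapping instead of calling os.getenv directly. This makes the behavior easy to demonstrate. The env parameter is an adaptation for illustration, not Daft's actual signature.

```python
import os


def opted_out(env=None) -> bool:
    """Same three checks as daft/scarf_telemetry.py, with an injectable environment mapping."""
    env = os.environ if env is None else env
    return (
        env.get("SCARF_NO_ANALYTICS") in ("true", "1")
        or env.get("DO_NOT_TRACK") in ("true", "1")
        or env.get("DAFT_ANALYTICS_ENABLED") in ("0", "false")
    )


# Exact string matching: "TRUE" is not in ("true", "1"), so it does not opt out.
print(opted_out({"DO_NOT_TRACK": "1"}))                # True
print(opted_out({"DO_NOT_TRACK": "TRUE"}))             # False
print(opted_out({"DAFT_ANALYTICS_ENABLED": "false"}))  # True
```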

Scan task max parallel default from src/common/daft-config/src/lib.rs line 175:

scantask_max_parallel: 8,

Build system configuration from pyproject.toml lines 1-3:

[build-system]
build-backend = "maturin"
requires = ["maturin>=1.5.0,<2.0.0"]

Common Errors

Error message | Cause | Solution
ImportError: No module named 'daft.daft' | The native Rust extension module was not compiled or is missing from the installation. | Rebuild with make build, or reinstall with pip install daft to get the pre-built wheel.
ImportError: pyarrow >= 8.0.0 is required | PyArrow is not installed or is older than the minimum version. | Run pip install "pyarrow>=8.0.0,<23.0.0".
RuntimeError: Python >= 3.10 is required | The Python interpreter is older than the minimum requirement. | Upgrade to Python 3.10 or later.
maturin: command not found | The maturin build tool is not installed (source builds only). | Run pip install "maturin>=1.5.0,<2.0.0" or use make .venv.
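For the first error, a quick diagnostic is to ask the import system whether daft.daft can be located at all. The function below is a hypothetical helper built on importlib.util.find_spec. It returns False both when Daft is absent and when importing the daft package itself fails, for example because the compiled extension is missing.

```python
import importlib.util


def native_extension_available() -> bool:
    """Return True if the compiled daft.daft module can be located."""
    try:
        # find_spec on a dotted name imports the parent package "daft" first,
        # so a missing Daft install or a failing daft/__init__.py raises ImportError here.
        return importlib.util.find_spec("daft.daft") is not None
    except ImportError:
        return False


if not native_extension_available():
    print("daft.daft not found: rebuild with `make build` or reinstall with `pip install daft`")
```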

Compatibility Notes

  • Python 3.10+ is the minimum supported version; Python 3.9 and earlier are not supported.
  • typing-extensions is only required for Python versions below 3.11 and provides backported typing features.
  • PyArrow has a wide version range (8.0.0 to 22.x) to maximize compatibility with existing environments. However, certain optional features (such as Hudi support) may restrict the upper bound further (e.g., pyarrow < 22.1.0 for Hudi).
  • Windows is supported but some optional extras (e.g., Ray) have different version constraints on Windows compared to Linux/macOS.
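  • The Rust native extension is compiled with the "python" Cargo feature, which enables the PyO3 bindings. PyO3 itself does not provide a feature by that name. Building the module without linking against libpython is handled by PyO3's separate extension-module mechanism.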
