Environment:Eventual Inc Daft Python PyArrow Core
| Knowledge Sources | |
|---|---|
| Domains | Runtime Environment, Data Processing, Python Extension |
| Last Updated | 2026-02-08 15:30 GMT |
Overview
The Python_PyArrow_Core environment defines the base runtime requirements for the Daft distributed dataframe library, including the Python interpreter version, core dependencies, and the Rust-backed native extension module compiled via PyO3 and maturin.
Description
Daft is a distributed dataframe library designed for multimodal data processing. At its core, it relies on PyArrow for columnar data representation and interoperability, fsspec for filesystem abstraction across local and remote storage, and tqdm for progress bar rendering. The compute-intensive operations are implemented in Rust and compiled into a native Python extension module using PyO3 (Rust-to-Python bindings) and maturin (the build backend). This hybrid architecture delivers high performance while maintaining a Pythonic API surface.
The core environment also exposes several environment variables that control execution behavior, including runner selection (native multithreaded vs. Ray distributed), progress bar display, telemetry opt-out, and scan task parallelism tuning.
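As a minimal sketch of how these variables might be set from Python, the example below writes a few of them into the process environment. It assumes that Daft reads them at import time or on first execution, so it sets them before any import; that ordering is an assumption rather than something this page confirms. The chosen values are illustrative.

```python
import os

# Illustrative settings only; the variable names and accepted values are
# taken from the Environment Variables table on this page.
settings = {
    "DAFT_RUNNER": "native",    # local multithreaded runner
    "DAFT_PROGRESS_BAR": "0",   # disable the progress bar
    "DO_NOT_TRACK": "1",        # opt out of telemetry
}

# Assumption: configuring the environment before `import daft` is the
# conservative order, since it works regardless of when Daft reads it.
os.environ.update(settings)

# import daft  # import only after the environment is configured
```

The same effect can be had by exporting the variables in the shell that launches the Python process.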
Usage
This environment is required by all Daft installations. Every optional feature (Ray distributed execution, cloud storage, AI integrations, etc.) builds on top of this base environment. Use this environment whenever you are:
- Installing Daft for local or distributed data processing
- Building Daft from source (requires Rust toolchain and maturin)
- Configuring Daft execution behavior via environment variables
- Developing or testing Daft extensions
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | >= 3.10 | Specified in pyproject.toml line 23: requires-python = ">=3.10" |
| Rust Toolchain | Stable (for building from source) | Required only when compiling the native extension module |
| Build Backend | maturin >= 1.5.0, < 2.0.0 | Specified in pyproject.toml line 3: requires = ["maturin>=1.5.0,<2.0.0"] |
| Operating System | Linux, macOS, Windows | Windows has some limitations (e.g., Ray version constraints differ) |
Dependencies
System Packages
- Rust compiler (stable channel) -- only required for building from source
- C/C++ compiler -- required by some transitive native dependencies
- maturin >= 1.5.0, < 2.0.0 -- Python build backend for Rust extensions
Python Packages
- pyarrow >= 8.0.0, < 23.0.0 -- Apache Arrow columnar data format for Python; core data representation layer
- fsspec < 2025.11.0 -- Filesystem specification for Python; provides unified filesystem abstraction
- tqdm < 4.68.0 -- Progress bar library; used for displaying execution progress
- packaging -- Version parsing and specifier utilities
- typing-extensions >= 4.0.0 -- Required only for Python < 3.11; backports newer typing features
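The version bounds above can be checked against an existing environment. The following is a rough sketch using only the standard library: the helper names (version_tuple, in_range, check_core) and the message strings are hypothetical and are not part of Daft, and the version parsing is a deliberate simplification of PEP 440 that only looks at the leading numeric components. For full specifier handling, packaging.specifiers.SpecifierSet would be the more robust choice.

```python
import sys
from importlib import metadata

# Bounds copied from the dependency list: (lower inclusive, upper exclusive).
# None means that side is unbounded.
CORE_BOUNDS = {
    "pyarrow": ((8, 0, 0), (23, 0, 0)),
    "fsspec": (None, (2025, 11, 0)),
    "tqdm": (None, (4, 68, 0)),
}


def version_tuple(version: str) -> tuple:
    """Parse leading numeric components, padded to three (simplified PEP 440)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad short versions such as "8.0" so tuple comparison behaves as expected.
    return tuple(parts) + (0,) * (3 - len(parts))


def in_range(version: str, lower, upper) -> bool:
    """Return True if lower <= version < upper, skipping bounds that are None."""
    parsed = version_tuple(version)
    if lower is not None and parsed < lower:
        return False
    if upper is not None and parsed >= upper:
        return False
    return True


def check_core() -> list:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    for name, (lower, upper) in CORE_BOUNDS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if not in_range(installed, lower, upper):
            problems.append(f"{name} {installed} is outside the supported range")
    return problems
```

Calling check_core() in the target interpreter returns a list of problems, which is empty when the interpreter and the three bounded packages satisfy the constraints listed above.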
Credentials
- No credentials are required for the core environment itself.
- Telemetry can be opted out of using environment variables (see below).
Environment Variables
| Variable | Values | Default | Description |
|---|---|---|---|
| DAFT_RUNNER | "native" or "ray" | "native" | Selects the execution runner. The native runner uses local multithreaded execution; the ray runner distributes execution across a Ray cluster. |
| DAFT_PROGRESS_BAR | "0" or "1" | "1" | Set to "0" to disable the progress bar. Useful for benchmarking, as progress tracking can add overhead. |
| SCARF_NO_ANALYTICS | "true" or "1" | unset | Disables Scarf telemetry when set. |
| DO_NOT_TRACK | "true" or "1" | unset | Standard opt-out signal; disables Daft telemetry when set. |
| DAFT_ANALYTICS_ENABLED | "0" or "false" | unset | Disables Daft analytics/telemetry when set to "0" or "false". |
| DAFT_SCANTASK_MAX_PARALLEL | Integer or "auto" | 8 | Maximum parallelism for scan tasks. Set to "auto" to use all available CPUs (internally maps to 0). |
| DAFT_SHUFFLE_ALGORITHM | String | "auto" | Selects the shuffle algorithm used during execution. |
| DAFT_MAINTAIN_ORDER | Boolean string | true | Controls whether execution maintains row ordering. |
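To make the DAFT_SCANTASK_MAX_PARALLEL semantics concrete, here is a small sketch of how the value could be interpreted: unset falls back to the default of 8, "auto" maps to 0, and anything else is parsed as an integer. The helper resolve_scantask_max_parallel is hypothetical; Daft's actual parsing lives in its Rust configuration code and may treat whitespace, case, or invalid input differently.

```python
import os

# Default quoted on this page from src/common/daft-config/src/lib.rs.
DEFAULT_SCANTASK_MAX_PARALLEL = 8


def resolve_scantask_max_parallel(env=None) -> int:
    """Hypothetical helper mirroring the documented semantics of the variable."""
    env = os.environ if env is None else env
    raw = env.get("DAFT_SCANTASK_MAX_PARALLEL")
    if raw is None:
        return DEFAULT_SCANTASK_MAX_PARALLEL
    if raw == "auto":
        return 0  # 0 stands for "use all available CPUs"
    # Assumed behavior: any other value must be an integer; int() raises
    # ValueError otherwise.
    return int(raw)
```

Passing a plain dict instead of relying on os.environ keeps the helper easy to exercise, for example resolve_scantask_max_parallel({"DAFT_SCANTASK_MAX_PARALLEL": "auto"}).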
Quick Install
# Install from PyPI (pre-built wheel with native extension)
pip install daft
# Install from source (requires Rust toolchain)
git clone https://github.com/Eventual-Inc/Daft.git
cd Daft
make .venv
make build
Code Evidence
Core dependencies from pyproject.toml lines 7-13:
dependencies = [
    "pyarrow >= 8.0.0,<23.0.0",
    "fsspec<2025.11.0",
    "tqdm<4.68.0",
    "typing-extensions >= 4.0.0; python_version < '3.11'",
    "packaging"
]
Python version requirement from pyproject.toml line 23:
requires-python = ">=3.10"
Telemetry opt-out from daft/scarf_telemetry.py lines 10-15:
def opted_out() -> bool:
    return (
        os.getenv("SCARF_NO_ANALYTICS") in ("true", "1")
        or os.getenv("DO_NOT_TRACK") in ("true", "1")
        or os.getenv("DAFT_ANALYTICS_ENABLED") in ("0", "false")
    )
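The opt-out check compares against exact strings, so only the listed values trigger it. The sketch below re-implements the same three comparisons as a standalone function; the optional env parameter is an addition for illustration (it is not in the Daft source) so the logic can be exercised without touching the real process environment.

```python
import os


def opted_out(env=None) -> bool:
    """Same three checks as the quoted snippet, with an injectable mapping."""
    env = os.environ if env is None else env
    return (
        env.get("SCARF_NO_ANALYTICS") in ("true", "1")
        or env.get("DO_NOT_TRACK") in ("true", "1")
        or env.get("DAFT_ANALYTICS_ENABLED") in ("0", "false")
    )


print(opted_out({"DO_NOT_TRACK": "1"}))                # True
print(opted_out({"DAFT_ANALYTICS_ENABLED": "false"}))  # True
print(opted_out({"DO_NOT_TRACK": "TRUE"}))             # False: match is case-sensitive
print(opted_out({}))                                   # False: nothing set
```

Because the membership tests are exact, values such as "TRUE", "yes", or "True" do not opt out under this logic.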
Scan task max parallel default from src/common/daft-config/src/lib.rs line 175:
scantask_max_parallel: 8,
Build system configuration from pyproject.toml lines 1-3:
[build-system]
build-backend = "maturin"
requires = ["maturin>=1.5.0,<2.0.0"]
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| ImportError: No module named 'daft.daft' | The native Rust extension module was not compiled or is missing from the installation. | Rebuild with make build or reinstall via pip install daft to get the pre-built wheel. |
| ImportError: pyarrow >= 8.0.0 is required | PyArrow is not installed or is below the minimum version. | Run pip install "pyarrow>=8.0.0,<23.0.0". |
| RuntimeError: Python >= 3.10 is required | The Python interpreter version is below the minimum requirement. | Upgrade to Python 3.10 or later. |
| maturin: command not found | Maturin build tool is not installed (source builds only). | Run pip install "maturin>=1.5.0,<2.0.0" or use make .venv. |
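When triaging the import errors above, it can help to distinguish "Daft is not installed at all" from "the package is present but the native module fails to load". The following sketch does that with the standard importlib machinery. The helper name diagnose_daft_install and its return strings are hypothetical, and the module path daft.daft is taken from the error message in the table.

```python
import importlib
import importlib.util


def diagnose_daft_install() -> str:
    """Hypothetical helper: report which layer of a Daft install is missing."""
    # find_spec on a top-level name locates the package without importing it.
    if importlib.util.find_spec("daft") is None:
        return "daft is not installed"
    try:
        # Importing the submodule also imports the parent package first.
        importlib.import_module("daft.daft")
    except ImportError as exc:
        return f"native extension failed to import: {exc}"
    return "ok"


print(diagnose_daft_install())
```

If the result points at the native extension, the table's remedies apply: rebuild with make build for a source checkout, or reinstall the pre-built wheel with pip install daft.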
Compatibility Notes
- Python 3.10 is the minimum supported version; Python 3.9 and earlier are not supported.
- typing-extensions is only required for Python versions below 3.11 and provides backported typing features.
- PyArrow has a wide version range (8.0.0 to 22.x) to maximize compatibility with existing environments. However, certain optional features (such as Hudi support) may restrict the upper bound further (e.g., pyarrow < 22.1.0 for Hudi).
- Windows is supported but some optional extras (e.g., Ray) have different version constraints on Windows compared to Linux/macOS.
- The Rust native extension is compiled with PyO3's "python" feature, meaning it builds as a standalone extension module without linking against libpython.so.
Related Pages
- Implementation:Eventual_Inc_Daft_Set_Runner_Ray
- Implementation:Eventual_Inc_Daft_Read_Parquet
- Implementation:Eventual_Inc_Daft_Read_Huggingface
- Implementation:Eventual_Inc_Daft_DataFrame_Write_Deltalake
- Implementation:Eventual_Inc_Daft_AI_Prompt
- Implementation:Eventual_Inc_Daft_AI_Embed_Text
- Implementation:Eventual_Inc_Daft_AI_Embed_Image
- Environment:Eventual_Inc_Daft_Ray_Distributed_Runner
- Environment:Eventual_Inc_Daft_Cloud_Storage_Credentials
- Environment:Eventual_Inc_Daft_AI_Provider_Dependencies