Heuristic: Eventual Inc Daft Runner Selection Guide
| Knowledge Sources | |
|---|---|
| Domains | Runner Configuration, Distributed Computing, Architecture |
| Last Updated | 2026-02-08 15:30 GMT |
Overview
Daft provides two execution backends -- the native runner (default, single-node, multi-threaded) and the Ray runner (distributed, multi-node) -- and choosing between them is one of the most consequential architectural decisions for a Daft pipeline.
Description
Native Runner
The native runner is the default execution backend. It runs entirely within a single process using multi-threaded parallelism and is the best choice for workloads that fit on a single machine. It has the lowest overhead, the fastest startup time, and requires no external dependencies beyond Daft itself.
Key characteristics:
- Default runner; no extra installation or configuration required.
- Multi-threaded execution on a single node.
- num_partitions() returns None because partitioning is managed internally by the native executor and is not exposed to the user (daft/dataframe/dataframe.py:354-356).
- into_partitions() is not supported on the native runner.
- GPU resources are shared across the entire process (all visible GPUs are accessible).
Ray Runner
The Ray runner enables distributed execution across multiple nodes in a Ray cluster. It is installed via pip install daft[ray] and activated by calling daft.set_runner_ray() or by setting the DAFT_RUNNER environment variable to ray.
Key characteristics:
- Requires Ray to be installed (pip install daft[ray]).
- Supports additional configuration: address (Ray cluster address), max_task_backlog (limits queued tasks), force_client_mode (forces client mode even when running inside a Ray worker).
- num_partitions() returns the actual partition count.
- into_partitions() is supported for explicit repartitioning.
- Each Ray actor gets dedicated GPU resources, providing isolation between tasks.
- On Windows, requires Ray >= 2.10.0 due to pyarrow pin issues in earlier versions.
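A minimal sketch of activating the Ray runner with the configuration options listed above. The parameter names (address, max_task_backlog, force_client_mode) follow this guide's description of daft.set_runner_ray(); the cluster address is a hypothetical placeholder. The call is shown commented out so the sketch carries no Ray or Daft dependency.

```python
# import daft
#
# daft.set_runner_ray(
#     address="ray://head-node:10001",  # hypothetical Ray cluster address
#     max_task_backlog=64,              # limit the number of queued tasks
#     force_client_mode=False,          # set True to force client mode
#                                       # even inside a Ray worker
# )
```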
Auto-Detection
Daft automatically detects whether it is running inside a Ray cluster by checking ray.is_initialized() or the RAY_JOB_ID environment variable (daft/utils.py:138-157). If either condition is true, Daft selects the Ray runner without explicit configuration.
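The detection logic above can be sketched as a small predicate. This is a stdlib-only model of the behavior described in daft/utils.py:138-157, not Daft's actual code: the ray_is_initialized flag stands in for ray.is_initialized() so the sketch has no Ray dependency.

```python
import os

def should_use_ray_runner(ray_is_initialized: bool = False) -> bool:
    """Sketch of Daft's runner auto-detection: prefer the Ray runner when
    Ray is already initialized in this process, or when RAY_JOB_ID is set
    (i.e. we are executing inside a Ray job)."""
    return ray_is_initialized or "RAY_JOB_ID" in os.environ
```

Because Ray job workers always have RAY_JOB_ID set, a pipeline submitted as a Ray job picks up the Ray runner with no code changes.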
Runner Selection Methods
- daft.set_runner_ray() -- Programmatically select the Ray runner.
- daft.set_runner_native() -- Programmatically select the native runner.
- DAFT_RUNNER=ray or DAFT_RUNNER=native -- Environment variable selection.
- The test suite requires the DAFT_RUNNER environment variable to be set (tests/conftest.py:37-38); tests will fail without it.
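The selection methods above can be sketched as follows. Setting the environment variable must happen before any Daft execution starts, since the runner is chosen at initialization; the programmatic calls are real Daft APIs but are commented out here so the sketch runs without Daft installed.

```python
import os

# Environment-variable selection; this is also what the test suite
# requires (tests/conftest.py:37-38).
os.environ["DAFT_RUNNER"] = "native"  # or "ray"

# Equivalent programmatic selection:
# import daft
# daft.set_runner_native()  # single-node, multi-threaded (the default)
# daft.set_runner_ray()     # distributed; needs `pip install daft[ray]`
```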
Usage
Apply this heuristic when:
- Starting a new Daft project and deciding on the execution backend.
- A single-machine pipeline is hitting resource limits and you are evaluating horizontal scaling.
- Deploying Daft pipelines to production infrastructure (Kubernetes, cloud VMs, managed Ray clusters).
- Running the Daft test suite locally and needing to set the correct runner.
- Working with GPU workloads and needing to understand resource allocation differences.
The Insight (Rule of Thumb)
- Action: Use the native runner (the default) unless your workload requires more resources than a single machine can provide, at which point switch to the Ray runner for horizontal scaling.
- Value: The native runner has lower overhead, faster startup, and simpler operations; the Ray runner adds the ability to scale out across multiple nodes.
- Trade-off: The native runner cannot scale beyond one machine; the Ray runner can, but introduces network serialization overhead, cluster management complexity, and additional dependency management.
Reasoning
The native runner avoids all the overhead associated with distributed execution: there is no task serialization, no network transfer of intermediate data, and no external coordinator to manage. For workloads that fit in memory on a single machine, the native runner will almost always be faster.
The Ray runner becomes necessary when:
- The dataset is too large to process on a single machine.
- You need fault tolerance across long-running distributed jobs.
- You want to leverage a pre-existing Ray cluster shared across multiple applications.
- You need explicit control over partitioning via into_partitions().
The auto-detection behavior (checking ray.is_initialized() and RAY_JOB_ID) means that Daft pipelines deployed as Ray jobs will automatically use the Ray runner without code changes, which simplifies the transition from local development (native) to production (Ray).
A practical consideration: the num_partitions() method returns None on the native runner. Code that depends on knowing the exact partition count must use the Ray runner. Similarly, into_partitions() is Ray-only, so explicit repartitioning logic must account for the runner in use.
For GPU workloads, the runner choice affects resource isolation. On Ray, each actor gets dedicated GPU resources, preventing contention. On the native runner, all tasks share the same GPU(s), which can lead to memory pressure if multiple operations attempt to use the GPU concurrently.