Environment:Pola rs Polars GPU Execution Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Acceleration |
| Last Updated | 2026-02-09 10:00 GMT |
Overview
NVIDIA GPU environment with CUDA 12 and RAPIDS cuDF for GPU-accelerated query execution in the Polars Lazy API.
Description
This environment provides GPU-accelerated execution for Polars lazy queries using NVIDIA RAPIDS cuDF as the physical execution backend. When invoked via `collect(engine="gpu")`, the optimized query plan is dispatched to the GPU. The GPU engine supports many core expressions and data types but not the full Polars expression API. Unsupported queries transparently fall back to the CPU engine unless `raise_on_fail=True` is set. Results are always returned as standard CPU-backed Polars DataFrames.
Usage
Use this environment when your workflow is dominated by grouped aggregations and joins on datasets that fit in GPU memory. I/O-bound queries typically perform about the same on GPU and CPU, so they gain little from this environment. As a rough guide, raw datasets of 50-100 GiB work well on a GPU with 80 GiB of memory, though the exact limit depends on the workflow.
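The sizing guideline above can be turned into a quick sanity check. The sketch below is illustrative only: the function name `gpu_fit_estimate` is hypothetical, and it assumes the 50-100 GiB on 80 GiB guideline scales linearly with GPU memory, which the source does not state.

```python
def gpu_fit_estimate(dataset_gib: float, gpu_mem_gib: float) -> str:
    """Classify a raw dataset size against available GPU memory.

    Assumption: the "50-100 GiB on an 80 GiB GPU" guideline scales
    linearly, giving thresholds at 0.625x and 1.25x of GPU memory.
    """
    if dataset_gib <= 0 or gpu_mem_gib <= 0:
        raise ValueError("sizes must be positive")
    ratio = dataset_gib / gpu_mem_gib
    if ratio <= 50 / 80:
        return "comfortable"
    if ratio <= 100 / 80:
        return "workload-dependent"
    return "unlikely to fit"


# Example: a 60 GiB raw dataset on an 80 GiB GPU.
print(gpu_fit_estimate(60, 80))
```

Treat the result as a first filter, not a guarantee: joins and aggregations can need intermediate memory well beyond the raw input size.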
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| GPU | NVIDIA Volta or higher | Compute capability >= 7.0 |
| GPU VRAM | 8 GB minimum | 80 GB recommended for large datasets (A100/H100) |
| CUDA | CUDA 12 | CUDA 11 support ends with RAPIDS v25.06 |
| OS | Linux or WSL2 | Native Windows not supported |
| Python | >= 3.10 | Same as core Polars requirement |
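The requirements in this table can be checked programmatically before installing anything. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the compute capability is passed in as a string because how you obtain it depends on your setup (on recent drivers, `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` is one option).

```python
import platform
import sys


def parse_compute_capability(text: str) -> tuple[int, int]:
    """Parse a string such as '8.0' into a (major, minor) tuple."""
    major, _, minor = text.strip().partition(".")
    return int(major), int(minor or 0)


def requirement_problems(
    python_version: tuple[int, int],
    system: str,
    compute_capability: str,
) -> list[str]:
    """Return a list of unmet requirements; an empty list means all passed."""
    problems = []
    if python_version < (3, 10):
        problems.append("Python >= 3.10 required")
    if system != "Linux":  # WSL2 also reports "Linux"
        problems.append("Linux or WSL2 required")
    if parse_compute_capability(compute_capability) < (7, 0):
        problems.append("compute capability >= 7.0 (Volta) required")
    return problems


# Check the current interpreter and OS. The "8.0" value is a placeholder
# for a compute capability looked up separately.
print(requirement_problems(sys.version_info[:2], platform.system(), "8.0"))
```

This does not verify the CUDA version or driver; it only covers the Python, OS, and GPU architecture rows.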
Dependencies
System Packages
- NVIDIA GPU driver (compatible with CUDA 12)
- CUDA Toolkit 12.x
Python Packages
- `polars` >= 1.38.1
- `cudf-polars-cu12` (installed via `pip install polars[gpu]`)
CUDA 11 (Deprecated)
- `cudf-polars-cu11` == 25.06 (pinned: RAPIDS 25.06 is the last release with CUDA 11 support, which is dropped in RAPIDS 25.08)
Credentials
No additional credentials required beyond the base Environment:Pola_rs_Polars_Python_Runtime_Environment.
Quick Install
# Standard GPU install (CUDA 12)
pip install "polars[gpu]"
# For CUDA 11 systems (deprecated, pinned to v25.06)
pip install polars cudf-polars-cu11==25.06
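After installing, it can help to confirm that both packages are visible to the interpreter before running a query. This is a small sketch using the standard library; `missing_modules` is a hypothetical helper, and the module name `cudf_polars` is taken from the import error listed under Common Errors.

```python
import importlib.util


def missing_modules(
    names=("polars", "cudf_polars"),
    find_spec=importlib.util.find_spec,
):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("polars and cudf_polars are importable")
```

The check only locates the modules; it neither imports them nor confirms that a usable GPU and driver are present.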
Code Evidence
GPU optional dependency from `py-polars/pyproject.toml:87`:
gpu = ["cudf-polars-cu12"]
GPU usage from `docs/source/user-guide/gpu-support.md:42-50`:
result = q.collect(engine="gpu")
# With detailed control
result = q.collect(engine=pl.GPUEngine(device=1))
# Disable fallback to raise on unsupported
q.collect(engine=pl.GPUEngine(raise_on_fail=True))
System requirements from `docs/source/user-guide/gpu-support.md:9-13`:
- NVIDIA Volta or higher GPU with compute capability 7.0+
- CUDA 12 (CUDA 11 support ends with RAPIDS v25.06)
- Linux or Windows Subsystem for Linux 2 (WSL2)
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `PerformanceWarning: Query execution with GPU not supported` | Unsupported operation in query | No action required: the query falls back to the CPU engine. Check the verbose output to see which operation is unsupported |
| `ComputeError: 'cuda' conversion failed: NotImplementedError` | Operation not implemented on GPU with `raise_on_fail=True` | Remove `raise_on_fail=True` for CPU fallback, or restructure query |
| `CUDA out of memory` | Dataset too large for GPU VRAM | Reduce dataset size or use CPU engine |
| `ImportError: cudf_polars not found` | GPU package not installed | `pip install polars[gpu]` |
Compatibility Notes
- Supported: LazyFrame API, SQL API, I/O from CSV/Parquet/NDJSON, numeric/logical/string/datetime types, aggregations, joins, filters, concatenation
- Not Supported: Eager DataFrame API, Streaming API, Date/Categorical/Enum/Time/Array/Binary/Object types, time series resampling, folds, UDFs, Excel/Database I/O
- Fallback: By default, unsupported queries transparently fall back to CPU; set `raise_on_fail=True` on `GPUEngine` to raise an error instead, which makes unsupported queries visible
- Testing: GPU engine passes 99.2% of Polars unit tests with fallback enabled, 88.8% without fallback
- Multiprocessing: Avoid combining Python multiprocessing with the GPU engine, because the worker processes compete for the same GPU and system resources