Environment:Neuml Txtai GPU Accelerator Detection
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Deep_Learning |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
GPU acceleration environment supporting NVIDIA CUDA, Apple MPS, and Intel XPU with automatic detection and CPU fallback.
Description
txtai implements a multi-tier hardware acceleration detection system through the `Models` utility class. The system probes for available accelerator devices in priority order: CUDA (NVIDIA GPUs) → MPS (Apple Silicon) → XPU (Intel GPUs) → CPU fallback. Device selection is controlled via the `gpu` parameter in configuration. Quantization features (4-bit/8-bit via bitsandbytes) require CUDA specifically and are automatically disabled on non-CUDA platforms.
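As a rough illustration, the priority order is a first-match search over an ordered list of probes. This is a minimal torch-free sketch, not txtai's code. `pick_accelerator` and the lambda probes are hypothetical stand-ins for `torch.cuda.is_available()`, `Models.hasmpsdevice()` and the XPU check. The sketch returns bare backend names rather than indexed devices such as `cuda:0`.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical probe type: (backend name, zero-argument availability check).
Probe = Tuple[str, Callable[[], bool]]


def pick_accelerator(probes: Sequence[Probe]) -> str:
    """Return the first available backend in priority order, else "cpu".

    Sketch only: each probe stands in for a real check such as
    torch.cuda.is_available().
    """
    for name, available in probes:
        if available():
            return name
    return "cpu"


# Priority order described above: CUDA -> MPS -> XPU -> CPU fallback.
probes = [
    ("cuda", lambda: False),  # e.g. no NVIDIA GPU present
    ("mps", lambda: True),    # e.g. Apple Silicon
    ("xpu", lambda: True),    # never reached, MPS wins first
]
print(pick_accelerator(probes))  # mps
```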
Usage
This environment applies whenever GPU acceleration is desired for model inference, vector encoding, ANN indexing, or model training. Set `gpu=True` in pipeline or embeddings configuration to enable it. CUDA is required for quantized model training (bitsandbytes 4-bit/8-bit quantization, typically combined with PEFT/LoRA adapters). Without CUDA, the trainer drops the quantization configuration.
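This sketch shows how a `gpu` setting might be normalized into a device id before the fallback chain runs. The function name and most of its rules are assumptions made for illustration, and txtai's own normalization may differ. The assumed rules are that `True` means device 0 and that a missing accelerator forces `-1`, meaning CPU. Only two behaviors come from the source: negative ids resolve to CPU, and device strings pass through unchanged.

```python
from typing import Optional, Union

# Hypothetical type for the gpu configuration value.
GpuSetting = Optional[Union[bool, int, str]]


def normalize_gpu(gpu: GpuSetting, has_accelerator: bool) -> Union[int, str]:
    """Map a gpu setting to a device id (illustrative assumptions).

    - a string is treated as an explicit device name and passed through
    - None, False or no detected accelerator -> -1 (CPU)
    - True -> device 0
    - an int selects that device index
    """
    if isinstance(gpu, str):
        return gpu
    if gpu is None or gpu is False or not has_accelerator:
        return -1
    if gpu is True:  # checked before int() because bool is a subclass of int
        return 0
    return int(gpu)


print(normalize_gpu(True, has_accelerator=True))   # 0
print(normalize_gpu(True, has_accelerator=False))  # -1
```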
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Hardware (NVIDIA) | NVIDIA GPU with CUDA support | CUDA drivers must be installed |
| Hardware (Apple) | Apple Silicon (M1/M2/M3+) | MPS backend via PyTorch |
| Hardware (Intel) | Intel GPU with XPU support | PyTorch XPU backend |
| Software | PyTorch >= 2.4 with matching CUDA toolkit | `torch.cuda.is_available()` must return True |
| Software (ONNX GPU) | `onnxruntime-gpu` package | For GPU-accelerated ONNX inference |
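PyTorch wheels report versions with a local suffix such as `+cu121` or `+cpu`. Plain string comparison also misorders releases like 2.10 and 2.4. The sketch below checks the `>= 2.4` requirement using only the standard library. `version_tuple` and `meets_minimum` are hypothetical helpers, and they deliberately ignore pre-release and dev markers.

```python
import re
from typing import Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse "2.4.0+cu121" into (2, 4, 0).

    Drops the local suffix after "+" and keeps only the leading digits of
    each dot-separated component, so pre-release/dev markers are ignored.
    """
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(version: str, minimum: Tuple[int, ...] = (2, 4)) -> bool:
    # Tuple comparison is numeric per component: (2, 10, 1) > (2, 4).
    return version_tuple(version) >= minimum


for version in ["2.4.0+cu121", "2.10.1", "2.3.1+cpu"]:
    print(version, meets_minimum(version))
# A plain string comparison gets 2.10 wrong: "2.10.1" >= "2.4" is False.
```

In a real environment, the input would be `torch.__version__`. If the `packaging` library is available, `packaging.version.Version` performs the same comparison with full PEP 440 semantics.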
Dependencies
System Packages
- NVIDIA CUDA Toolkit (for CUDA GPUs)
- NVIDIA cuDNN (for deep learning on CUDA)
Python Packages
- `torch` >= 2.4 (with CUDA/MPS/XPU support compiled in)
- `bitsandbytes` >= 0.42.0 (optional, for quantized ANN and training)
- `onnxruntime-gpu` (optional, replaces `onnxruntime` for GPU ONNX inference)
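`importlib.util.find_spec` can report which of these packages are importable without importing them, which helps because importing `torch` can be slow. The module list and helper name below are illustrative assumptions.

```python
import importlib.util
from typing import Dict, Iterable

# Import names for the packages listed above (illustrative selection).
# The onnxruntime-gpu distribution installs the same "onnxruntime" module
# as the CPU package, so this check cannot tell the two builds apart.
OPTIONAL_MODULES = {
    "torch": "tensor library and accelerator backends",
    "bitsandbytes": "quantized ANN and training",
    "onnxruntime": "ONNX inference (CPU or GPU build)",
}


def installed_modules(names: Iterable[str] = tuple(OPTIONAL_MODULES)) -> Dict[str, bool]:
    """Report which top-level modules are importable, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, present in installed_modules().items():
    print(f"{name:14} {'installed' if present else 'missing'}: {OPTIONAL_MODULES[name]}")
```

Presence alone does not prove GPU support for ONNX. Once onnxruntime is imported, `onnxruntime.get_available_providers()` should include `"CUDAExecutionProvider"` for the GPU build.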
Environment Variables
- `PYTORCH_MPS_DISABLE`: Set to `1` to stop txtai from selecting the MPS device on Apple Silicon (checked by `Models.hasmpsdevice()`)
- `LLAMA_NO_METAL`: Set to `1` to disable llama.cpp Metal acceleration on macOS
- `OMP_NUM_THREADS`: txtai defaults this to `1` on macOS/Windows, when it is unset, to prevent Faiss segfaults
- `KMP_DUPLICATE_LIB_OK`: txtai defaults this to `TRUE` on macOS/Windows, when it is unset, to work around OMP Error #15
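The two patterns behind these variables can be checked without PyTorch or Faiss. The helpers below are hypothetical restatements of the checks quoted under Code Evidence. Only the exact string `1` disables MPS, so `PYTORCH_MPS_DISABLE=true` has no effect. The Faiss defaults are applied only when a variable is unset, so a value the user already exported wins.

```python
import os
from typing import MutableMapping


def mps_allowed(env: MutableMapping[str, str] = os.environ) -> bool:
    # Same test as Models.hasmpsdevice(), minus the torch probe:
    # only the exact string "1" disables MPS.
    return env.get("PYTORCH_MPS_DISABLE") != "1"


def apply_faiss_defaults(system: str, env: MutableMapping[str, str] = os.environ) -> None:
    # Same effect as the ann/dense/faiss.py snippet: defaults are written only
    # when the variable is unset. `system` is a platform.system() value;
    # the original detects Windows with os.name == "nt" instead.
    if system in ("Darwin", "Windows"):
        env.setdefault("OMP_NUM_THREADS", "1")
        env.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")


print(mps_allowed({"PYTORCH_MPS_DISABLE": "true"}))  # True: "true" is not "1"
env = {"OMP_NUM_THREADS": "4"}
apply_faiss_defaults("Darwin", env)
print(env)  # {'OMP_NUM_THREADS': '4', 'KMP_DUPLICATE_LIB_OK': 'TRUE'}
```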
Quick Install
```shell
# Standard install (includes GPU support if CUDA available)
pip install txtai

# Install with quantization support (requires CUDA)
pip install "txtai[ann]"  # includes bitsandbytes

# CPU-only (explicit, smaller footprint): install CPU-only PyTorch wheels first
pip install torch --index-url https://download.pytorch.org/whl/cpu
pip install txtai
```
Code Evidence
Accelerator detection from `models/models.py:149-158`:
```python
@staticmethod
def hasaccelerator():
    return torch.cuda.is_available() or Models.hasmpsdevice() or bool(Models.finddevice())
```
Device fallback chain from `models/models.py:128-136`:
```python
return (
    deviceid
    if isinstance(deviceid, str)
    else (
        "cpu"
        if deviceid < 0
        else f"cuda:{deviceid}" if torch.cuda.is_available() else "mps" if Models.hasmpsdevice() else Models.finddevice()
    )
)
```
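To make the chain concrete, here is the same expression with the three probes passed in as booleans, run on a few inputs. The signature is hypothetical; only the branching mirrors the snippet above.

```python
from typing import Optional, Union


def reference(deviceid: Union[int, str], cuda: bool, mps: bool, xpu: bool) -> Optional[str]:
    """Torch-free restatement of the expression above (hypothetical signature)."""
    if isinstance(deviceid, str):
        return deviceid  # explicit device strings pass through unchanged
    if deviceid < 0:
        return "cpu"  # negative ids always mean CPU
    if cuda:
        return f"cuda:{deviceid}"  # only CUDA keeps the numeric index
    if mps:
        return "mps"
    return "xpu" if xpu else None  # finddevice() yields None when no XPU exists


cases = [
    (0, True, True, True),
    (1, False, True, False),
    (0, False, False, True),
    (-1, True, False, False),
    ("cuda:1", False, False, False),
    (0, False, False, False),
]
for args in cases:
    print(args, "->", reference(*args))
```

Two details stand out. Only CUDA keeps the numeric index, while MPS and XPU are returned as bare names. A non-negative id with no accelerator also yields `None`. The latter suggests the CPU fallback is expected to happen earlier, by turning the id negative when `hasaccelerator()` is false. That is an inference from these snippets, not something they show directly.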
MPS detection with environment override from `models/models.py:160-169`:
```python
@staticmethod
def hasmpsdevice():
    return os.environ.get("PYTORCH_MPS_DISABLE") != "1" and torch.backends.mps.is_available()
```
XPU fallback detection from `models/models.py:171-180`:
```python
@staticmethod
def finddevice():
    return next((device for device in ["xpu"] if hasattr(torch, device) and getattr(torch, device).is_available()), None)
```
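`finddevice()` combines `next(..., None)` with a `hasattr` guard. As a result, a PyTorch build without a `torch.xpu` attribute is skipped instead of raising `AttributeError`. The sketch below runs the same expression against stand-in modules built with `types.SimpleNamespace`, which is an assumption that lets it run without PyTorch. `find_device` is a hypothetical name.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_device(module, candidates: Iterable[str] = ("xpu",)) -> Optional[str]:
    # Same shape as Models.finddevice(): hasattr() skips backends the module
    # does not expose at all, and next(..., None) returns None if none match.
    return next(
        (name for name in candidates if hasattr(module, name) and getattr(module, name).is_available()),
        None,
    )


# Stand-ins for the torch module (assumptions, so the sketch runs without PyTorch).
no_xpu_torch = SimpleNamespace()  # no xpu attribute at all
xpu_torch = SimpleNamespace(xpu=SimpleNamespace(is_available=lambda: True))
idle_torch = SimpleNamespace(xpu=SimpleNamespace(is_available=lambda: False))

print(find_device(no_xpu_torch), find_device(xpu_torch), find_device(idle_torch))  # None xpu None
```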
Quantization requires CUDA from `pipeline/train/hftrainer.py:253-254`:
```python
# Clear quantization configuration if GPU is not available
quantization = quantization if torch.cuda.is_available() else None
```
macOS Faiss workaround from `ann/dense/faiss.py:10-15`:
```python
if platform.system() == "Darwin" or os.name == "nt":
    os.environ["OMP_NUM_THREADS"] = os.environ.get("OMP_NUM_THREADS", "1")
    os.environ["KMP_DUPLICATE_LIB_OK"] = os.environ.get("KMP_DUPLICATE_LIB_OK", "TRUE")
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
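| Segmentation fault on macOS or Windows | Faiss OpenMP threading conflict | Keep `OMP_NUM_THREADS=1` and `KMP_DUPLICATE_LIB_OK=TRUE`. txtai sets these defaults only when the variables are unset, so check for conflicting exported values |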
| Quantization silently disabled | No CUDA GPU available | Install CUDA drivers or use a CUDA-capable machine |
| `CUDA out of memory` | Insufficient GPU VRAM | Reduce batch size or enable quantization |
| MPS errors on Apple Silicon | Unsupported MPS operations | Set `PYTORCH_MPS_DISABLE=1` to fall back to CPU |
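The "silently disabled" behavior comes from the one-line rule in `hftrainer.py`. A caller that wants a visible signal could wrap the same rule and emit a warning. `gate_quantization` is a hypothetical helper, not part of txtai. In real code, `cuda_available` would come from `torch.cuda.is_available()`.

```python
import warnings
from typing import Any, Optional


def gate_quantization(quantization: Optional[Any], cuda_available: bool) -> Optional[Any]:
    """Apply the hftrainer rule, but warn instead of dropping the config silently.

    Hypothetical helper: returns None when CUDA is unavailable, otherwise
    returns the configuration unchanged.
    """
    if quantization is not None and not cuda_available:
        warnings.warn("CUDA unavailable: ignoring quantization configuration", RuntimeWarning)
        return None
    return quantization
```

For example, `gate_quantization({"bits": 4}, cuda_available=False)` returns `None` and emits a `RuntimeWarning`. The same call with `cuda_available=True` returns the configuration unchanged.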
Compatibility Notes
- NVIDIA CUDA: Primary GPU target. Required for quantization and bitsandbytes features.
- Apple MPS: Supported for inference. Can be disabled via `PYTORCH_MPS_DISABLE=1`. Some operations may not be supported on MPS.
- Intel XPU: Experimental support via PyTorch XPU backend.
- CPU: Always available as fallback. All features work on CPU except the CUDA-only quantization features, such as quantized training.
- Multi-GPU: Set `gpu="all"` in sentence-transformers backend to spawn one process per GPU.
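With `gpu="all"`, the sentence-transformers backend manages its own worker processes. The sketch below only illustrates the underlying idea: give each GPU worker a contiguous slice of the input so that results can be concatenated back in order. `split_contiguous` is hypothetical and is not necessarily how sentence-transformers partitions work.

```python
from typing import Dict, List, Sequence


def split_contiguous(texts: Sequence[str], num_gpus: int) -> Dict[str, List[str]]:
    """Give each GPU worker one contiguous slice of the input (hypothetical).

    Contiguous slices let results be concatenated back in device order,
    which restores the original input order.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    size = -(-len(texts) // num_gpus)  # ceiling division
    return {f"cuda:{i}": list(texts[i * size:(i + 1) * size]) for i in range(num_gpus)}


shards = split_contiguous(["a", "b", "c", "d", "e"], 2)
print(shards)  # {'cuda:0': ['a', 'b', 'c'], 'cuda:1': ['d', 'e']}
```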