Environment:Neuml Txtai GPU Accelerator Detection

Knowledge Sources	txtai, txtai FAQ
Domains	Infrastructure, Deep_Learning
Last Updated	2026-02-10 00:00 GMT

Overview

GPU acceleration environment supporting NVIDIA CUDA, Apple MPS, and Intel XPU with automatic detection and CPU fallback.

Description

txtai implements a multi-tier hardware acceleration detection system through the `Models` utility class. The system probes for available accelerator devices in priority order: CUDA (NVIDIA GPUs) → MPS (Apple Silicon) → XPU (Intel GPUs) → CPU fallback. Device selection is controlled via the `gpu` parameter in configuration. Quantization features (4-bit/8-bit via bitsandbytes) require CUDA specifically and are automatically disabled on non-CUDA platforms.
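The probing order described above can be sketched as a standalone check. This is a minimal sketch, not txtai's own code: `detect_accelerator` is a hypothetical helper, it only reports the backend name, and it treats a missing PyTorch install as CPU.

```python
import os


def detect_accelerator():
    """Return "cuda", "mps", "xpu" or "cpu", probing in that order."""
    try:
        import torch
    except ImportError:
        # Assumption: without PyTorch there is nothing to probe, so report CPU
        return "cpu"

    # 1. NVIDIA CUDA
    if torch.cuda.is_available():
        return "cuda"

    # 2. Apple MPS, honoring the PYTORCH_MPS_DISABLE override
    mps = getattr(torch.backends, "mps", None)
    if os.environ.get("PYTORCH_MPS_DISABLE") != "1" and mps is not None and mps.is_available():
        return "mps"

    # 3. Intel XPU (torch.xpu only exists on builds with XPU support)
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"

    # 4. CPU fallback
    return "cpu"


print(detect_accelerator())
```

Running this before enabling `gpu=True` is a quick way to see which backend PyTorch would expose on the current machine.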

Usage

This environment applies whenever GPU acceleration is desired for model inference, vector encoding, ANN indexing, or model training. Set `gpu=True` in the pipeline or embeddings configuration to enable it. Quantized model training with bitsandbytes and PEFT/LoRA requires CUDA.
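As an illustration, an embeddings configuration that requests an accelerator might look like the fragment below. The model path is only an example and is not taken from this page; the commented-out lines assume txtai is installed.

```python
# Hypothetical embeddings configuration; the model path is only an example
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "gpu": True,  # request an accelerator; set to False to stay on CPU
}

# With txtai installed, the configuration would be passed along these lines:
#   from txtai import Embeddings
#   embeddings = Embeddings(config)
```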

System Requirements

Category	Requirement	Notes
Hardware (NVIDIA)	NVIDIA GPU with CUDA support	CUDA drivers must be installed
Hardware (Apple)	Apple Silicon (M1/M2/M3+)	MPS backend via PyTorch
Hardware (Intel)	Intel GPU with XPU support	PyTorch XPU backend
Software	PyTorch >= 2.4 built for the target backend (CUDA, MPS, or XPU)	For CUDA, `torch.cuda.is_available()` must return True
Software (ONNX GPU)	`onnxruntime-gpu` package	For GPU-accelerated ONNX inference

Dependencies

System Packages

NVIDIA CUDA Toolkit (for CUDA GPUs)
NVIDIA cuDNN (for deep learning on CUDA)

Python Packages

`torch` >= 2.4 (with CUDA/MPS/XPU support compiled in)
`bitsandbytes` >= 0.42.0 (optional, for quantized ANN and training)
`onnxruntime-gpu` (optional, replaces `onnxruntime` for GPU ONNX inference)

Credentials

No credentials are required. The following environment variables adjust accelerator behavior:

`PYTORCH_MPS_DISABLE`: Set to `1` to disable the MPS device on Apple Silicon
`LLAMA_NO_METAL`: Set to `1` to disable llama.cpp Metal acceleration on macOS
`OMP_NUM_THREADS`: Defaulted to `1` by txtai on macOS/Windows when unset, to prevent Faiss segfaults
`KMP_DUPLICATE_LIB_OK`: Defaulted to `TRUE` by txtai on macOS/Windows when unset, to work around OMP Error #15
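These variables can also be set from Python. The sketch below uses a hypothetical helper, `apply_overrides`, that only fills in values the user has not already exported. It assumes the overrides are applied before txtai is imported, so they are in place when the libraries initialize.

```python
import os


def apply_overrides(overrides, environ=os.environ):
    """Set each variable unless a value is already present; return the effective values."""
    applied = {}
    for name, value in overrides.items():
        # setdefault keeps any value the user exported beforehand
        applied[name] = environ.setdefault(name, value)
    return applied


# Example: force CPU on Apple Silicon and disable llama.cpp Metal
settings = apply_overrides({
    "PYTORCH_MPS_DISABLE": "1",
    "LLAMA_NO_METAL": "1",
})
```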

Quick Install

# Standard install (includes GPU support if CUDA available)
pip install txtai

# Install with quantization support (requires CUDA)
pip install "txtai[ann]"  # includes bitsandbytes; quotes keep shells such as zsh from expanding the brackets

# CPU-only (explicit, smaller footprint)
pip install txtai torch==2.4.0+cpu -f https://download.pytorch.org/whl/torch

Code Evidence

Accelerator detection from `models/models.py:149-158`:

@staticmethod
def hasaccelerator():
    return torch.cuda.is_available() or Models.hasmpsdevice() or bool(Models.finddevice())

Device fallback chain from `models/models.py:128-136`:

return (
    deviceid
    if isinstance(deviceid, str)
    else (
        "cpu"
        if deviceid < 0
        else f"cuda:{deviceid}" if torch.cuda.is_available() else "mps" if Models.hasmpsdevice() else Models.finddevice()
    )
)
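The nested conditional above is easier to follow as a flat function. The sketch below mirrors the excerpt with the availability checks replaced by plain arguments so it runs without PyTorch. `resolve_device` and its parameters are hypothetical names, and `deviceid` is assumed to already be an `int` or `str`.

```python
def resolve_device(deviceid, cuda=False, mps=False, other=None):
    """Map a device id to a device string, following the excerpt's fallback chain."""
    # String ids such as "cuda:1" or "cpu" are passed through untouched
    if isinstance(deviceid, str):
        return deviceid
    # Negative ids select the CPU
    if deviceid < 0:
        return "cpu"
    # CUDA is the only branch that uses the numeric id
    if cuda:
        return f"cuda:{deviceid}"
    if mps:
        return "mps"
    # Stands in for Models.finddevice(): "xpu" when available, otherwise None
    return other


print(resolve_device(0, cuda=True))   # cuda:0
print(resolve_device(1, mps=True))    # mps
print(resolve_device(-1, cuda=True))  # cpu
```

In this chain the numeric id only matters on CUDA; MPS and XPU resolve to a single device name.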

MPS detection with environment override from `models/models.py:160-169`:

@staticmethod
def hasmpsdevice():
    return os.environ.get("PYTORCH_MPS_DISABLE") != "1" and torch.backends.mps.is_available()

XPU fallback detection from `models/models.py:171-180`:

@staticmethod
def finddevice():
    return next((device for device in ["xpu"] if hasattr(torch, device) and getattr(torch, device).is_available()), None)

Quantization requires CUDA from `pipeline/train/hftrainer.py:253-254`:

# Clear quantization configuration if GPU is not available
quantization = quantization if torch.cuda.is_available() else None
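Because this check drops the setting without any message, callers may want to surface it themselves. The sketch below is a hypothetical wrapper, not part of txtai, that applies the same rule and emits a warning when a requested quantization setting is discarded. The `load_in_4bit` dictionary is only an example value.

```python
import warnings


def effective_quantization(quantization, cuda_available):
    """Return the quantization setting that would actually be used."""
    if cuda_available:
        return quantization
    if quantization:
        # Same outcome as the excerpt, but no longer silent
        warnings.warn("Quantization requested but CUDA is unavailable; continuing without it")
    return None


# On a CUDA machine the setting is kept as-is
print(effective_quantization({"load_in_4bit": True}, cuda_available=True))
```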

macOS Faiss workaround from `ann/dense/faiss.py:10-15`:

if platform.system() == "Darwin" or os.name == "nt":
    os.environ["OMP_NUM_THREADS"] = os.environ.get("OMP_NUM_THREADS", "1")
    os.environ["KMP_DUPLICATE_LIB_OK"] = os.environ.get("KMP_DUPLICATE_LIB_OK", "TRUE")

Common Errors

Error Message	Cause	Solution
Segmentation fault on macOS or Windows	Faiss OpenMP threading conflict	Ensure `OMP_NUM_THREADS=1` and `KMP_DUPLICATE_LIB_OK=TRUE` are set (txtai applies these defaults only when the variables are unset)
Quantization silently disabled	No CUDA GPU available	Install CUDA drivers or use a CUDA-capable machine
`CUDA out of memory`	Insufficient GPU VRAM	Reduce batch size or enable quantization
MPS errors on Apple Silicon	Unsupported MPS operations	Set `PYTORCH_MPS_DISABLE=1` to fall back to CPU

Compatibility Notes

NVIDIA CUDA: Primary GPU target. Required for quantization and bitsandbytes features.
Apple MPS: Supported for inference. Can be disabled via `PYTORCH_MPS_DISABLE=1`. Some operations may not be supported on MPS.
Intel XPU: Experimental support via PyTorch XPU backend.
CPU: Always available as fallback. All features work on CPU except bitsandbytes quantization (4-bit/8-bit), which requires CUDA.
Multi-GPU: Set `gpu="all"` in sentence-transformers backend to spawn one process per GPU.
