Environment:Neuml Txtai GPU Accelerator Detection

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Deep_Learning
Last Updated: 2026-02-10 00:00 GMT

Overview

GPU acceleration environment supporting NVIDIA CUDA, Apple MPS, and Intel XPU with automatic detection and CPU fallback.

Description

txtai implements a multi-tier hardware acceleration detection system through the `Models` utility class. The system probes for available accelerator devices in priority order: CUDA (NVIDIA GPUs) → MPS (Apple Silicon) → XPU (Intel GPUs) → CPU fallback. Device selection is controlled via the `gpu` parameter in configuration. Quantization features (4-bit/8-bit via bitsandbytes) require CUDA specifically and are automatically disabled on non-CUDA platforms.
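
The priority order above can be sketched as a small pure function. This is an illustration only, not txtai's code: `resolve_device` and its boolean flags are hypothetical names, and the availability checks are passed in as arguments so the logic can be read and tested without PyTorch installed. The final `cpu` return is a simplification of the CPU fallback described above.

```python
def resolve_device(deviceid, cuda=False, mps=False, xpu=False):
    """Map a requested device id to a device string, probing CUDA, then MPS, then XPU."""
    # An explicit device string such as "cuda:1" or "cpu" is passed through unchanged
    if isinstance(deviceid, str):
        return deviceid

    # A negative id means acceleration was not requested
    if deviceid < 0:
        return "cpu"

    # Probe accelerators in priority order
    if cuda:
        return f"cuda:{deviceid}"
    if mps:
        return "mps"
    if xpu:
        return "xpu"

    # Nothing available: fall back to CPU (simplified, see lead-in)
    return "cpu"
```

With PyTorch installed, the flags would come from calls such as `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.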

Usage

This environment applies whenever GPU acceleration is desired for model inference, vector encoding, ANN indexing, or model training. Set `gpu=True` in pipeline or embeddings configuration to enable. CUDA is required for quantized model training with bitsandbytes and PEFT/LoRA.
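
As a rough sketch of the `gpu` setting, the snippet below builds an embeddings configuration and wraps the txtai call in a helper. The model path is only a placeholder assumption, and `build_embeddings` is a hypothetical helper name. txtai is imported lazily so the configuration itself can be inspected without the library installed.

```python
# Embeddings configuration requesting GPU acceleration.
# The model path is a placeholder assumption; substitute your own model.
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "gpu": True,  # set to False to force CPU
}


def build_embeddings(settings):
    """Create a txtai Embeddings instance from a configuration dictionary."""
    # Imported here so that reading or testing the config does not require txtai
    from txtai import Embeddings

    return Embeddings(settings)
```

Calling `build_embeddings(config)` on a machine without an accelerator should still work, since the device selection falls back to CPU as described above.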

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Hardware (NVIDIA) | NVIDIA GPU with CUDA support | CUDA drivers must be installed |
| Hardware (Apple) | Apple Silicon (M1/M2/M3+) | MPS backend via PyTorch |
| Hardware (Intel) | Intel GPU with XPU support | PyTorch XPU backend |
| Software | PyTorch >= 2.4 (with a matching CUDA toolkit on NVIDIA systems) | For the CUDA path, `torch.cuda.is_available()` must return True |
| Software (ONNX GPU) | `onnxruntime-gpu` package | For GPU-accelerated ONNX inference |

Dependencies

System Packages

  • NVIDIA CUDA Toolkit (for CUDA GPUs)
  • NVIDIA cuDNN (for deep learning on CUDA)

Python Packages

  • `torch` >= 2.4 (with CUDA/MPS/XPU support compiled in)
  • `bitsandbytes` >= 0.42.0 (optional, for quantized ANN and training)
  • `onnxruntime-gpu` (optional, replaces `onnxruntime` for GPU ONNX inference)

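Credentials

No secrets are listed for this environment. The following environment variables adjust accelerator behavior: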

  • `PYTORCH_MPS_DISABLE`: Set to `1` to disable MPS device on Apple Silicon
  • `LLAMA_NO_METAL`: Set to `1` to disable llama.cpp Metal acceleration on macOS
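  • `OMP_NUM_THREADS`: Set to `1` on macOS/Windows to prevent Faiss segfaults. txtai applies this default automatically on those platforms when the variable is unset.
  • `KMP_DUPLICATE_LIB_OK`: Set to `TRUE` on macOS/Windows to work around OMP Error #15. txtai applies this default automatically on those platforms when the variable is unset.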
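
These variables are best set before the affected libraries are imported, since values such as `OMP_NUM_THREADS` are typically read when a library loads (this timing is an assumption about library behavior, not something stated above). A minimal sketch, with `accelerator_env` as a hypothetical helper name:

```python
import os


def accelerator_env(disable_mps=False, disable_metal=False, faiss_workaround=False):
    """Build the environment overrides described in the list above."""
    env = {}
    if disable_mps:
        env["PYTORCH_MPS_DISABLE"] = "1"
    if disable_metal:
        env["LLAMA_NO_METAL"] = "1"
    if faiss_workaround:
        env["OMP_NUM_THREADS"] = "1"
        env["KMP_DUPLICATE_LIB_OK"] = "TRUE"
    return env


# Apply before importing torch, faiss or txtai; values already set by the user win
for name, value in accelerator_env(disable_mps=True).items():
    os.environ.setdefault(name, value)
```

Using `setdefault` keeps any value the user exported in the shell, which matches the way the Faiss workaround only fills in missing values.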

Quick Install

# Standard install (includes GPU support if CUDA available)
pip install txtai

# Install with quantization support (requires CUDA)
pip install "txtai[ann]"  # includes bitsandbytes; quotes keep shells such as zsh from expanding the brackets

# CPU-only (explicit, smaller footprint)
pip install txtai torch==2.4.0+cpu -f https://download.pytorch.org/whl/torch
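
After installing, it can help to confirm which optional packages are actually importable. The check below is a sketch that uses only the standard library. `installed_packages` is a hypothetical helper, and the module names assume that `onnxruntime-gpu` installs under the `onnxruntime` import name.

```python
import importlib.util


def installed_packages(names=("torch", "txtai", "bitsandbytes", "onnxruntime")):
    """Report which modules can be found without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, found in installed_packages().items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

This only shows that a package is present. Whether an accelerator is usable still depends on the runtime checks shown under Code Evidence.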

Code Evidence

Accelerator detection from `models/models.py:149-158`:

@staticmethod
def hasaccelerator():
    return torch.cuda.is_available() or Models.hasmpsdevice() or bool(Models.finddevice())

Device fallback chain from `models/models.py:128-136`:

return (
    deviceid
    if isinstance(deviceid, str)
    else (
        "cpu"
        if deviceid < 0
        else f"cuda:{deviceid}" if torch.cuda.is_available() else "mps" if Models.hasmpsdevice() else Models.finddevice()
    )
)

MPS detection with environment override from `models/models.py:160-169`:

@staticmethod
def hasmpsdevice():
    return os.environ.get("PYTORCH_MPS_DISABLE") != "1" and torch.backends.mps.is_available()

XPU fallback detection from `models/models.py:171-180`:

@staticmethod
def finddevice():
    return next((device for device in ["xpu"] if hasattr(torch, device) and getattr(torch, device).is_available()), None)
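
The `next(generator, None)` idiom in `finddevice` returns the first available backend, or `None` when no candidate qualifies. The sketch below reproduces that pattern against a stand-in object instead of `torch`, so it runs without PyTorch. `find_device`, `Backend` and `fake_torch` are hypothetical names used only for illustration.

```python
from types import SimpleNamespace


class Backend:
    """Stand-in for a torch backend module exposing is_available()."""

    def __init__(self, available):
        self.available = available

    def is_available(self):
        return self.available


def find_device(module, candidates=("xpu",)):
    """Return the first candidate that exists on module and reports itself available."""
    return next(
        (name for name in candidates if hasattr(module, name) and getattr(module, name).is_available()),
        None,
    )


# A fake "torch" whose xpu backend exists but is not available
fake_torch = SimpleNamespace(xpu=Backend(False))
print(find_device(fake_torch))  # None
```

Keeping the candidates in a sequence means further backends could be added by extending it, without touching the selection logic.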

Quantization requires CUDA from `pipeline/train/hftrainer.py:253-254`:

# Clear quantization configuration if GPU is not available
quantization = quantization if torch.cuda.is_available() else None
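
Because the configuration is cleared without any message, a caller may want to surface the downgrade. The following is a sketch of that idea, not txtai's behavior: `effective_quantization` is a hypothetical helper, and the CUDA check is passed in as a flag so the example runs without PyTorch.

```python
import logging

logger = logging.getLogger(__name__)


def effective_quantization(quantization, cuda_available):
    """Return the quantization config to use, dropping it when CUDA is unavailable."""
    if not cuda_available:
        if quantization:
            # Same clearing step as the excerpt above, but logged so it is not silent
            logger.warning("CUDA not available, quantization disabled")
        return None
    return quantization
```

With PyTorch installed, the flag would be `torch.cuda.is_available()`.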

macOS Faiss workaround from `ann/dense/faiss.py:10-15`:

if platform.system() == "Darwin" or os.name == "nt":
    os.environ["OMP_NUM_THREADS"] = os.environ.get("OMP_NUM_THREADS", "1")
    os.environ["KMP_DUPLICATE_LIB_OK"] = os.environ.get("KMP_DUPLICATE_LIB_OK", "TRUE")

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| Segmentation fault on macOS | Faiss OpenMP threading conflict | Set `OMP_NUM_THREADS=1` and `KMP_DUPLICATE_LIB_OK=TRUE` |
| Quantization silently disabled | No CUDA GPU available | Install CUDA drivers or use a CUDA-capable machine |
| `CUDA out of memory` | Insufficient GPU VRAM | Reduce batch size or enable quantization |
| MPS errors on Apple Silicon | Unsupported MPS operations | Set `PYTORCH_MPS_DISABLE=1` to fall back to CPU |
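
For the out-of-memory case, one common approach is to retry with a smaller batch. The sketch below is generic and hypothetical: `run_with_backoff` is not part of txtai, and the error type is a parameter (with PyTorch it would typically be `torch.cuda.OutOfMemoryError`).

```python
def run_with_backoff(task, batch_size, error=MemoryError, minimum=1):
    """Call task(batch_size), halving the batch size each time it raises error."""
    while True:
        try:
            return task(batch_size)
        except error:
            if batch_size <= minimum:
                # Cannot shrink further, let the caller handle the failure
                raise
            batch_size = max(minimum, batch_size // 2)


def fake_task(batch_size):
    """Stand-in workload that only fits when the batch has at most 8 items."""
    if batch_size > 8:
        raise MemoryError("batch too large")
    return batch_size


print(run_with_backoff(fake_task, 64))  # 8
```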

Compatibility Notes

  • NVIDIA CUDA: Primary GPU target. Required for quantization and bitsandbytes features.
  • Apple MPS: Supported for inference. Can be disabled via `PYTORCH_MPS_DISABLE=1`. Some operations may not be supported on MPS.
  • Intel XPU: Experimental support via PyTorch XPU backend.
  • CPU: Always available as fallback. All features work on CPU except those that require CUDA, such as bitsandbytes quantization and quantized training.
  • Multi-GPU: Set `gpu="all"` in sentence-transformers backend to spawn one process per GPU.
