Environment: NeuML txtai GPU Accelerator Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, GPU_Acceleration, Deep_Learning |
| Last Updated | 2026-02-09 17:00 GMT |
Overview
GPU-accelerated environment supporting NVIDIA CUDA, Apple Metal (MPS), and Intel XPU devices for hardware-accelerated embedding generation, model inference, and ANN index operations.
Description
This environment extends the core Python environment with GPU acceleration. txtai implements a multi-tier device detection strategy: it first checks for NVIDIA CUDA, then Apple Metal Performance Shaders (MPS), then Intel XPU. When no accelerator is detected, all operations fall back to CPU transparently. GPU acceleration significantly improves performance for embedding generation, model inference (LLM pipelines), and certain ANN backends (torch, GGML). The `bitsandbytes` library enables int8 and 4-bit quantization on CUDA devices to reduce VRAM requirements.
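The detection cascade described above can be sketched as a standalone function. The probe callables below are hypothetical stand-ins for `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, and `torch.xpu.is_available()`; this is a dependency-free illustration of the priority order, not txtai's actual implementation:

```python
def pick_device(probes):
    """Return the first available accelerator in priority order, else 'cpu'.

    `probes` maps device names to zero-argument availability checks,
    standing in for the torch availability calls used by the real code.
    """
    for name in ("cuda", "mps", "xpu"):
        check = probes.get(name)
        if check and check():
            return name
    return "cpu"

# Example: CUDA unavailable, MPS available -> cascade stops at "mps"
device = pick_device({"cuda": lambda: False, "mps": lambda: True, "xpu": lambda: False})
```

Because the checks run in a fixed order, a machine with both CUDA and MPS probes reporting available would always resolve to CUDA first.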
Usage
Use this environment when running model inference or training at scale. GPU acceleration is recommended for embedding generation on large document sets, LLM text generation, model fine-tuning via HFTrainer, and GPU-backed ANN indexes (torch, GGML backends). CPU fallback is automatic but substantially slower for these operations.
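For illustration, a txtai embeddings configuration that opts into GPU vectorization might look like the fragment below. The `gpu` vector option (boolean or device id) is assumed here; verify option names against your installed txtai version:

```yaml
# Hypothetical embeddings configuration; confirm options against
# the txtai release in use.
path: sentence-transformers/all-MiniLM-L6-v2
gpu: true        # or a device id such as 0, or false to force CPU
content: true
```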
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (recommended), macOS, Windows | Linux for CUDA; macOS for MPS |
| Hardware (CUDA) | NVIDIA GPU with CUDA support | Any modern NVIDIA GPU; VRAM depends on model size |
| Hardware (MPS) | Apple Silicon (M1/M2/M3/M4) | macOS 12.3+ required for MPS support |
| Hardware (XPU) | Intel Arc / Data Center GPU | Requires PyTorch >= 2.4 with XPU support |
| CUDA Toolkit | Compatible with PyTorch version | Typically CUDA 11.8 or 12.x |
Dependencies
System Packages
- NVIDIA CUDA Toolkit (for CUDA GPUs)
- NVIDIA cuDNN (for CUDA GPUs)
Python Packages
- `torch` >= 2.4 (with CUDA/MPS/XPU support compiled)
- `bitsandbytes` >= 0.42.0 (optional, for int8/4-bit quantization on CUDA)
- `ggml-py` >= 0.9.4 (optional, for GGML GPU-accelerated ANN backend)
Environment Variables
No credentials are required. The following environment variables control GPU behavior:
- `PYTORCH_MPS_DISABLE`: Set to `1` to disable Apple MPS device even when available
- `LLAMA_NO_METAL`: Set to `1` to disable llama.cpp Metal acceleration on macOS
- `OMP_NUM_THREADS`: Set to `1` on macOS/Windows to prevent Faiss segmentation faults
- `KMP_DUPLICATE_LIB_OK`: Set to `TRUE` on macOS/Windows to work around OMP Error #15
- `CUDA_VISIBLE_DEVICES`: Standard NVIDIA env var to restrict which GPUs are visible
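These variables must be set before the relevant libraries are imported. A minimal pattern, using `setdefault`-style assignment for the Faiss workarounds so user-provided values are respected (mirroring what txtai's own Faiss module does):

```python
import os

# Faiss workarounds: only fill in defaults, never overwrite user settings.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("KMP_DUPLICATE_LIB_OK", "TRUE")

# Opt out of Apple MPS even when the hardware supports it.
os.environ["PYTORCH_MPS_DISABLE"] = "1"

# Restrict visibility to the first NVIDIA GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Setting these in a shell profile or process supervisor works equally well; the key constraint is that they are in place before `torch`, `faiss`, or llama.cpp bindings initialize.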
Quick Install
```shell
# GPU-enabled PyTorch (CUDA 12.x)
pip install torch torchvision

# With quantization support
pip install txtai[ann]  # includes bitsandbytes

# CPU-only (explicitly, for smaller installs)
pip install torch --index-url https://download.pytorch.org/whl/cpu
```
Code Evidence
Multi-tier accelerator detection from `src/python/txtai/models/models.py:150-180`:
```python
@staticmethod
def hasaccelerator():
    return torch.cuda.is_available() or Models.hasmpsdevice() or bool(Models.finddevice())

@staticmethod
def hasmpsdevice():
    return os.environ.get("PYTORCH_MPS_DISABLE") != "1" and torch.backends.mps.is_available()

@staticmethod
def finddevice():
    return next((device for device in ["xpu"] if hasattr(torch, device) and getattr(torch, device).is_available()), None)
```
Device reference resolution from `src/python/txtai/models/models.py:128-136`:
```python
return (
    deviceid
    if isinstance(deviceid, str)
    else (
        "cpu"
        if deviceid < 0
        else f"cuda:{deviceid}" if torch.cuda.is_available() else "mps" if Models.hasmpsdevice() else Models.finddevice()
    )
)
```
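The same resolution logic, restated as a standalone function with availability flags passed in explicitly (a hypothetical helper, not txtai's API), makes the precedence easier to trace:

```python
def resolve_device(deviceid, cuda=False, mps=False, xpu=False):
    """Mirror the resolution above: strings pass through unchanged,
    negative ids request CPU, otherwise pick the first available
    accelerator in CUDA > MPS > XPU order (None when nothing matches)."""
    if isinstance(deviceid, str):
        return deviceid              # e.g. "cuda:1" or "mps" passes through
    if deviceid < 0:
        return "cpu"                 # negative id explicitly requests CPU
    if cuda:
        return f"cuda:{deviceid}"    # numeric id binds to that CUDA device
    if mps:
        return "mps"
    return "xpu" if xpu else None
```

Note that a numeric device id only selects a specific GPU on CUDA; on MPS and XPU the id collapses to the generic device string, matching the expression above.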
macOS/Windows OpenMP workaround from `src/python/txtai/ann/dense/faiss.py:10-15`:
```python
if platform.system() == "Darwin" or os.name == "nt":
    # Workaround for a Faiss issue causing segmentation faults
    os.environ["OMP_NUM_THREADS"] = os.environ.get("OMP_NUM_THREADS", "1")

    # Workaround for a Faiss issue with OMP: Error #15
    os.environ["KMP_DUPLICATE_LIB_OK"] = os.environ.get("KMP_DUPLICATE_LIB_OK", "TRUE")
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `CUDA out of memory` | Insufficient GPU VRAM for model | Reduce batch size, enable quantization, or use smaller model |
| Faiss segmentation fault on macOS | OpenMP threading conflict | Set `OMP_NUM_THREADS=1` before import |
| `OMP: Error #15: Initializing libiomp5` | Duplicate OpenMP libraries | Set `KMP_DUPLICATE_LIB_OK=TRUE` |
| `MPS backend out of memory` | Insufficient Apple Silicon memory | Set `PYTORCH_MPS_DISABLE=1` to fall back to CPU |
| `bitsandbytes is not available` | Missing quantization library | `pip install txtai[ann]` or `pip install bitsandbytes` |
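The "reduce batch size" mitigation for `CUDA out of memory` can be automated with a retry loop. A minimal sketch with a hypothetical `encode` callable; `MemoryError` stands in for `torch.cuda.OutOfMemoryError` to keep the example dependency-free:

```python
def encode_with_backoff(encode, batch, min_batch_size=1):
    """Retry a batched operation with progressively smaller sub-batches
    when it raises MemoryError, halving the size on each failure."""
    size = len(batch)
    while size >= min_batch_size:
        try:
            # Process the batch in chunks of the current size.
            return [r for i in range(0, len(batch), size)
                    for r in encode(batch[i:i + size])]
        except MemoryError:
            size //= 2  # halve the chunk size and retry from the start
    raise MemoryError("batch size reduced below minimum without success")

# Usage: an encoder that (for demonstration) fails on batches larger than 2
def fake_encode(chunk):
    if len(chunk) > 2:
        raise MemoryError()
    return [x * 2 for x in chunk]

results = encode_with_backoff(fake_encode, [1, 2, 3, 4, 5])
```

On a real CUDA setup, catching the framework's OOM exception and also clearing cached allocations between retries is the usual refinement; the loop structure stays the same.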
Compatibility Notes
- NVIDIA CUDA: Primary accelerator; full support for all GPU operations including quantization
- Apple MPS: Supported for model inference and embedding generation; can be disabled via `PYTORCH_MPS_DISABLE=1`
- Intel XPU: Requires PyTorch >= 2.4 with XPU support; detected as fallback after CUDA and MPS
- Quantization: `bitsandbytes` int8/4-bit quantization only works on CUDA devices, not MPS or XPU
- llama.cpp Metal: Enabled by default on macOS; disable with `LLAMA_NO_METAL=1` if unstable
- Multi-GPU: Sentence-transformers multiprocessing pooling is only enabled when `device_count > 1`