Environment:Bitsandbytes foundation Bitsandbytes CUDA GPU Runtime
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Quantization |
| Last Updated | 2026-02-07 13:00 GMT |
Overview
NVIDIA CUDA GPU runtime environment requiring compute capability >= 7.5 for INT8 tensor cores, CUDA 11.8 to 13.x, PyTorch >= 2.3, and Python >= 3.10.
Description
This environment provides the primary GPU-accelerated context for running bitsandbytes quantized operations on NVIDIA hardware. It requires a CUDA-capable GPU with the matching native library (libbitsandbytes_cuda{version}.so) loaded at runtime. The library is built for specific CUDA versions (11.8 through 13.x) and GPU compute capabilities ranging from SM50 (Maxwell) through SM121 (Blackwell). INT8 tensor core operations (IMMA) require compute capability >= 7.5 (Turing or newer). The library loading system auto-detects the CUDA version from PyTorch and selects the appropriate pre-compiled binary.
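The version-to-filename mapping described above can be sketched in a few lines. This is an illustration of the naming scheme only; the helper name `cuda_library_name` is invented here, and the real selection logic lives in `bitsandbytes/cextension.py`.

```python
# Sketch of how a CUDA version string (as reported by PyTorch, e.g.
# torch.version.cuda == "12.4") maps to the native library filename
# libbitsandbytes_cuda{version}.so described above.
# The helper name is hypothetical, not part of the bitsandbytes API.

def cuda_library_name(torch_cuda_version: str) -> str:
    """Map e.g. '12.4' -> 'libbitsandbytes_cuda124.so'."""
    major, minor = torch_cuda_version.split(".")[:2]
    return f"libbitsandbytes_cuda{major}{minor}.so"

print(cuda_library_name("11.8"))  # libbitsandbytes_cuda118.so
print(cuda_library_name("12.4"))  # libbitsandbytes_cuda124.so
```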
Usage
Use this environment for all GPU-accelerated bitsandbytes operations including 4-bit quantized inference (Linear4bit), 8-bit LLM.int8() inference (Linear8bitLt), 8-bit optimizer training, and FSDP QLoRA distributed training. This is the default and recommended environment for production use.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (Ubuntu recommended), Windows, macOS (MPS only) | Linux is the primary supported platform |
| Hardware | NVIDIA GPU with CUDA support | Compute capability >= 7.5 required for INT8 tensor cores (Turing+) |
| GPU VRAM | Minimum 4GB | Higher VRAM needed for larger models; 16GB+ recommended for 7B+ models |
| CUDA Version | 11.8 to 13.x | Pre-compiled binaries shipped for these versions; build from source for others |
| Compute Capability | >= 5.0 (minimum), >= 7.5 (recommended) | SM50-SM121 supported; SM75+ enables INT8 IMMA |
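The capability thresholds in the table above can be checked with a plain tuple comparison, which is how PyTorch reports compute capability. The function below is a sketch for illustration (its name and return strings are invented); on a CUDA machine the capability itself comes from `torch.cuda.get_device_capability`.

```python
# Classify a GPU's compute capability against the thresholds in the
# requirements table. Tuple comparison mirrors bitsandbytes' own
# >= (7, 5) check for INT8 tensor cores.

MIN_SUPPORTED = (5, 0)   # SM50: oldest supported architecture (Maxwell)
MIN_IMMA = (7, 5)        # SM75: INT8 tensor cores (Turing and newer)

def classify_gpu(cc: tuple) -> str:
    if cc < MIN_SUPPORTED:
        return "unsupported"
    if cc < MIN_IMMA:
        return "supported (slow 8-bit matmul only)"
    return "full INT8 tensor-core support"

# On a CUDA machine, obtain the capability from PyTorch:
#   import torch
#   cc = torch.cuda.get_device_capability(0)   # e.g. (8, 6)
print(classify_gpu((6, 1)))  # supported (slow 8-bit matmul only)
print(classify_gpu((8, 0)))  # full INT8 tensor-core support
```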
Dependencies
System Packages
- CUDA Toolkit 11.8 to 13.x (runtime libraries: libcudart.so, libcublas.so, libcublasLt.so)
- CUDA runtime must be in LD_LIBRARY_PATH or CONDA_PREFIX
Python Packages
- `python` >= 3.10
- `torch` >= 2.3, < 3
- `numpy` >= 1.17
- `packaging` >= 20.9
Credentials
The following environment variables are relevant at runtime:
- `BNB_CUDA_VERSION`: Optional override to force loading a specific CUDA version binary (e.g., "118" for CUDA 11.8). Use with caution — version mismatch can cause subtle bugs.
- `LD_LIBRARY_PATH`: Must include the path to CUDA runtime libraries if not using conda.
- `CONDA_PREFIX`: Used as an alternative path for finding CUDA runtime libraries.
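A typical non-conda setup combines these variables as follows. The paths and version below are illustrative examples, not defaults:

```shell
# Point the dynamic loader at the CUDA runtime libraries
# (path is an example; adjust to your CUDA installation):
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

# Optionally force a specific pre-compiled binary (use with caution --
# a version mismatch can cause subtle bugs):
export BNB_CUDA_VERSION=118   # loads libbitsandbytes_cuda118.so
```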
Quick Install
```shell
# Install bitsandbytes with CUDA support (pre-compiled wheels)
pip install bitsandbytes

# Verify installation
python -m bitsandbytes
```
Code Evidence
CUDA availability check from `bitsandbytes/cuda_specs.py:54-57`:
```python
def get_cuda_specs() -> Optional[CUDASpecs]:
    """Get CUDA/HIP specifications."""
    if not torch.cuda.is_available():
        return None
```
Compute capability check for INT8 IMMA support from `bitsandbytes/cuda_specs.py:17-19`:
```python
@property
def has_imma(self) -> bool:
    return torch.version.hip or self.highest_compute_capability >= (7, 5)
```
Diagnostic warning for low compute capability from `bitsandbytes/diagnostics/cuda.py:124-132`:
```python
# 7.5 is the minimum CC for int8 tensor cores
if not cuda_specs.has_imma:
    print_dedented("""
        WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
        If you run into issues with 8-bit matmul, you can try 4-bit quantization:
        https://huggingface.co/blog/4bit-transformers-bitsandbytes
        """)
```
Backend detection and library loading from `bitsandbytes/cextension.py:309-317`:
```python
HIP_ENVIRONMENT = False
BNB_BACKEND = "CPU"

if torch.version.hip:
    HIP_ENVIRONMENT = True
    BNB_BACKEND = "ROCm"
elif torch.cuda.is_available():
    BNB_BACKEND = "CUDA"
elif torch._C._has_xpu:
    BNB_BACKEND = "XPU"
```
BNB_CUDA_VERSION override from `bitsandbytes/cextension.py:34-46`:
```python
override_value = os.environ.get("BNB_CUDA_VERSION")
if override_value:
    library_name = re.sub(r"cuda\d+", f"cuda{override_value}", library_name, count=1)
    if torch.version.hip:
        raise RuntimeError(
            f"BNB_CUDA_VERSION={override_value} detected for ROCm!! \n"
            f"Clear the variable and retry: export BNB_CUDA_VERSION=\n"
        )
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Configured CUDA binary not found at ...` | Pre-compiled binary missing for detected CUDA version | Install matching CUDA version or set `BNB_CUDA_VERSION` to an available version |
| `cannot open shared object file: No such file or directory` | CUDA runtime libraries not in `LD_LIBRARY_PATH` | Add CUDA lib path: `export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/path/to/cuda/lib64` |
| `WARNING: Compute capability < 7.5 detected!` | GPU too old for INT8 tensor cores | Use 4-bit quantization instead of 8-bit; upgrade GPU to Turing (RTX 20xx) or newer |
| `Method 'X' not available in CPU-only version` | GPU library failed to load; fell back to CPU mock | Run `python -m bitsandbytes` for diagnostics; ensure CUDA is properly installed |
| `BNB_CUDA_VERSION=X detected for ROCm!!` | BNB_CUDA_VERSION set on AMD system | Clear the variable: `export BNB_CUDA_VERSION=` |
Compatibility Notes
- CUDA 11.8-12.x: Supports compute capabilities SM50 through SM90 (Maxwell through Hopper).
- CUDA 13.0+: Drops support for SM < 75 (pre-Turing). Adds SM100/SM103/SM110/SM120/SM121 (Blackwell).
- macOS: CUDA is not supported on macOS. Use MPS backend (CPU-only bitsandbytes library) instead.
- Windows: Supported via pre-compiled DLLs (libbitsandbytes_cuda{version}.dll).
- Multi-GPU: Supported with automatic device context switching. Single-GPU systems skip device switching overhead.
- torch.compile: PyTorch 2.4+ required for `register_fake`/`register_kernel`; PyTorch 2.3 uses legacy `impl_abstract`/`impl`.
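The torch.compile note implies a version gate between the two registration APIs. A minimal sketch of such a gate is below; the function name is invented here, and parsing only the numeric `major.minor` prefix is an assumption that sidesteps local-build suffixes like `+cu124`.

```python
def uses_register_fake(torch_version: str) -> bool:
    """True when PyTorch >= 2.4 (register_fake/register_kernel path);
    PyTorch 2.3 falls back to the legacy impl_abstract/impl API.
    Parses only the numeric major.minor prefix of the version string."""
    major, minor = (int(p) for p in torch_version.split("+")[0].split(".")[:2])
    return (major, minor) >= (2, 4)

print(uses_register_fake("2.3.1"))        # False -> legacy impl_abstract/impl
print(uses_register_fake("2.4.0"))        # True  -> register_fake/register_kernel
print(uses_register_fake("2.5.1+cu124"))  # True
```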
Related Pages
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear4bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear4bit_Forward
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Quantize_4bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear8bitLt
- Implementation:Bitsandbytes_foundation_Bitsandbytes_MatMul8bitLt
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Int8_Vectorwise_Quant
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Optimizer8bit_Step
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Adam8bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Quantize_Blockwise
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear4bit_FSDP
- Implementation:Bitsandbytes_foundation_Bitsandbytes_PagedAdamW8bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_GlobalOptimManager
- Implementation:Bitsandbytes_foundation_Bitsandbytes_BitsAndBytesConfig_4bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_BitsAndBytesConfig_8bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Fix_4bit_Weight_Quant_State