Environment:Bitsandbytes foundation Bitsandbytes ROCm AMD Environment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Quantization |
| Last Updated | 2026-02-07 13:00 GMT |
Overview
AMD ROCm GPU environment requiring ROCm 6.1+ with HIP runtime, supporting 4-bit and 8-bit quantization with platform-specific blocksize defaults (128 vs NVIDIA's 64) due to 64-wide wavefronts.
Description
This environment provides GPU-accelerated quantization on AMD GPUs via the ROCm/HIP platform. Bitsandbytes detects ROCm through `torch.version.hip` and loads a ROCm-specific native library (`libbitsandbytes_rocm{version}.so`). A key difference from NVIDIA is the wavefront (warp) size: AMD GPUs typically use 64-wide wavefronts (vs NVIDIA's 32), which changes the default blocksize for 4-bit quantization from 64 to 128. The `rocminfo` utility is used at import time to detect GPU architecture and warp size.
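As an illustration of the library-naming scheme described above, the expected filename can be derived from a HIP version string such as `torch.version.hip`. The helper below is a hypothetical sketch, not the library's actual loader code, and the exact version formatting bitsandbytes uses may differ:

```python
# Hypothetical sketch: build the expected ROCm native-library filename
# from a HIP version string (e.g. "6.2.41133" -> major 6, minor 2).
def rocm_lib_name(hip_version: str) -> str:
    major, minor = hip_version.split(".")[:2]
    return f"libbitsandbytes_rocm{major}{minor}.so"

print(rocm_lib_name("6.2.41133"))  # illustrative name for a ROCm 6.2 build
```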
Usage
Use this environment when running bitsandbytes on AMD Instinct or Radeon RX GPUs with ROCm support. All major workflows (4-bit inference, 8-bit inference, 8-bit optimizer training, FSDP QLoRA) are supported, though some behaviors differ from NVIDIA due to warp size and hipBLASLt availability.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux | ROCm only officially supports Linux |
| Hardware | AMD GPU with ROCm support | Instinct MI200/MI300 or supported Radeon RX GPUs |
| ROCm Version | >= 6.1 (recommended) | Versions < 6.1 produce a warning; hipBLASLt requires ROCm 6.1+ |
| System Utility | `rocminfo` command available | Required for GPU architecture and warp size detection |
Dependencies
System Packages
- ROCm 6.1+ toolkit (libamdhip64.so, libhipblas.so)
- `rocminfo` command-line utility
- `ROCM_PATH` defaults to `/opt/rocm` if not set
Python Packages
- `python` >= 3.10
- `torch` >= 2.3, < 3 (ROCm build of PyTorch)
- `numpy` >= 1.17
- `packaging` >= 20.9
Credentials
- `ROCM_PATH`: Optional. Path to ROCm installation (defaults to `/opt/rocm`).
- `BNB_CUDA_VERSION`: Must NOT be set on ROCm systems. Setting it raises a RuntimeError.
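A minimal sketch of how these variables could be checked before import, mirroring the `/opt/rocm` default and the `BNB_CUDA_VERSION` conflict rule above. The function is illustrative, not the library's actual code:

```python
# Illustrative helper: resolve ROCm-related environment variables the way
# this page describes (default ROCM_PATH, reject BNB_CUDA_VERSION on ROCm).
def resolve_rocm_env(env: dict) -> str:
    if env.get("BNB_CUDA_VERSION"):
        # bitsandbytes raises a RuntimeError in this situation on ROCm systems
        raise RuntimeError("BNB_CUDA_VERSION must not be set on ROCm systems")
    return env.get("ROCM_PATH", "/opt/rocm")

print(resolve_rocm_env({}))  # /opt/rocm
```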
Quick Install
```shell
# Install PyTorch for ROCm first
pip install torch --index-url https://download.pytorch.org/whl/rocm6.2
# Install bitsandbytes (or compile from source for ROCm)
pip install bitsandbytes
# Verify
python -m bitsandbytes
```
Code Evidence
ROCm detection from `bitsandbytes/cextension.py:309-313`:
```python
HIP_ENVIRONMENT = False
BNB_BACKEND = "CPU"
if torch.version.hip:
    HIP_ENVIRONMENT = True
    BNB_BACKEND = "ROCm"
```
Warp size detection from `bitsandbytes/cuda_specs.py:105-128`:
```python
def get_rocm_warpsize() -> int:
    """Get ROCm warp size."""
    try:
        if torch.version.hip:
            result = subprocess.run(["rocminfo"], capture_output=True, text=True)
            match = re.search(r"Wavefront Size:\s+([0-9]{2})\(0x[0-9]{2}\)", result.stdout)
            if match:
                return int(match.group(1))
            else:
                # default to 64 to be safe
                return 64
        else:
            # nvidia cards always use 32 warp size
            return 32
    except Exception as e:
        logger.error(f"Could not detect ROCm warp size: {e}. Defaulting to 64.")
        return 64
```
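The wavefront-size regex above can be exercised against a sample `rocminfo` excerpt. The sample text below is illustrative of typical output, not captured from a real machine:

```python
import re

# Illustrative fragment of rocminfo output for an AMD Instinct-class agent
SAMPLE_ROCMINFO = """
  Agent 2
    Name:                    gfx90a
    Wavefront Size:          64(0x40)
"""

# Same pattern used by get_rocm_warpsize above
match = re.search(r"Wavefront Size:\s+([0-9]{2})\(0x[0-9]{2}\)", SAMPLE_ROCMINFO)
print(int(match.group(1)))  # 64
```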
ROCm version warning from `bitsandbytes/diagnostics/cuda.py:149-155`:
```python
hip_major, hip_minor = cuda_specs.cuda_version_tuple
if (hip_major, hip_minor) < (6, 1):
    print_dedented("""
        WARNING: bitsandbytes is fully supported only from ROCm 6.1.
    """)
```
BNB_CUDA_VERSION conflict check from `bitsandbytes/cextension.py:37-41`:
```python
if torch.version.hip:
    raise RuntimeError(
        f"BNB_CUDA_VERSION={override_value} detected for ROCm!! \n"
        f"Clear the variable and retry: export BNB_CUDA_VERSION=\n"
    )
```
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `BNB_CUDA_VERSION=X detected for ROCm!!` | BNB_CUDA_VERSION environment variable set on ROCm system | `export BNB_CUDA_VERSION=` to clear it |
| `WARNING: bitsandbytes is fully supported only from ROCm 6.1` | ROCm version too old | Upgrade to ROCm 6.1+ |
| `Could not detect ROCm warp size` | `rocminfo` not available or failed | Install ROCm toolkit; ensure `rocminfo` is in PATH |
| `Library not found: libbitsandbytes_rocm...` | ROCm binary not compiled for this version | Compile from source for your ROCm version |
Compatibility Notes
- Warp size: AMD GPUs use 64-wide wavefronts (NVIDIA uses 32). Bitsandbytes defaults to blocksize=128 for 4-bit quantization on ROCm vs blocksize=64 on NVIDIA.
- hipBLASLt: Only available with ROCm >= 6.1. Older versions compile with `NO_HIPBLASLT` flag.
- IMMA support: ROCm always reports `has_imma=True` regardless of GPU architecture (all ROCm GPUs support equivalent INT8 operations).
- macOS: ROCm is not supported on macOS.
- Default safety: If warp size detection fails, bitsandbytes defaults to 64 (wider warp) to prevent crashes, though this may be suboptimal on RDNA2/RDNA3 GPUs with actual warp size of 32.
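The blocksize rule in the first bullet can be sketched as a small helper. This is illustrative only; bitsandbytes applies the mapping internally when a blocksize is not specified:

```python
# Illustrative mapping from detected warp/wavefront width to the default
# 4-bit quantization blocksize: 64-wide AMD wavefronts get 128,
# 32-wide warps (NVIDIA, and RDNA-class AMD GPUs) get 64.
def default_4bit_blocksize(warp_size: int) -> int:
    return 128 if warp_size == 64 else 64

print(default_4bit_blocksize(64))  # 128
print(default_4bit_blocksize(32))  # 64
```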
Related Pages
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear4bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Quantize_4bit
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear4bit_Forward
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Linear8bitLt
- Implementation:Bitsandbytes_foundation_Bitsandbytes_MatMul8bitLt
- Implementation:Bitsandbytes_foundation_Bitsandbytes_Quantize_Blockwise