Environment: turboderp-org ExLlamaV2 Build Toolchain
| Knowledge Sources | Details |
|---|---|
| Domains | Infrastructure, Build_System |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
C++/CUDA compilation toolchain required for building the ExLlamaV2 native extension, either at install time or via JIT compilation at first run.
Description
ExLlamaV2 includes a C++/CUDA native extension (`exllamav2_ext`) that implements performance-critical kernels for quantized matrix operations, attention, normalization, sampling, and tensor parallelism. The extension consists of 14 C++ source files and 25 CUDA kernel files.
The extension can be built in two ways:
- Pre-compiled wheel: Distributed via PyPI for common CUDA/PyTorch/OS combinations. No build tools needed at runtime.
- JIT compilation: If no pre-compiled wheel is available, the extension is compiled on first import using `torch.utils.cpp_extension.load()`. This requires a C++ compiler and CUDA toolkit.
Setting the `EXLLAMA_NOCOMPILE` environment variable skips precompilation during `pip install`, deferring the build to JIT compilation at first run.
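The gating logic can be sketched roughly as follows. This is a simplified illustration, not the project's actual `setup.py`; the helper name `should_precompile` is hypothetical:

```python
import os

def should_precompile() -> bool:
    """Decide whether install-time compilation of the native extension runs.

    Mirrors the documented behavior: if EXLLAMA_NOCOMPILE is set (to any
    value), precompilation is skipped and the extension is JIT-compiled
    on first import instead.
    """
    return "EXLLAMA_NOCOMPILE" not in os.environ

# With the variable unset, precompilation proceeds.
os.environ.pop("EXLLAMA_NOCOMPILE", None)
assert should_precompile()

# Setting it defers the build to JIT compilation at first run.
os.environ["EXLLAMA_NOCOMPILE"] = "1"
assert not should_precompile()
```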
Usage
This environment is required when installing from source or when no pre-built wheel matches your CUDA/PyTorch combination. If using a pre-built wheel, the build toolchain is not needed at runtime.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (GCC) or Windows (MSVC 2017-2022) | macOS not supported |
| C++ Compiler | GCC (Linux) or MSVC (Windows) | MSVC auto-detected across Community, Professional, Enterprise, BuildTools editions |
| CUDA Toolkit | Matching PyTorch CUDA version | 11.8, 12.1, 12.4, or 12.8 |
| Build System | `ninja` | Required for parallel JIT compilation |
Dependencies
System Packages
- `gcc` / `g++` (Linux) or MSVC 2017-2022 (Windows)
- CUDA Toolkit with `nvcc` compiler
- `ninja` build system
Python Packages
- `torch` >= 2.2.0 (provides `torch.utils.cpp_extension`)
- `setuptools`
- `wheel`
Environment Variables
The following environment variables control the build process:
- `EXLLAMA_NOCOMPILE`: Set to skip precompilation during pip install (defers to JIT)
- `EXLLAMA_VERBOSE`: Set to enable verbose compilation output
- `EXLLAMA_EXT_DEBUG`: Set to compile with debug flags (`-ftime-report`, `-DTORCH_USE_CUDA_DSA`)
- `TORCH_CUDA_ARCH_LIST`: Override auto-detected GPU compute capabilities for compilation
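How such toggles typically feed into extra compiler flags can be sketched as follows. This is illustrative only; `debug_flags_from_env` is a hypothetical helper, though the flag values are the ones listed above:

```python
def debug_flags_from_env(env: dict) -> list[str]:
    """Assemble extra compiler flags from the documented debug toggle.

    EXLLAMA_EXT_DEBUG adds the debug flags named above; EXLLAMA_VERBOSE
    only affects logging, so it contributes no flags here.
    """
    flags = []
    if env.get("EXLLAMA_EXT_DEBUG"):
        flags += ["-ftime-report", "-DTORCH_USE_CUDA_DSA"]
    return flags

assert debug_flags_from_env({}) == []
assert debug_flags_from_env({"EXLLAMA_EXT_DEBUG": "1"}) == [
    "-ftime-report", "-DTORCH_USE_CUDA_DSA"]
```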
Quick Install
```shell
# Install build dependencies
pip install ninja setuptools wheel

# Install from source (triggers compilation)
pip install exllamav2 --no-binary exllamav2

# Or skip precompilation (JIT on first run)
EXLLAMA_NOCOMPILE=1 pip install exllamav2
```
Code Evidence
Extension loading with JIT fallback from `exllamav2/ext.py:106-117`:
```python
try:
    import exllamav2_ext
except ModuleNotFoundError:
    build_jit = True
except ImportError as e:
    if "undefined symbol" in str(e):
        print("\"undefined symbol\" error here usually means you are attempting to load "
              "a prebuilt extension wheel that was compiled against a different version "
              "of PyTorch than the one you are using.")
    raise e
```
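The same try/except pattern can be exercised with the standard library alone. This is a sketch; `probe_extension` is a hypothetical helper, and real inspection of the `ImportError` would check for "undefined symbol" as above:

```python
import importlib

def probe_extension(name: str) -> str:
    """Classify the outcome of importing a native extension module.

    Returns "ok" when the module imports, "jit" when it is simply absent
    (the cue to fall back to JIT compilation), and "abi-mismatch" when it
    exists but fails to load with an undefined-symbol error.
    """
    try:
        importlib.import_module(name)
        return "ok"
    except ModuleNotFoundError:
        return "jit"
    except ImportError as e:
        if "undefined symbol" in str(e):
            return "abi-mismatch"
        raise

# A module that certainly does not exist triggers the JIT path.
assert probe_extension("exllamav2_ext_nonexistent_demo") == "jit"
# The stdlib itself imports cleanly.
assert probe_extension("json") == "ok"
```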
CUDA arch list auto-detection from `exllamav2/ext.py:19-48`:
```python
def maybe_set_arch_list_env():
    if os.environ.get('TORCH_CUDA_ARCH_LIST', None):
        return
    if not torch.version.cuda:
        return
    arch_list = []
    for i in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(i)
        supported_sm = [int(arch.split('_')[1])
                        for arch in torch.cuda.get_arch_list() if 'sm_' in arch]
        ...
    os.environ["TORCH_CUDA_ARCH_LIST"] = ";".join(arch_list)
```
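The elided portion builds `TORCH_CUDA_ARCH_LIST` entries such as `"8.6"` from capability tuples. A torch-free sketch of that conversion, assuming the common convention that a capability newer than anything in the PyTorch build's `sm_` list is clamped down (the helper name `arch_entry` is hypothetical):

```python
def arch_entry(capability: tuple[int, int], supported_sm: list[int]) -> str:
    """Convert a (major, minor) compute capability into a
    TORCH_CUDA_ARCH_LIST entry, clamping to the newest architecture
    the installed PyTorch build was compiled for."""
    cap = capability[0] * 10 + capability[1]
    cap = min(cap, max(supported_sm))
    return f"{cap // 10}.{cap % 10}"

# An sm_86 GPU on a build that supports up to sm_90:
assert arch_entry((8, 6), [70, 75, 80, 86, 90]) == "8.6"
# A GPU newer than the toolkit knows about is clamped down:
assert arch_entry((12, 0), [70, 75, 80, 86, 90]) == "9.0"
```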
Windows MSVC detection from `exllamav2/ext.py:123-152`:
```python
def find_msvc():
    for year in ['2022', '2019', '2017']:
        for edition in ['Community', 'Professional', 'Enterprise', 'BuildTools']:
            for root_key in ['ProgramW6432', 'ProgramFiles(x86)']:
                ...
    return None
```
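The elided body joins those loop variables into candidate install paths and returns the first that exists on disk. A sketch of the path construction, assuming the standard `Microsoft Visual Studio\<year>\<edition>` layout (the helper name `msvc_candidates` is hypothetical):

```python
import os

def msvc_candidates(program_dirs: list[str]) -> list[str]:
    """Enumerate plausible Visual Studio install roots, newest year first."""
    paths = []
    for year in ["2022", "2019", "2017"]:
        for edition in ["Community", "Professional", "Enterprise", "BuildTools"]:
            for root in program_dirs:
                paths.append(os.path.join(
                    root, "Microsoft Visual Studio", year, edition))
    return paths

cands = msvc_candidates([r"C:\Program Files", r"C:\Program Files (x86)"])
# 3 years x 4 editions x 2 roots = 24 candidates, 2022 searched first.
assert len(cands) == 24
assert "2022" in cands[0]
```

A real implementation would then probe each candidate for `cl.exe` and fall back to `return None` when nothing is found, as the snippet above shows.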
Compilation flags from `exllamav2/ext.py:176-189`:
```python
# Linux
extra_cflags = ["-Ofast"]

# Windows
extra_cflags = ["/Ox"]

# NVCC (both platforms)
extra_cuda_cflags = ["-lineinfo", "-O3"]

# ROCm
if torch.version.hip:
    extra_cuda_cflags += ["-DHIPBLAS_USE_HIP_HALF"]
```
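The platform split above can be condensed into a small selector. This is illustrative; the function names are hypothetical, but the flag values are the ones the source sets:

```python
def host_cflags(platform: str) -> list[str]:
    """Pick the host C++ optimization flag per platform, as listed above."""
    return ["/Ox"] if platform.startswith("win") else ["-Ofast"]

def cuda_cflags(is_hip: bool) -> list[str]:
    """NVCC flags, with the extra HIP define appended for ROCm builds."""
    flags = ["-lineinfo", "-O3"]
    if is_hip:
        flags.append("-DHIPBLAS_USE_HIP_HALF")
    return flags

assert host_cflags("win32") == ["/Ox"]
assert host_cflags("linux") == ["-Ofast"]
assert cuda_cflags(True)[-1] == "-DHIPBLAS_USE_HIP_HALF"
```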
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `undefined symbol` on import | Prebuilt wheel compiled against different PyTorch version | Reinstall ExLlamaV2 matching your exact PyTorch version, or install from source |
| `ModuleNotFoundError: No module named 'exllamav2_ext'` + JIT build failure | Missing C++ compiler or CUDA toolkit | Install GCC (Linux) or MSVC (Windows), ensure CUDA toolkit is in PATH |
| `ninja: command not found` | Ninja build system not installed | `pip install ninja` |
| NVCC compilation error for unsupported architecture | GPU compute capability not in `TORCH_CUDA_ARCH_LIST` | Set `TORCH_CUDA_ARCH_LIST` manually or update CUDA toolkit |
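The triage in the table can be expressed as a substring lookup. This is a toy helper (`suggest_fix` is invented for illustration), not part of ExLlamaV2:

```python
def suggest_fix(error_text: str) -> str:
    """Map a build/import error message to the remedy from the table above."""
    remedies = [
        ("undefined symbol",
         "Reinstall ExLlamaV2 matching your exact PyTorch version, "
         "or install from source."),
        ("No module named 'exllamav2_ext'",
         "Install GCC (Linux) or MSVC (Windows) and ensure the CUDA "
         "toolkit is in PATH."),
        ("ninja: command not found", "pip install ninja"),
    ]
    for needle, fix in remedies:
        if needle in error_text:
            return fix
    return "Unknown error; set EXLLAMA_VERBOSE for more detail."

assert suggest_fix("ninja: command not found") == "pip install ninja"
assert "PyTorch" in suggest_fix("ImportError: undefined symbol: _ZN3c10...")
```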
Compatibility Notes
- Pre-built wheels: Available for CUDA 11.8/12.1/12.4/12.8 on Linux and Windows with Python 3.10-3.13 and PyTorch 2.2-2.9. Using a pre-built wheel eliminates the need for a build toolchain.
- ROCm builds: `-DHIPBLAS_USE_HIP_HALF` is added automatically. `TORCH_CUDA_ARCH_LIST` auto-detection is skipped for ROCm, since `torch.version.cuda` is unset on HIP builds.
- Windows: MSVC auto-detection searches Visual Studio 2017-2022 installations. If `cl.exe` is not in PATH, the build system injects the compiler path automatically.
- JIT compilation: Triggered automatically on first import if no prebuilt module is found. Compiles 39 source files (14 C++ + 25 CUDA).