Implementation:Turboderp org Exllamav2 Build Wheels Release Torch Latest
| Knowledge Sources | |
|---|---|
| Domains | CI_CD, Build_System |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
A streamlined GitHub Actions CI/CD workflow that builds ExLlamaV2 Python wheel packages for only the latest PyTorch version (2.9.0) with CUDA 12.8.1 across 8 matrix configurations, providing a fast-turnaround build pipeline for the most current toolchain.
Description
This workflow, named "Build Wheels & Release latest torch", is a simplified companion to the main Build Wheels & Release workflow. While the main workflow covers 88 configurations spanning multiple CUDA versions, ROCm versions, and PyTorch versions, this workflow focuses exclusively on the latest supported stack: PyTorch 2.9.0 with CUDA 12.8.1.
Platform coverage:
- Ubuntu 22.04 (Linux) -- 4 configurations
- Windows Server 2022 -- 4 configurations
Python version coverage:
- Python 3.10, 3.11, 3.12, and 3.13 -- all four supported Python versions
Toolkit coverage:
- CUDA 12.8.1 only -- no older CUDA versions, no ROCm support
- PyTorch 2.9.0 only -- no older PyTorch versions
- CUDA architectures: 6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX (Pascal through Blackwell/Thor)
Key differences from the main workflow:
- 8 configurations vs 88 in the main workflow (11x fewer builds)
- No ROCm builds -- AMD GPU support is not included
- No sdist build -- only compiled wheels are produced
- No extra HF Spaces wheel -- no special-case configurations
- Single CUDA version -- only CUDA 12.8.1
- Single PyTorch version -- only PyTorch 2.9.0
- 349 lines vs 442 lines in the main workflow
Despite the reduced matrix, the build steps are identical in structure to the main workflow:
- Free Disk Space -- on Linux runners, reclaim disk by removing Android SDK, .NET, Haskell, and swap (uses jlumbroso/free-disk-space@v1.3.1)
- Checkout -- clone the repository via actions/checkout@v4
- Version Extraction -- parse exllamav2/version.py using PowerShell regex to extract the __version__ string
- VS Build Tools -- on Windows, install VS2022 BuildTools 17.9.7 via Chocolatey for C++ compilation compatibility
- Python Setup -- install uv via astral-sh/setup-uv@v5 and configure the target Python version
- Toolkit Installation -- install CUDA via manual downloads on Windows or Jimver/cuda-toolkit@v0.2.23 on Linux. The ROCm build step is present in the workflow definition but is never triggered because no matrix entry sets a rocm value.
- Dependency Installation -- install PyTorch 2.9.0 from the appropriate CUDA index URL, plus build tools
- Wheel Build -- run python -m build -n --wheel with the build tag +cu128-torch2.9.0
- Release Upload -- if the release input is set to '1', upload wheels to GitHub Releases (uses svenstaro/upload-release-action@2.6.1)
The workflow retains CUDA installation steps for all four CUDA versions (11.8, 12.1, 12.4, 12.8) in its step definitions, inherited from the main workflow, but only the CUDA 12.8 installation step is ever executed because all matrix entries use CUDA 12.8.1. The workflow uses fail-fast: false and the default shell is PowerShell (pwsh).
Usage
Use this workflow for quick iteration builds when only the latest PyTorch and CUDA combination is needed. It suits testing new code changes against the current toolchain without waiting for the full 88-configuration matrix to complete. For full release builds covering all supported platform combinations, use the main Build Wheels & Release workflow instead.
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: .github/workflows/build-wheels-release_torch_latest.yml
- Lines: 1-350
Signature
name: Build Wheels & Release latest torch
on:
  workflow_dispatch:
    inputs:
      release:
        description: 'Release? 1 = yes, 0 = no'
        default: '0'
        required: true
        type: string
permissions:
  contents: write
jobs:
  build_wheels:
    name: ${{ matrix.os }} P${{ matrix.pyver }} C${{ matrix.cuda }} R${{ matrix.rocm }} T${{ matrix.torch }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          # Ubuntu 22.04 CUDA - Python 3.10-3.13 x CUDA 12.8.1 x Torch 2.9.0
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.10', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.11', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.12', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.13', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          # Windows 2022 CUDA - Python 3.10-3.13 x CUDA 12.8.1 x Torch 2.9.0
          - { artname: 'wheel', os: windows-2022, pyver: '3.10', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.11', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.12', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.13', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
      fail-fast: false
Import
# Triggered via GitHub Actions workflow_dispatch
# No import needed - this is a CI/CD pipeline
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| release | string | Yes | Whether to upload wheels to GitHub release. 1 = yes, 0 = no (default: '0') |
Outputs
| Name | Type | Description |
|---|---|---|
| wheels | .whl files | Built Python wheel packages with build tag +cu128-torch2.9.0 (e.g., exllamav2-0.2.7+cu128-torch2.9.0-cp313-cp313-linux_x86_64.whl) |
| release | GitHub Release | Uploaded wheels tagged as v{version} (only when release='1' and version is parsed successfully) |
Matrix Parameters
| Parameter | Type | Values | Description |
|---|---|---|---|
| artname | string | 'wheel' | Always 'wheel' (no sdist in this workflow) |
| os | string | 'ubuntu-22.04', 'windows-2022' | Runner operating system |
| pyver | string | '3.10', '3.11', '3.12', '3.13' | Python version |
| cuda | string | '12.8.1' | CUDA toolkit version (always 12.8.1) |
| rocm | string | '' | Always empty (no ROCm builds) |
| torch | string | '2.9.0' | PyTorch version (always 2.9.0) |
| cudaarch | string | '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' | CUDA architecture targets (always the full modern set) |
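The cudaarch value is a space-separated list of compute capabilities, with a +PTX suffix on the last entry. As a rough illustration of how such a string breaks down, the sketch below splits it into (capability, with_ptx) pairs. The function name parse_arch_list and the family mapping are illustrative assumptions, not part of the workflow.

```python
# Illustrative mapping from compute capability to GPU family (an assumption
# added for readability; the workflow itself only passes the raw string).
ARCH_FAMILIES = {
    "6.0": "Pascal", "6.1": "Pascal",
    "7.0": "Volta", "7.5": "Turing",
    "8.0": "Ampere", "8.6": "Ampere", "8.9": "Ada Lovelace",
    "9.0": "Hopper", "10.0": "Blackwell", "12.0": "Blackwell",
}


def parse_arch_list(cudaarch: str) -> list[tuple[str, bool]]:
    """Split a TORCH_CUDA_ARCH_LIST-style string into (capability, with_ptx) pairs."""
    entries = []
    for token in cudaarch.split():
        with_ptx = token.endswith("+PTX")
        capability = token.removesuffix("+PTX")  # requires Python 3.9+
        entries.append((capability, with_ptx))
    return entries


CUDAARCH = "6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX"
parsed = parse_arch_list(CUDAARCH)

for capability, with_ptx in parsed:
    family = ARCH_FAMILIES.get(capability, "unknown")
    suffix = " (also embeds PTX)" if with_ptx else ""
    print(f"sm {capability}: {family}{suffix}")
```

Only the final entry carries +PTX, so forward-compatible PTX is embedded once, for the newest listed architecture.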
Build Matrix Summary
| OS | Python 3.10 | Python 3.11 | Python 3.12 | Python 3.13 |
|---|---|---|---|---|
| Ubuntu 22.04 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 |
| Windows 2022 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 |
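The matrix is the cross product of two runner images and four Python versions, with every other parameter fixed. A minimal sketch of that expansion, including the job name produced by the template in the Signature section, might look like this. The helper names build_matrix and job_name are hypothetical.

```python
from itertools import product

OSES = ["ubuntu-22.04", "windows-2022"]
PYVERS = ["3.10", "3.11", "3.12", "3.13"]

# Parameters that are identical in every matrix entry.
FIXED = {
    "artname": "wheel",
    "cuda": "12.8.1",
    "rocm": "",
    "torch": "2.9.0",
    "cudaarch": "6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX",
}


def build_matrix() -> list[dict]:
    """Expand OS x Python version into the full list of matrix entries."""
    return [
        {"os": os_name, "pyver": pyver, **FIXED}
        for os_name, pyver in product(OSES, PYVERS)
    ]


def job_name(entry: dict) -> str:
    """Mirror the job name template: <os> P<pyver> C<cuda> R<rocm> T<torch>."""
    return (
        f"{entry['os']} P{entry['pyver']} C{entry['cuda']} "
        f"R{entry['rocm']} T{entry['torch']}"
    )


matrix = build_matrix()
for entry in matrix:
    print(job_name(entry))
```

Because rocm is always empty, the R segment of each job name has no value after it.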
Build Steps Detail
Step 1: Free Disk Space
On Linux runners only, the jlumbroso/free-disk-space@v1.3.1 action removes the Android SDK, .NET SDK, Haskell toolchain, and swap storage. The large-packages option is set to false.
Step 2: Version Extraction
A PowerShell script reads exllamav2/version.py and extracts the version string using the regex pattern __version__ = "(\d+\.(?:\d+\.?(?:dev\d+)?)*)". The version is stored as the PACKAGE_VERSION step output for use in the release tag.
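To see what the pattern accepts, the same regex can be exercised in Python. This is a sketch of the matching behaviour only; the workflow itself runs the pattern in PowerShell, and the function name extract_version is an assumption.

```python
import re

# Same pattern as the workflow step, compiled with Python's re module.
VERSION_PATTERN = re.compile(r'__version__ = "(\d+\.(?:\d+\.?(?:dev\d+)?)*)"')


def extract_version(source: str) -> "str | None":
    """Return the version string from version.py contents, or None if absent."""
    match = VERSION_PATTERN.search(source)
    return match.group(1) if match else None


print(extract_version('__version__ = "0.2.7"\n'))
print(extract_version('__version__ = "0.3.0.dev1"\n'))
print(extract_version("__version__ = '0.2.7'\n"))  # single quotes do not match
```

The pattern requires double quotes and exactly one space on each side of the equals sign, so a differently formatted version.py would produce no match.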
Step 3: Platform Toolchain Setup
- Windows: VS2022 BuildTools 17.9.7 installed via Chocolatey
- Windows CUDA 12.8: CUDA redistributable archives (cudart, nvcc, nvrtc, cublas, nvtx, profiler_api, visual_studio_integration, nvprof, cccl, cusparse, cusolver, curand, cufft) are downloaded from developer.download.nvidia.com and extracted into the CUDA toolkit directory
- Linux CUDA: The Jimver/cuda-toolkit@v0.2.23 action installs CUDA 12.8.1 via the network method
Step 4: Wheel Build
The wheel is built via python -m build -n --wheel with a build tag of +cu128-torch2.9.0 appended via egg_info. On Windows, the VS Developer Shell is spawned first and DISTUTILS_USE_SDK=1 is set. The TORCH_CUDA_ARCH_LIST targets all supported GPU architectures from Pascal (6.0) through Blackwell/Thor (12.0+PTX).
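The build tag and the resulting wheel filename follow from the matrix values. The sketch below shows one way to derive them; it assumes the CUDA tag is formed from the major and minor version only, which is consistent with +cu128 for CUDA 12.8.1 but is not confirmed against the workflow's own script. All function names are hypothetical.

```python
def cuda_tag(cuda: str) -> str:
    """Turn '12.8.1' into 'cu128' (major and minor only; an assumed rule)."""
    major, minor = cuda.split(".")[:2]
    return f"cu{major}{minor}"


def build_tag(cuda: str, torch: str) -> str:
    """Compose the local version suffix appended to the package version."""
    return f"+{cuda_tag(cuda)}-torch{torch}"


def wheel_filename(version: str, cuda: str, torch: str, pyver: str, platform: str) -> str:
    """Assemble the expected wheel filename for one matrix entry."""
    cp = "cp" + pyver.replace(".", "")  # '3.13' -> 'cp313'
    return f"exllamav2-{version}{build_tag(cuda, torch)}-{cp}-{cp}-{platform}.whl"


print(build_tag("12.8.1", "2.9.0"))
print(wheel_filename("0.2.7", "12.8.1", "2.9.0", "3.13", "linux_x86_64"))
```

The second call reproduces the example filename listed in the Outputs table.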
Step 5: Release Upload
When release='1', the svenstaro/upload-release-action@2.6.1 uploads all .whl files from ./dist/ to a GitHub Release tagged as v{PACKAGE_VERSION} with overwrite: true.
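The upload is gated on both the release input and a successfully parsed version. A small sketch of that decision follows, assuming a glob of ./dist/*.whl for the uploaded files; the function name release_plan and the returned dictionary shape are illustrative, not the action's actual interface.

```python
def release_plan(release_input: str, package_version: "str | None") -> "dict | None":
    """Return upload settings when a release should happen, otherwise None."""
    # The input is a string, so only the exact value '1' enables the upload.
    if release_input != "1" or not package_version:
        return None
    return {
        "tag": f"v{package_version}",
        "file_glob": "./dist/*.whl",  # assumed glob for 'all .whl files from ./dist/'
        "overwrite": True,
    }


print(release_plan("0", "0.2.7"))
print(release_plan("1", "0.2.7"))
print(release_plan("1", None))
```

With overwrite enabled, re-running the workflow for the same version replaces existing assets with the same filename on that release.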
Usage Examples
Trigger Build Without Release
# Quick build for latest torch only (no upload)
gh workflow run "Build Wheels & Release latest torch" --ref main -f release=0
Trigger Build With Release
# Build and upload latest-torch wheels to GitHub Releases
gh workflow run "Build Wheels & Release latest torch" --ref main -f release=1
Monitor Build Progress
# List recent workflow runs
gh run list --workflow="Build Wheels & Release latest torch" --limit 5
# Watch a specific run
gh run watch <run-id>
Dependencies
GitHub Actions Used
| Action | Version | Purpose |
|---|---|---|
| jlumbroso/free-disk-space | v1.3.1 | Reclaim disk space on Linux runners |
| actions/checkout | v4 | Clone the repository |
| astral-sh/setup-uv | v5 | Install uv package manager and configure Python |
| Jimver/cuda-toolkit | v0.2.23 | Install CUDA toolkit on Linux |
| svenstaro/upload-release-action | 2.6.1 | Upload wheel artifacts to GitHub Releases |
Python Build Dependencies
- torch==2.9.0 (from PyTorch cu128 index)
- build
- setuptools==69.5.1 (pinned)
- wheel
- packaging
- ninja
- safetensors
- tokenizers
- numpy
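As an illustration of how these dependencies might be installed, the sketch below derives the PyTorch index URL from the CUDA version and assembles two install commands. It assumes installation through uv pip install; the workflow's actual commands may differ, and the function names are hypothetical.

```python
PYTORCH_INDEX_BASE = "https://download.pytorch.org/whl"

BUILD_DEPS = [
    "build", "setuptools==69.5.1", "wheel", "packaging",
    "ninja", "safetensors", "tokenizers", "numpy",
]


def torch_index_url(cuda: str) -> str:
    """Map a CUDA version such as '12.8.1' to the matching PyTorch wheel index."""
    major, minor = cuda.split(".")[:2]
    return f"{PYTORCH_INDEX_BASE}/cu{major}{minor}"


def install_commands(cuda: str, torch: str) -> list[list[str]]:
    """Build argv lists for installing torch, then the remaining build tools."""
    return [
        ["uv", "pip", "install", f"torch=={torch}", "--index-url", torch_index_url(cuda)],
        ["uv", "pip", "install", *BUILD_DEPS],
    ]


for command in install_commands("12.8.1", "2.9.0"):
    print(" ".join(command))
```

Installing torch separately keeps the CUDA-specific index from affecting resolution of the other packages.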