Implementation:Turboderp org Exllamav2 Build Wheels Release Torch Latest

From Leeroopedia
Knowledge Sources
Domains CI_CD, Build_System
Last Updated 2026-02-15 00:00 GMT

Overview

A streamlined GitHub Actions CI/CD workflow that builds ExLlamaV2 Python wheel packages for only the latest PyTorch version (2.9.0) with CUDA 12.8.1 across 8 matrix configurations, providing a fast-turnaround build pipeline for the most current toolchain.

Description

This workflow, named "Build Wheels & Release latest torch", is a simplified companion to the main Build Wheels & Release workflow. While the main workflow covers 88 configurations spanning multiple CUDA versions, ROCm versions, and PyTorch versions, this workflow focuses exclusively on the latest supported stack: PyTorch 2.9.0 with CUDA 12.8.1.

Platform coverage:

  • Ubuntu 22.04 (Linux) -- 4 configurations
  • Windows Server 2022 -- 4 configurations

Python version coverage:

  • Python 3.10, 3.11, 3.12, and 3.13 -- all four supported Python versions

Toolkit coverage:

  • CUDA 12.8.1 only -- no older CUDA versions, no ROCm support
  • PyTorch 2.9.0 only -- no older PyTorch versions
  • CUDA architectures: 6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX (Pascal through Blackwell/Thor)

Key differences from the main workflow:

  • 8 configurations vs 88 in the main workflow (11x fewer builds)
  • No ROCm builds -- AMD GPU support is not included
  • No sdist build -- only compiled wheels are produced
  • No extra HF Spaces wheel -- no special-case configurations
  • Single CUDA version -- only CUDA 12.8.1
  • Single PyTorch version -- only PyTorch 2.9.0
  • 349 lines vs 442 lines in the main workflow

Despite the reduced matrix, the build steps are identical in structure to the main workflow:

  1. Free Disk Space -- on Linux runners, reclaim disk by removing Android SDK, .NET, Haskell, and swap (uses jlumbroso/free-disk-space@v1.3.1)
  2. Checkout -- clone the repository via actions/checkout@v4
  3. Version Extraction -- parse exllamav2/version.py using PowerShell regex to extract the __version__ string
  4. VS Build Tools -- on Windows, install VS2022 BuildTools 17.9.7 via Chocolatey for C++ compilation compatibility
  5. Python Setup -- install uv via astral-sh/setup-uv@v5 and configure the target Python version
  6. Toolkit Installation -- install CUDA via manual Windows downloads or Jimver/cuda-toolkit@v0.2.23 on Linux. The ROCm build step is present in the workflow definition but is never triggered because no matrix entry sets a rocm value.
  7. Dependency Installation -- install PyTorch 2.9.0 from the appropriate CUDA index URL, plus build tools
  8. Wheel Build -- run python -m build -n --wheel with the build tag +cu128-torch2.9.0
  9. Release Upload -- if the release input is set to '1', upload wheels to GitHub Releases (uses svenstaro/upload-release-action@2.6.1)

The workflow retains CUDA installation steps for all four CUDA versions (11.8, 12.1, 12.4, 12.8) in its step definitions, inherited from the main workflow, but only the CUDA 12.8 installation step is ever executed because all matrix entries use CUDA 12.8.1. The workflow uses fail-fast: false and the default shell is PowerShell (pwsh).
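The step gating described above can be modelled with a short Python sketch. The step names and predicates below are hypothetical stand-ins for the `if:` conditions in the workflow file, which this page does not reproduce; only the matrix values (os, cuda, rocm) are taken from the definition.

```python
# Hypothetical model of per-step `if:` conditions. The real workflow expresses
# these as GitHub Actions expressions; names and predicates here are assumptions.
STEPS = [
    ("Install CUDA 11.8 (Windows)",
     lambda m: m["os"].startswith("windows") and m["cuda"].startswith("11.8")),
    ("Install CUDA 12.1 (Windows)",
     lambda m: m["os"].startswith("windows") and m["cuda"].startswith("12.1")),
    ("Install CUDA 12.4 (Windows)",
     lambda m: m["os"].startswith("windows") and m["cuda"].startswith("12.4")),
    ("Install CUDA 12.8 (Windows)",
     lambda m: m["os"].startswith("windows") and m["cuda"].startswith("12.8")),
    ("Install CUDA (Linux)",
     lambda m: m["os"].startswith("ubuntu") and m["cuda"] != ""),
    ("Install ROCm",
     lambda m: m["rocm"] != ""),
]


def steps_that_run(entry):
    """Return the names of the toolkit steps whose condition holds for a matrix entry."""
    return [name for name, condition in STEPS if condition(entry)]


if __name__ == "__main__":
    for os_name in ("ubuntu-22.04", "windows-2022"):
        entry = {"os": os_name, "cuda": "12.8.1", "rocm": ""}
        print(os_name, "->", steps_that_run(entry))
```

With every matrix entry set to cuda '12.8.1' and an empty rocm value, each job selects exactly one toolkit step, and the ROCm step is never selected.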

Usage

Use this workflow for quick iteration builds when only the latest PyTorch and CUDA combination is needed. It is well suited to testing new code changes against the current toolchain without waiting for the full 88-configuration matrix to complete. For full release builds covering all supported platform combinations, use the main Build Wheels & Release workflow instead.

Code Reference

Source Location

Signature

name: Build Wheels & Release latest torch

on:
  workflow_dispatch:
    inputs:
      release:
        description: 'Release? 1 = yes, 0 = no'
        default: '0'
        required: true
        type: string

permissions:
  contents: write

jobs:
  build_wheels:
    name: ${{ matrix.os }} P${{ matrix.pyver }} C${{ matrix.cuda }} R${{ matrix.rocm }} T${{ matrix.torch }}
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        include:
          # Ubuntu 22.04 CUDA - Python 3.10-3.13 x CUDA 12.8.1 x Torch 2.9.0
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.10', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.11', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.12', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: ubuntu-22.04, pyver: '3.13', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          # Windows 2022 CUDA - Python 3.10-3.13 x CUDA 12.8.1 x Torch 2.9.0
          - { artname: 'wheel', os: windows-2022, pyver: '3.10', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.11', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.12', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
          - { artname: 'wheel', os: windows-2022, pyver: '3.13', cuda: '12.8.1', rocm: '', torch: '2.9.0', cudaarch: '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' }
      fail-fast: false
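The eight `include` entries are the product of two runner images and four Python versions, with every other field held constant. A minimal Python sketch, purely illustrative (the workflow lists the entries explicitly and does not generate them), reproduces the matrix and the job name template:

```python
from itertools import product

# Values copied from the matrix definition above.
CUDAARCH = "6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX"
RUNNERS = ("ubuntu-22.04", "windows-2022")
PYTHON_VERSIONS = ("3.10", "3.11", "3.12", "3.13")


def build_matrix():
    """Enumerate the matrix entries as dictionaries (runner x Python version)."""
    return [
        {
            "artname": "wheel",
            "os": runner,
            "pyver": pyver,
            "cuda": "12.8.1",
            "rocm": "",
            "torch": "2.9.0",
            "cudaarch": CUDAARCH,
        }
        for runner, pyver in product(RUNNERS, PYTHON_VERSIONS)
    ]


def job_name(entry):
    """Mirror the job name template: os, P<pyver>, C<cuda>, R<rocm>, T<torch>."""
    return (f"{entry['os']} P{entry['pyver']} C{entry['cuda']} "
            f"R{entry['rocm']} T{entry['torch']}")


if __name__ == "__main__":
    for entry in build_matrix():
        print(job_name(entry))
```

Because rocm is always empty, the `R` segment of each job name carries no value.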

Import

# Triggered via GitHub Actions workflow_dispatch
# No import needed - this is a CI/CD pipeline

I/O Contract

Inputs

Name | Type | Required | Description
release | string | Yes | Whether to upload wheels to a GitHub Release. 1 = yes, 0 = no (default: '0')

Outputs

Name | Type | Description
wheels | .whl files | Built Python wheel packages with build tag +cu128-torch2.9.0 (e.g., exllamav2-0.2.7+cu128-torch2.9.0-cp313-cp313-linux_x86_64.whl)
release | GitHub Release | Uploaded wheels tagged as v{version} (only when release='1' and version is parsed successfully)
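The build tag and release tag in the table above follow directly from the matrix values and the parsed package version. The sketch below shows one way to derive them in Python; the function names are hypothetical, and the workflow itself assembles these strings in PowerShell.

```python
def cuda_short(cuda_version: str) -> str:
    """Collapse a CUDA version to its major+minor short form, e.g. '12.8.1' -> 'cu128'."""
    major, minor = cuda_version.split(".")[:2]
    return f"cu{major}{minor}"


def build_tag(cuda_version: str, torch_version: str) -> str:
    """Compose the local build tag appended to the package version."""
    return f"+{cuda_short(cuda_version)}-torch{torch_version}"


def release_tag(package_version: str) -> str:
    """Compose the GitHub Release tag, v{version}."""
    return f"v{package_version}"


if __name__ == "__main__":
    print(build_tag("12.8.1", "2.9.0"))
    print(release_tag("0.2.7"))
```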

Matrix Parameters

Parameter | Type | Values | Description
artname | string | 'wheel' | Always 'wheel' (no sdist in this workflow)
os | string | 'ubuntu-22.04', 'windows-2022' | Runner operating system
pyver | string | '3.10', '3.11', '3.12', '3.13' | Python version
cuda | string | '12.8.1' | CUDA toolkit version (always 12.8.1)
rocm | string | '' | Always empty (no ROCm builds)
torch | string | '2.9.0' | PyTorch version (always 2.9.0)
cudaarch | string | '6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX' | CUDA architecture targets (always the full modern set)

Build Matrix Summary

OS | Python 3.10 | Python 3.11 | Python 3.12 | Python 3.13
Ubuntu 22.04 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0
Windows 2022 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0 | CUDA 12.8.1 / Torch 2.9.0

Build Steps Detail

Step 1: Free Disk Space

On Linux runners only, the jlumbroso/free-disk-space@v1.3.1 action removes the Android SDK, .NET SDK, Haskell toolchain, and swap storage. The large-packages option is set to false.

Step 2: Version Extraction

A PowerShell script reads exllamav2/version.py and extracts the version string using the regex pattern __version__ = "(\d+\.(?:\d+\.?(?:dev\d+)?)*)". The version is stored as the PACKAGE_VERSION step output for use in the release tag.
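To see what the pattern accepts, the same regex can be exercised with Python's `re` module. This is only an illustration: the workflow runs the pattern in PowerShell, and the sample file contents and the `extract_version` helper below are hypothetical.

```python
import re

# The pattern quoted above, unchanged. It requires double quotes around the version.
VERSION_RE = re.compile(r'__version__ = "(\d+\.(?:\d+\.?(?:dev\d+)?)*)"')


def extract_version(source: str):
    """Return the captured version string, or None when the pattern does not match."""
    match = VERSION_RE.search(source)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(extract_version('__version__ = "0.2.7"\n'))       # release version
    print(extract_version('__version__ = "0.3.0.dev1"\n'))  # development version
    print(extract_version("__version__ = '0.2.7'\n"))       # single quotes: no match
```

A failed match leaves no version to tag, which is why the release output is conditional on the version being parsed successfully.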

Step 3: Platform Toolchain Setup

  • Windows: VS2022 BuildTools 17.9.7 installed via Chocolatey
  • Windows CUDA 12.8: CUDA redistributable archives (cudart, nvcc, nvrtc, cublas, nvtx, profiler_api, visual_studio_integration, nvprof, cccl, cusparse, cusolver, curand, cufft) are downloaded from developer.download.nvidia.com and extracted into the CUDA toolkit directory
  • Linux CUDA: The Jimver/cuda-toolkit@v0.2.23 action installs CUDA 12.8.1 via the network method

Step 4: Wheel Build

The wheel is built via python -m build -n --wheel with a build tag of +cu128-torch2.9.0 appended via egg_info. On Windows, the VS Developer Shell is spawned first and DISTUTILS_USE_SDK=1 is set. The TORCH_CUDA_ARCH_LIST targets all supported GPU architectures from Pascal (6.0) through Blackwell/Thor (12.0+PTX).
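The environment for the build step can be sketched in Python as follows. The helper names are hypothetical, the way the workflow passes the egg_info build tag is not shown on this page and is therefore omitted, and the base command is the one quoted above.

```python
import os

# Base build command as quoted above; build-tag options are intentionally left out
# because their exact form in the workflow is not documented here.
BUILD_COMMAND = ["python", "-m", "build", "-n", "--wheel"]


def build_environment(runner_os: str, cudaarch: str, base_env=None) -> dict:
    """Return a copy of the environment with the variables the build step relies on."""
    env = dict(os.environ if base_env is None else base_env)
    env["TORCH_CUDA_ARCH_LIST"] = cudaarch
    if runner_os.startswith("windows"):
        # On Windows the MSVC toolchain comes from the VS Developer Shell.
        env["DISTUTILS_USE_SDK"] = "1"
    return env


def ptx_targets(cudaarch: str) -> list:
    """List the architectures that also request PTX (forward-compatible) code."""
    suffix = "+PTX"
    return [arch[:-len(suffix)] for arch in cudaarch.split() if arch.endswith(suffix)]


if __name__ == "__main__":
    arch_list = "6.0 6.1 7.0 7.5 8.0 8.6 8.9 9.0 10.0 12.0+PTX"
    env = build_environment("windows-2022", arch_list, base_env={})
    print(env)
    print(ptx_targets(arch_list))
```

Only the last entry carries the +PTX suffix, so PTX is emitted for compute capability 12.0 alone.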

Step 5: Release Upload

When release='1', the svenstaro/upload-release-action@2.6.1 uploads all .whl files from ./dist/ to a GitHub Release tagged as v{PACKAGE_VERSION} with overwrite: true.
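The upload decision can be summarised in a small Python sketch. The `plan_release` helper and its return shape are hypothetical; in the workflow the condition is an `if:` expression on the step and the upload itself is performed by the action.

```python
from pathlib import Path


def plan_release(release_input: str, package_version, dist_dir: str = "./dist"):
    """Return an upload plan, or None when the upload step would be skipped."""
    # The input is a string, so only the exact value '1' enables the upload.
    if release_input != "1" or not package_version:
        return None
    wheels = sorted(str(path) for path in Path(dist_dir).glob("*.whl"))
    return {"tag": f"v{package_version}", "files": wheels, "overwrite": True}


if __name__ == "__main__":
    print(plan_release("0", "0.2.7"))  # skipped: release not requested
    print(plan_release("1", None))     # skipped: no version was parsed
    print(plan_release("1", "0.2.7"))  # upload plan for tag v0.2.7
```

Because overwrite is enabled, rerunning the workflow for the same version replaces previously uploaded wheels with the same file name.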

Usage Examples

Trigger Build Without Release

# Quick build for latest torch only (no upload)
gh workflow run "Build Wheels & Release latest torch" --ref main -f release=0

Trigger Build With Release

# Build and upload latest-torch wheels to GitHub Releases
gh workflow run "Build Wheels & Release latest torch" --ref main -f release=1

Monitor Build Progress

# List recent workflow runs
gh run list --workflow="Build Wheels & Release latest torch" --limit 5

# Watch a specific run
gh run watch <run-id>
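When the trigger needs to be scripted, the same `gh` invocations can be assembled from Python. The sketch below only builds the argument lists; the helper names are hypothetical, and actually running them assumes an installed and authenticated GitHub CLI.

```python
import shlex

WORKFLOW_NAME = "Build Wheels & Release latest torch"


def dispatch_command(release: bool, ref: str = "main") -> list:
    """Build the argv for triggering the workflow with the release input set."""
    flag = "1" if release else "0"
    return ["gh", "workflow", "run", WORKFLOW_NAME, "--ref", ref, "-f", f"release={flag}"]


def list_runs_command(limit: int = 5) -> list:
    """Build the argv for listing recent runs of the workflow."""
    return ["gh", "run", "list", f"--workflow={WORKFLOW_NAME}", "--limit", str(limit)]


if __name__ == "__main__":
    # Print shell-quoted commands; pass the lists to subprocess.run(...) to execute them.
    print(shlex.join(dispatch_command(release=False)))
    print(shlex.join(list_runs_command()))
```

Passing the arguments as a list avoids shell quoting problems with the ampersand in the workflow name.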

Dependencies

GitHub Actions Used

Action | Version | Purpose
jlumbroso/free-disk-space | v1.3.1 | Reclaim disk space on Linux runners
actions/checkout | v4 | Clone the repository
astral-sh/setup-uv | v5 | Install uv package manager and configure Python
Jimver/cuda-toolkit | v0.2.23 | Install CUDA toolkit on Linux
svenstaro/upload-release-action | 2.6.1 | Upload wheel artifacts to GitHub Releases

Python Build Dependencies

  • torch==2.9.0 (from PyTorch cu128 index)
  • build
  • setuptools==69.5.1 (pinned)
  • wheel
  • packaging
  • ninja
  • safetensors
  • tokenizers
  • numpy
