
Environment:Huggingface Peft Optional Quantization Backends

From Leeroopedia


Knowledge Sources
Domains Infrastructure, Quantization
Last Updated 2026-02-07 06:44 GMT

Overview

Optional quantization backend packages (GPTQModel, TorchAO, AQLM, EETQ, HQQ, INC) that enable attaching PEFT adapters to models quantized with methods beyond bitsandbytes.

Description

PEFT supports attaching adapter layers (LoRA, OFT, etc.) to models quantized by several different quantization frameworks. Each backend is detected via `importlib.util.find_spec()` when its availability check runs and, when available, registers specialized adapter layer classes. Backends with strict minimum version requirements raise an `ImportError` from that check if the installed version is too old. All backends are optional and independent of one another.
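The lazy detection pattern can be sketched with the standard library alone (a minimal illustration, not PEFT's exact code; `backend_available` is a hypothetical name):

```python
import importlib.util
from functools import lru_cache


@lru_cache
def backend_available(package: str) -> bool:
    """Return True if `package` is importable, without actually importing it."""
    # find_spec returns None when no importable module or package is found
    return importlib.util.find_spec(package) is not None


# Each check is independent: probing one backend never imports another.
print(backend_available("json"))                  # stdlib module, always present
print(backend_available("definitely_absent_pkg"))
```

Caching with `lru_cache` means the filesystem is probed at most once per package per process, which matters because these checks run on every adapter dispatch.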

Usage

Use these backends when:

  • You have a GPTQ-quantized model and want to add LoRA adapters (GPTQModel + optimum)
  • You want to use PyTorch Architecture Optimization quantization (TorchAO)
  • You have AQLM, EETQ, HQQ, or INC quantized models

System Requirements

Category | Requirement            | Notes
Hardware | GPU with CUDA support  | Most quantization backends require CUDA
Python   | >= 3.10                | Same as core PEFT

Dependencies

GPTQModel

  • `gptqmodel` >= 5.6.12
  • `optimum` >= 1.24.0

TorchAO

  • `torchao` >= 0.4.0

AQLM

  • `aqlm` (any version)

EETQ

  • `eetq` (any version)

HQQ

  • `hqq` (any version)

INC (Intel Neural Compressor)

  • `neural_compressor` (any version)

Diffusers (for DreamBooth workflows)

  • `diffusers` (any version)

Credentials

No additional credentials required.

Quick Install

# GPTQModel quantization support
pip install "gptqmodel>=5.6.12" "optimum>=1.24.0"

# TorchAO quantization support
pip install "torchao>=0.4.0"

# AQLM quantization support
pip install aqlm

# EETQ quantization support
pip install eetq

# HQQ quantization support
pip install hqq

# Intel Neural Compressor support
pip install neural-compressor

# Diffusers for DreamBooth LoRA
pip install diffusers

Code Evidence

GPTQModel version enforcement from `src/peft/import_utils.py:39-62`:

@lru_cache
def is_gptqmodel_available():
    if importlib.util.find_spec("gptqmodel") is not None:
        GPTQMODEL_MINIMUM_VERSION = packaging.version.parse("5.6.12")
        OPTIMUM_MINIMUM_VERSION = packaging.version.parse("1.24.0")
        version_gptqmodel = packaging.version.parse(
            importlib_metadata.version("gptqmodel")
        )
        if GPTQMODEL_MINIMUM_VERSION <= version_gptqmodel:
            if is_optimum_available():
                version_optimum = packaging.version.parse(
                    importlib_metadata.version("optimum")
                )
                if OPTIMUM_MINIMUM_VERSION <= version_optimum:
                    return True
                else:
                    raise ImportError(
                        f"gptqmodel requires optimum version "
                        f"`{OPTIMUM_MINIMUM_VERSION}` or higher."
                    )
            else:
                raise ImportError(
                    f"gptqmodel requires optimum version "
                    f"`{OPTIMUM_MINIMUM_VERSION}` or higher to be installed."
                )
        else:
            raise ImportError(
                f"Found an incompatible version of gptqmodel. Found version "
                f"`{version_gptqmodel}`, but only versions above "
                f"`{GPTQMODEL_MINIMUM_VERSION}` are supported."
            )
    return False
TorchAO version enforcement from `src/peft/import_utils.py:108-128`:

@lru_cache
def is_torchao_available():
    if importlib.util.find_spec("torchao") is None:
        return False
    TORCHAO_MINIMUM_VERSION = packaging.version.parse("0.4.0")
    try:
        torchao_version = packaging.version.parse(
            importlib_metadata.version("torchao")
        )
    except importlib_metadata.PackageNotFoundError:
        return False
    if torchao_version < TORCHAO_MINIMUM_VERSION:
        raise ImportError(
            f"Found an incompatible version of torchao. "
            f"Found version {torchao_version}, "
            f"but only versions above {TORCHAO_MINIMUM_VERSION} are supported"
        )
    return True
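Both checks follow the same gating pattern, which can be reproduced with the standard library alone. A self-contained sketch (hypothetical helper `require_min_version`; PEFT itself uses `packaging.version`, which also handles pre-release and post-release tags correctly):

```python
import importlib.metadata
import importlib.util


def require_min_version(package: str, minimum: str) -> bool:
    """Return True if `package` is installed at >= `minimum`, False if absent.

    Raises ImportError when the package is present but too old.
    Uses a naive numeric parse for illustration only.
    """
    if importlib.util.find_spec(package) is None:
        return False
    try:
        installed = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # A directory can be importable without any distribution metadata
        return False

    def as_tuple(version: str) -> tuple:
        # Keep only purely numeric components ("1.2.3" -> (1, 2, 3))
        return tuple(int(part) for part in version.split(".") if part.isdigit())

    if as_tuple(installed) < as_tuple(minimum):
        raise ImportError(
            f"Found an incompatible version of {package}: {installed} "
            f"(>= {minimum} is required)"
        )
    return True


print(require_min_version("no_such_backend_xyz", "1.0.0"))  # False: not installed
```

Note the three distinct outcomes: `False` (absent, PEFT silently skips the backend), `True` (usable), and `ImportError` (installed but too old, surfaced loudly rather than failing later at runtime).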

Quantization method detection from `src/peft/utils/other.py:150-154`:

is_gptq_quantized = getattr(model, "quantization_method", None) == "gptq"
is_aqlm_quantized = getattr(model, "quantization_method", None) == "aqlm"
is_eetq_quantized = getattr(model, "quantization_method", None) == "eetq"
is_torchao_quantized = getattr(model, "quantization_method", None) == "torchao"
is_hqq_quantized = getattr(model, "quantization_method", None) == "hqq" or getattr(
    model, "hqq_quantized", False
)

Common Errors

Error Message | Cause | Solution
`Found an incompatible version of gptqmodel` | GPTQModel < 5.6.12 | `pip install "gptqmodel>=5.6.12"`
`gptqmodel requires optimum version 1.24.0 or higher` | optimum too old for GPTQModel | `pip install "optimum>=1.24.0"`
`gptqmodel requires optimum ... to be installed` | optimum not installed | `pip install "optimum>=1.24.0"`
`Found an incompatible version of torchao` | TorchAO < 0.4.0 | `pip install "torchao>=0.4.0"`

Compatibility Notes

  • GPTQModel requires both `gptqmodel` AND `optimum` packages with specific minimum versions.
  • TorchAO has an edge case where `find_spec("torchao")` returns non-None but `importlib_metadata.version("torchao")` raises `PackageNotFoundError`. PEFT handles this gracefully.
  • HQQ detection is unique: it checks both `quantization_method == "hqq"` and `getattr(model, "hqq_quantized", False)` for backward compatibility.
  • All backends register their own adapter layer subclasses in tuner-specific `bnb.py`, `gptq.py`, `aqlm.py`, `eetq.py`, `hqq.py`, `inc.py`, or `torchao.py` files.
