
Environment:Huggingface Optimum GPTQ Quantization Environment

From Leeroopedia
Knowledge Sources
Domains Quantization, GPU_Acceleration
Last Updated 2026-02-15 00:00 GMT

Overview

GPU-accelerated (CUDA/XPU) or Intel IPEX CPU environment with `gptqmodel` >= 1.6.0 and `accelerate` for GPTQ weight quantization and loading.

Description

This environment provides the dependencies needed for GPTQ quantization of large language models. It requires the `gptqmodel` package (successor to the deprecated `auto-gptq`) for the quantization algorithm itself, and `accelerate` for device dispatch and checkpoint loading. On the hardware side, an NVIDIA GPU (CUDA), an Intel GPU (XPU), or an Intel CPU with IPEX support is required. The `datasets` package is also needed for loading calibration data (wikitext2, c4, c4-new).

Usage

Use this environment when performing GPTQ model quantization (the `GPTQQuantizer.quantize_model()` workflow) or loading pre-quantized GPTQ models (`load_quantized_model()`). This is the mandatory prerequisite for all GPTQ-related Implementations.
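A minimal sketch of both workflows, assuming `facebook/opt-125m` and the save directory as illustrative placeholders (any causal LM works). The heavy imports are kept inside the functions so the module loads without the full GPU stack:

```python
# Hedged sketch of the quantize and load workflows; model ID and save path
# are illustrative placeholders, not requirements of the environment.

def build_gptq_config(bits=4, dataset="wikitext2", group_size=128):
    """Settings passed through to GPTQQuantizer; wikitext2 is one of the
    supported calibration datasets (wikitext2, c4, c4-new)."""
    return {"bits": bits, "dataset": dataset, "group_size": group_size}

def quantize_and_save(model_id="facebook/opt-125m", save_dir="opt-125m-gptq"):
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from optimum.gptq import GPTQQuantizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    quantizer = GPTQQuantizer(**build_gptq_config())
    quantized = quantizer.quantize_model(model, tokenizer)  # runs calibration
    quantizer.save(quantized, save_dir)

def load_quantized(model_id="facebook/opt-125m", save_dir="opt-125m-gptq"):
    import torch
    from accelerate import init_empty_weights
    from transformers import AutoModelForCausalLM
    from optimum.gptq import load_quantized_model

    # Build an empty-weight skeleton, then let accelerate dispatch the
    # quantized checkpoint onto available devices.
    with init_empty_weights():
        empty = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
    empty.tie_weights()
    return load_quantized_model(empty, save_folder=save_dir, device_map="auto")
```

Both entry points mirror the `GPTQQuantizer.quantize_model()` and `load_quantized_model()` calls named above; exact constructor arguments beyond `bits`, `dataset`, and `group_size` are omitted for brevity.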

System Requirements

| Category | Requirement | Notes |
| --- | --- | --- |
| OS | Linux recommended | CUDA/ROCm drivers required for GPU |
| Hardware | NVIDIA GPU (CUDA), Intel GPU (XPU), or Intel CPU with IPEX | GPU needed for quantization unless using gptqmodel + IPEX on CPU |
| Disk | Sufficient for model + calibration data | Checkpoints range from ~2 GB to 70 GB+ depending on model size |
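A back-of-envelope estimate for the disk requirement, assuming the usual 2 bytes/parameter for fp16 weights and roughly 0.5 bytes/parameter for 4-bit GPTQ plus an assumed ~10% overhead for scales and zero points (the overhead figure is an assumption, not a measured value):

```python
# Rough disk-size estimate: fp16 takes 2 bytes/parameter; 4-bit GPTQ takes
# ~0.5 bytes/parameter plus scale/zero-point overhead (assumed ~10% here).

def weight_gib(n_params, bits, overhead=0.0):
    bytes_total = n_params * bits / 8 * (1 + overhead)
    return bytes_total / 2**30

fp16_7b = weight_gib(7e9, 16)        # ~13 GiB for a 7B model in fp16
gptq4_7b = weight_gib(7e9, 4, 0.10)  # ~3.6 GiB at 4-bit with overhead
```

Remember to budget for the original checkpoint, the quantized output, and the calibration dataset simultaneously during quantization.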

Dependencies

Required Packages

  • `gptqmodel` >= 1.6.0 (mandatory for quantization and loading)
  • `accelerate` (mandatory for model loading and device dispatch)
  • `torch` >= 2.1.0 (core dependency)
  • `transformers` >= 4.36.0 (AutoTokenizer, model configs)
  • `datasets` (required for calibration data loading: wikitext2, c4, c4-new)
  • `tqdm` (progress bars during quantization)

Deprecated Packages

  • `auto-gptq` >= 0.4.99 (deprecated, being replaced by `gptqmodel`)

Credentials

No credentials required for GPTQ quantization itself. Model access may require:

  • `HF_TOKEN`: HuggingFace API token if quantizing gated models (e.g., Llama).
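A hedged example of wiring `HF_TOKEN` through for gated models; the helper name is illustrative, and the token is read from the environment rather than hard-coded:

```python
# Read the HuggingFace token from the environment; returns None when the
# model being quantized is not gated and no token is needed.
import os

def hf_token():
    return os.environ.get("HF_TOKEN")

# Usage (transformers accepts a `token` argument on from_pretrained):
#   AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf", token=hf_token())
```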

Quick Install

# Install gptqmodel (replaces deprecated auto-gptq)
pip install gptqmodel

# Install all required packages
pip install optimum gptqmodel accelerate datasets "torch>=2.1.0" "transformers>=4.36.0"
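After installing, a quick stdlib-only sanity check can confirm the required packages are importable without actually loading the heavy modules:

```python
# Report which required packages are missing, using only the stdlib.
# find_spec locates a module without importing it.
from importlib.util import find_spec

REQUIRED = ["gptqmodel", "accelerate", "torch", "transformers", "datasets", "tqdm"]

def missing_packages(names=REQUIRED):
    return [name for name in names if find_spec(name) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("All required packages found" if not missing else f"Missing: {missing}")
```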

Code Evidence

GPTQModel requirement from `optimum/gptq/quantizer.py:379-382`:

if not is_gptqmodel_available():
    raise RuntimeError(
        "gptqmodel is required in order to perform gptq quantization: "
        "`pip install gptqmodel`. Please notice that auto-gptq will be "
        "deprecated in the future."
    )

GPU/CPU hardware requirement from `optimum/gptq/quantizer.py:384-389`:

gptq_supports_cpu = is_gptqmodel_available()

if not gptq_supports_cpu and not torch.cuda.is_available():
    raise RuntimeError(
        "No cuda gpu or cpu support using Intel/IPEX found. "
        "A gpu or cpu with Intel/IPEX is required for quantization."
    )

Hardware detection function from `optimum/gptq/quantizer.py:60-61`:

def has_device_more_than_cpu():
    return torch.cuda.is_available() or (hasattr(torch, "xpu") and torch.xpu.is_available())

Accelerate requirement for loading from `optimum/gptq/quantizer.py:809-813`:

if not is_accelerate_available():
    raise RuntimeError(
        "You need to install accelerate in order to load and dispatch weights to"
        "a quantized model. You can do it with `pip install accelerate`"
    )

GPTQModel version validation from `optimum/utils/import_utils.py:222-230`:

def is_gptqmodel_available():
    if _gptqmodel_available:
        v = version.parse(importlib.metadata.version("gptqmodel"))
        if v >= GPTQMODEL_MINIMUM_VERSION:
            return True
        else:
            raise ImportError(
                f"Found an incompatible version of gptqmodel. Found version {v}, "
                f"but only version >= {GPTQMODEL_MINIMUM_VERSION} are supported"
            )
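The gate above can be mirrored with a simplified stdlib-only comparison, assuming plain `X.Y.Z` version strings (the real check uses `packaging`'s `version.parse`, which also handles pre-releases):

```python
# Simplified stand-in for the gptqmodel version gate: parses plain X.Y.Z
# strings into tuples and compares against the 1.6.0 minimum.

GPTQMODEL_MINIMUM_VERSION = (1, 6, 0)

def meets_minimum(installed: str, minimum=GPTQMODEL_MINIMUM_VERSION) -> bool:
    parts = tuple(int(x) for x in installed.split(".")[:3])
    return parts >= minimum
```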

Common Errors

| Error Message | Cause | Solution |
| --- | --- | --- |
| `gptqmodel is required in order to perform gptq quantization` | gptqmodel not installed | `pip install gptqmodel` |
| `No cuda gpu or cpu support using Intel/IPEX found` | No GPU and no Intel IPEX CPU support | Install CUDA drivers or Intel IPEX |
| `Asymmetric sym=False quantization is not supported with auto-gptq` | Using deprecated auto-gptq with asymmetric quant | Switch to gptqmodel: `pip install gptqmodel` |
| `gptq_v2 format only supported with gptqmodel` | GPTQ v2 format requires gptqmodel | `pip install gptqmodel` |
| `disk offload is not supported with GPTQ quantization` | Model has disk offload in device map | Remove disk offload; use GPU/CPU only |
| `You need to install accelerate` | accelerate not installed for model loading | `pip install accelerate` |
| `Found an incompatible version of gptqmodel` | gptqmodel version below 1.6.0 | `pip install -U "gptqmodel>=1.6.0"` |

Compatibility Notes

  • CUDA GPUs: Full support for NVIDIA GPUs via CUDA. Cache is explicitly cleared with `torch.cuda.empty_cache()` during quantization.
  • Intel XPU (Arc GPUs): Supported via `torch.xpu.is_available()` check. Cache cleared with `torch.xpu.empty_cache()`.
  • CPU-only: Only supported when using `gptqmodel` (not auto-gptq) with Intel IPEX.
  • auto-gptq deprecation: `auto-gptq` is deprecated. The codebase enforces `gptqmodel` for new features (asymmetric quantization, GPTQ v2 format).
  • Device maps: Disk offload is explicitly blocked. CPU offload with multiple devices triggers a warning about potential memory issues.
  • PTB datasets: The `ptb` and `ptb-new` calibration datasets are deprecated and raise `RuntimeError` if used.
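The device-map rules above can be sketched as a small validator. This mirrors the described behavior (disk offload rejected, CPU offload with multiple devices warned about), not the exact optimum code:

```python
# Validate a device map against the GPTQ constraints described above:
# disk offload is rejected outright; CPU offload mixed with other devices
# triggers a memory warning.
import warnings

def check_device_map(device_map: dict):
    devices = set(device_map.values())
    if "disk" in devices:
        raise ValueError("disk offload is not supported with GPTQ quantization")
    if "cpu" in devices and len(devices) > 1:
        warnings.warn("CPU offload with multiple devices may cause memory issues")
    return devices
```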
