
Implementation:Huggingface Optimum GPTQQuantizer Init

From Leeroopedia

Overview

Initializes a GPTQQuantizer instance with all configuration parameters for GPTQ post-training quantization, validates the inputs, and creates the underlying gptqmodel.QuantizeConfig.

Source

File: optimum/gptq/quantizer.py Lines: 69-202

Signature

class GPTQQuantizer(object):
    def __init__(
        self,
        bits: int,
        dataset: Optional[Union[List[str], str]] = None,
        group_size: int = 128,
        damp_percent: float = 0.1,
        desc_act: bool = False,
        act_group_aware: bool = True,
        sym: bool = True,
        true_sequential: bool = True,
        model_seqlen: Optional[int] = None,
        block_name_to_quantize: Optional[str] = None,
        module_name_preceding_first_block: Optional[List[str]] = None,
        batch_size: int = 1,
        pad_token_id: Optional[int] = None,
        max_input_length: Optional[int] = None,
        cache_block_outputs: Optional[bool] = True,
        modules_in_block_to_quantize: Optional[List[List[str]]] = None,
        format: str = "gptq",
        meta: Optional[Dict[str, any]] = None,
        backend: Optional[str] = None,
        *args,
        **kwargs,
    ) -> None:

Import

from optimum.gptq import GPTQQuantizer
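As a usage sketch, the arguments below are illustrative values drawn from the signature above. Constructing the quantizer requires optimum and gptqmodel to be installed, so the actual call is shown in a comment:

```python
# Illustrative constructor arguments for 4-bit GPTQ quantization.
# Values follow the defaults and accepted inputs documented above.
quantizer_kwargs = {
    "bits": 4,               # must be one of 2, 3, 4, 8
    "dataset": "wikitext2",  # one of the named calibration datasets
    "group_size": 128,       # 128 weights share each set of quant parameters
    "desc_act": False,       # act-order disabled (the default)
    "sym": True,             # symmetric quantization
}

# With optimum and gptqmodel installed:
#   from optimum.gptq import GPTQQuantizer
#   quantizer = GPTQQuantizer(**quantizer_kwargs)
```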

Key Parameters

  • bits (int, required): Number of bits to quantize to. Supported: 2, 3, 4, 8.
  • dataset (Optional[Union[List[str], str]], default None): Calibration dataset. A string name (e.g., "wikitext2", "c4", "c4-new"), a list of strings, or pre-tokenized data.
  • group_size (int, default 128): Number of weights sharing quantization parameters. -1 for per-column quantization.
  • damp_percent (float, default 0.1): Dampening percent of the average Hessian diagonal.
  • desc_act (bool, default False): Quantize columns in order of decreasing activation size (act-order).
  • act_group_aware (bool, default True): Use GAR (group aware activation order). Only applies when desc_act=False.
  • sym (bool, default True): Use symmetric quantization.
  • true_sequential (bool, default True): Enable layer-wise sequential quantization within each block.
  • format (str, default "gptq"): Weight format: "gptq" (v1) or "gptq_v2".
  • backend (Optional[str], default None): Inference kernel backend (e.g., "auto", "auto_trainable").
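To make group_size concrete, the sketch below counts how many groups of quantization parameters one weight row of a given width produces. This is an assumption about the standard group-quantization layout, not code from optimum:

```python
import math

def num_quant_groups(in_features: int, group_size: int) -> int:
    """Number of quantization-parameter groups per weight row.

    group_size == -1 means per-column quantization: the whole row
    shares a single set of quantization parameters.
    """
    if group_size == -1:
        return 1
    if group_size <= 0:
        raise ValueError("group_size must be greater than 0 or equal to -1")
    return math.ceil(in_features / group_size)

# A 4096-wide row with the default group_size=128 yields 32 groups.
print(num_quant_groups(4096, 128))  # -> 32
print(num_quant_groups(4096, -1))   # -> 1
```

Smaller groups track local weight statistics more closely but store more scales and zero-points, which is the usual accuracy/size trade-off behind the default of 128.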

Behavior

The constructor performs the following steps:

  1. Stores all configuration parameters as instance attributes.
  2. Creates a gptqmodel.QuantizeConfig object (self.quantizeConfig) by mapping the parameters to gptqmodel's configuration format:
    • Converts format string to FORMAT enum.
    • Converts quant_method to METHOD enum.
    • Sets offload_to_disk=False.
  3. Defines self.serialization_keys listing which parameters should be included when serializing the config via to_dict().
  4. Validates the configuration:
    • bits must be in [2, 3, 4, 8]; raises ValueError otherwise.
    • group_size must be greater than 0 or equal to -1; raises ValueError otherwise.
    • damp_percent must be strictly between 0 and 1; raises ValueError otherwise.

The corresponding checks in the source:
if self.bits not in [2, 3, 4, 8]:
    raise ValueError("only support quantize to [2,3,4,8] bits.")
if self.group_size != -1 and self.group_size <= 0:
    raise ValueError("group_size must be greater than 0 or equal to -1")
if not (0 < self.damp_percent < 1):
    raise ValueError("damp_percent must between 0 and 1.")
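Step 3's serialization pattern can be sketched as follows. The class and attribute names here are illustrative stand-ins; the real to_dict() may differ in detail:

```python
class ConfigSketch:
    """Minimal sketch of the serialization_keys pattern described above."""

    def __init__(self, bits: int, group_size: int = 128, damp_percent: float = 0.1):
        self.bits = bits
        self.group_size = group_size
        self.damp_percent = damp_percent
        # Only attributes listed here are exported when serializing.
        self.serialization_keys = ["bits", "group_size", "damp_percent"]

    def to_dict(self) -> dict:
        return {key: getattr(self, key) for key in self.serialization_keys}

cfg = ConfigSketch(bits=4)
print(cfg.to_dict())  # -> {'bits': 4, 'group_size': 128, 'damp_percent': 0.1}
```

Keeping an explicit key list decouples the serialized config from incidental instance attributes (caches, runtime handles) that should not end up in a saved config file.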

External Dependencies

  • QuantizeConfig (gptqmodel.QuantizeConfig): underlying quantization configuration object.
  • FORMAT (gptqmodel.quantization.FORMAT): enum for the weight format (gptq, gptq_v2).
  • METHOD (gptqmodel.quantization.METHOD): enum for the quantization method.
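The string-to-enum conversion in Behavior step 2 follows a common pattern, sketched below. FORMAT here is a stand-in with assumed member values, not the actual gptqmodel definition:

```python
from enum import Enum

class FORMAT(str, Enum):
    # Stand-in for gptqmodel.quantization.FORMAT; member values assumed.
    GPTQ = "gptq"
    GPTQ_V2 = "gptq_v2"

def format_from_string(fmt: str) -> FORMAT:
    """Map a format string to its enum member, rejecting unknown names."""
    try:
        return FORMAT(fmt)  # value-based lookup, e.g. "gptq" -> FORMAT.GPTQ
    except ValueError:
        raise ValueError(f"unknown weight format: {fmt!r}")

assert format_from_string("gptq") is FORMAT.GPTQ
```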
