Implementation: Hugging Face Optimum GPTQQuantizer Init
Overview
Initializes the GPTQQuantizer class with all configuration parameters for GPTQ post-training quantization, validates inputs, and creates the underlying gptqmodel.QuantizeConfig.
Source
File: optimum/gptq/quantizer.py Lines: 69-202
Signature
```python
class GPTQQuantizer(object):
    def __init__(
        self,
        bits: int,
        dataset: Optional[Union[List[str], str]] = None,
        group_size: int = 128,
        damp_percent: float = 0.1,
        desc_act: bool = False,
        act_group_aware: bool = True,
        sym: bool = True,
        true_sequential: bool = True,
        model_seqlen: Optional[int] = None,
        block_name_to_quantize: Optional[str] = None,
        module_name_preceding_first_block: Optional[List[str]] = None,
        batch_size: int = 1,
        pad_token_id: Optional[int] = None,
        max_input_length: Optional[int] = None,
        cache_block_outputs: Optional[bool] = True,
        modules_in_block_to_quantize: Optional[List[List[str]]] = None,
        format: str = "gptq",
        meta: Optional[Dict[str, any]] = None,
        backend: Optional[str] = None,
        *args,
        **kwargs,
    ) -> None:
```
Import
```python
from optimum.gptq import GPTQQuantizer
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `bits` | `int` | (required) | Number of bits to quantize to. Supported: 2, 3, 4, 8. |
| `dataset` | `Optional[Union[List[str], str]]` | `None` | Calibration dataset. String name (e.g., `"wikitext2"`, `"c4"`, `"c4-new"`), list of strings, or pre-tokenized data. |
| `group_size` | `int` | `128` | Number of weights sharing quantization parameters. `-1` for per-column quantization. |
| `damp_percent` | `float` | `0.1` | Dampening percent of the average Hessian diagonal. |
| `desc_act` | `bool` | `False` | Quantize columns in order of decreasing activation size (act-order). |
| `act_group_aware` | `bool` | `True` | Use GAR (group-aware activation order). Only applies when `desc_act=False`. |
| `sym` | `bool` | `True` | Use symmetric quantization. |
| `true_sequential` | `bool` | `True` | Enable layer-wise sequential quantization within each block. |
| `format` | `str` | `"gptq"` | Weight format: `"gptq"` (v1) or `"gptq_v2"`. |
| `backend` | `Optional[str]` | `None` | Inference kernel backend (e.g., `"auto"`, `"auto_trainable"`). |
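To make `bits`, `group_size`, and `sym` concrete, here is a minimal self-contained sketch of symmetric group-wise quantization. This is illustrative only, not Optimum's or gptqmodel's implementation: each contiguous group of `group_size` weights shares a single scale, and `bits` fixes the integer grid.

```python
import numpy as np

def fake_group_quantize(w, bits=4, group_size=4):
    # Illustrative sketch (not Optimum's code): symmetric quantization where
    # every contiguous group of `group_size` weights shares one scale.
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)      # one row per group
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1)          # dequantized approximation

w = np.array([0.1, -0.2, 0.05, 0.3, 1.0, -2.0, 0.5, 1.5])
w_hat = fake_group_quantize(w, bits=4, group_size=4)
```

A smaller `group_size` means more scales and a closer fit at the cost of more stored metadata; `group_size=-1` in the real API collapses this to one set of parameters per column.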
Behavior
The constructor performs the following steps:
- Stores all configuration parameters as instance attributes.
- Creates a `gptqmodel.QuantizeConfig` object (`self.quantizeConfig`) by mapping the parameters to gptqmodel's configuration format:
  - Converts the `format` string to a `FORMAT` enum.
  - Converts `quant_method` to a `METHOD` enum.
  - Sets `offload_to_disk=False`.
- Defines `self.serialization_keys`, listing which parameters are included when the config is serialized via `to_dict()`.
- Validates the configuration:
  - `bits` must be in `[2, 3, 4, 8]`; otherwise a `ValueError` is raised.
  - `group_size` must be greater than 0 or equal to `-1`; otherwise a `ValueError` is raised.
  - `damp_percent` must be strictly between 0 and 1; otherwise a `ValueError` is raised.
```python
if self.bits not in [2, 3, 4, 8]:
    raise ValueError("only support quantize to [2,3,4,8] bits.")
if self.group_size != -1 and self.group_size <= 0:
    raise ValueError("group_size must be greater than 0 or equal to -1")
if not (0 < self.damp_percent < 1):
    raise ValueError("damp_percent must between 0 and 1.")
```
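These guards can be exercised in isolation. The standalone function below is a re-implementation for illustration (the real checks live inside `GPTQQuantizer.__init__`, and `validate_gptq_config` is a hypothetical name):

```python
def validate_gptq_config(bits, group_size=128, damp_percent=0.1):
    # Hypothetical helper reproducing the constructor's three guards.
    if bits not in [2, 3, 4, 8]:
        raise ValueError("only support quantize to [2,3,4,8] bits.")
    if group_size != -1 and group_size <= 0:
        raise ValueError("group_size must be greater than 0 or equal to -1")
    if not (0 < damp_percent < 1):
        raise ValueError("damp_percent must between 0 and 1.")

validate_gptq_config(4, 128, 0.1)   # valid: no exception
validate_gptq_config(8, -1, 0.05)   # valid: -1 means per-column
```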
External Dependencies
| Dependency | Import Path | Usage |
|---|---|---|
| `QuantizeConfig` | `gptqmodel.QuantizeConfig` | Underlying quantization configuration object. |
| `FORMAT` | `gptqmodel.quantization.FORMAT` | Enum for weight format (`gptq`, `gptq_v2`). |
| `METHOD` | `gptqmodel.quantization.METHOD` | Enum for quantization method. |
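The string-to-enum conversion the constructor performs can be sketched with a stand-in enum. The real `FORMAT` lives in `gptqmodel.quantization`; the class below is illustrative and covers only the two format values this page mentions:

```python
from enum import Enum

class FORMAT(str, Enum):
    # Stand-in for gptqmodel.quantization.FORMAT (illustrative only).
    GPTQ = "gptq"
    GPTQ_V2 = "gptq_v2"

# The constructor converts the user-supplied `format` string to the enum;
# an unknown string would raise ValueError at lookup time.
fmt = FORMAT("gptq")
```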