Implementation: Hugging Face Optimum GPTQQuantizer Init
Overview
Initializes the GPTQQuantizer class with all configuration parameters for GPTQ post-training quantization, validates inputs, and creates the underlying gptqmodel.QuantizeConfig.
Source
File: optimum/gptq/quantizer.py Lines: 69-202
Signature
```python
class GPTQQuantizer(object):
    def __init__(
        self,
        bits: int,
        dataset: Optional[Union[List[str], str]] = None,
        group_size: int = 128,
        damp_percent: float = 0.1,
        desc_act: bool = False,
        act_group_aware: bool = True,
        sym: bool = True,
        true_sequential: bool = True,
        model_seqlen: Optional[int] = None,
        block_name_to_quantize: Optional[str] = None,
        module_name_preceding_first_block: Optional[List[str]] = None,
        batch_size: int = 1,
        pad_token_id: Optional[int] = None,
        max_input_length: Optional[int] = None,
        cache_block_outputs: Optional[bool] = True,
        modules_in_block_to_quantize: Optional[List[List[str]]] = None,
        format: str = "gptq",
        meta: Optional[Dict[str, any]] = None,
        backend: Optional[str] = None,
        *args,
        **kwargs,
    ) -> None:
```
Import
```python
from optimum.gptq import GPTQQuantizer
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `bits` | `int` | (required) | Number of bits to quantize to. Supported: 2, 3, 4, 8. |
| `dataset` | `Optional[Union[List[str], str]]` | `None` | Calibration dataset. String name (e.g., `"wikitext2"`, `"c4"`, `"c4-new"`), list of strings, or pre-tokenized data. |
| `group_size` | `int` | `128` | Number of weights sharing quantization parameters. `-1` for per-column quantization. |
| `damp_percent` | `float` | `0.1` | Dampening percent of the average Hessian diagonal. |
| `desc_act` | `bool` | `False` | Quantize columns in order of decreasing activation size (act-order). |
| `act_group_aware` | `bool` | `True` | Use GAR (group-aware activation order). Only applies when `desc_act=False`. |
| `sym` | `bool` | `True` | Use symmetric quantization. |
| `true_sequential` | `bool` | `True` | Enable layer-wise sequential quantization within each block. |
| `format` | `str` | `"gptq"` | Weight format: `"gptq"` (v1) or `"gptq_v2"`. |
| `backend` | `Optional[str]` | `None` | Inference kernel backend (e.g., `"auto"`, `"auto_trainable"`). |
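To make `bits`, `group_size`, and `sym` concrete, here is a minimal self-contained sketch of symmetric group-wise quantization. This is illustrative only, not Optimum's or gptqmodel's implementation: each contiguous group of `group_size` weights shares a single scale, and `bits` fixes the integer grid.

```python
import numpy as np

def fake_group_quantize(w, bits=4, group_size=4):
    # Illustrative sketch (not Optimum's code): symmetric quantization where
    # every contiguous group of `group_size` weights shares one scale.
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit symmetric
    groups = w.reshape(-1, group_size)      # one row per group
    scale = np.abs(groups).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(groups / scale), -qmax - 1, qmax)
    return (q * scale).reshape(-1)          # dequantized approximation

w = np.array([0.1, -0.2, 0.05, 0.3, 1.0, -2.0, 0.5, 1.5])
w_hat = fake_group_quantize(w, bits=4, group_size=4)
```

A smaller `group_size` means more scales and a closer fit at the cost of more stored metadata; `group_size=-1` in the real API collapses this to one set of parameters per column.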
Behavior
The constructor performs the following steps:
- Stores all configuration parameters as instance attributes.
- Creates a `gptqmodel.QuantizeConfig` object (`self.quantizeConfig`) by mapping the parameters to gptqmodel's configuration format:
  - Converts the `format` string to a `FORMAT` enum.
  - Converts `quant_method` to a `METHOD` enum.
  - Sets `offload_to_disk=False`.
- Defines `self.serialization_keys`, listing which parameters are included when the config is serialized via `to_dict()`.
- Validates the configuration:
  - `bits` must be in `[2, 3, 4, 8]`; otherwise a `ValueError` is raised.
  - `group_size` must be greater than 0 or equal to `-1`; otherwise a `ValueError` is raised.
  - `damp_percent` must be strictly between 0 and 1; otherwise a `ValueError` is raised.
```python
if self.bits not in [2, 3, 4, 8]:
    raise ValueError("only support quantize to [2,3,4,8] bits.")
if self.group_size != -1 and self.group_size <= 0:
    raise ValueError("group_size must be greater than 0 or equal to -1")
if not (0 < self.damp_percent < 1):
    raise ValueError("damp_percent must between 0 and 1.")
```
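These guards can be exercised in isolation. The standalone function below is a re-implementation for illustration (the real checks live inside `GPTQQuantizer.__init__`, and `validate_gptq_config` is a hypothetical name):

```python
def validate_gptq_config(bits, group_size=128, damp_percent=0.1):
    # Hypothetical helper reproducing the constructor's three guards.
    if bits not in [2, 3, 4, 8]:
        raise ValueError("only support quantize to [2,3,4,8] bits.")
    if group_size != -1 and group_size <= 0:
        raise ValueError("group_size must be greater than 0 or equal to -1")
    if not (0 < damp_percent < 1):
        raise ValueError("damp_percent must between 0 and 1.")

validate_gptq_config(4, 128, 0.1)   # valid: no exception
validate_gptq_config(8, -1, 0.05)   # valid: -1 means per-column
```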
External Dependencies
| Dependency | Import Path | Usage |
|---|---|---|
| `QuantizeConfig` | `gptqmodel.QuantizeConfig` | Underlying quantization configuration object. |
| `FORMAT` | `gptqmodel.quantization.FORMAT` | Enum for weight format (`gptq`, `gptq_v2`). |
| `METHOD` | `gptqmodel.quantization.METHOD` | Enum for quantization method. |
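The string-to-enum conversion the constructor performs can be sketched with a stand-in enum. The real `FORMAT` lives in `gptqmodel.quantization`; the class below is illustrative and covers only the two format values this page mentions:

```python
from enum import Enum

class FORMAT(str, Enum):
    # Stand-in for gptqmodel.quantization.FORMAT (illustrative only).
    GPTQ = "gptq"
    GPTQ_V2 = "gptq_v2"

# The constructor converts the user-supplied `format` string to the enum;
# an unknown string would raise ValueError at lookup time.
fmt = FORMAT("gptq")
```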