
Implementation:Huggingface Optimum GPTQQuantizer Convert Model

From Leeroopedia

Overview

Converts a standard PyTorch model to a GPTQ-ready model by detecting transformer blocks and replacing linear layers with QuantLinear placeholders.

Source

Files: optimum/gptq/quantizer.py and optimum/gptq/utils.py

APIs

GPTQQuantizer.convert_model

File: optimum/gptq/quantizer.py Lines: 253-279

def convert_model(self, model: nn.Module, **kwargs):

Parameters:

  model (nn.Module): The model to convert for GPTQ quantization.
  **kwargs: Additional keyword arguments; accepts device_map for selecting the appropriate QuantLinear implementation.

Behavior:

  1. If self.block_name_to_quantize is None, auto-detects the block name using get_block_name_with_pattern(model).
  2. Calls get_layers(model, prefix=block_name) to find all linear layers within the block prefix.
  3. If modules_in_block_to_quantize is specified, filters the layer list to only include matching module names, logging which layers are excluded.
  4. Calls self.select_quant_linear(device_map=..., pack=False) to choose the appropriate QuantLinear class for the configuration.
  5. Calls self._replace_by_quant_layers(model, layers_to_be_replaced) to perform the actual layer replacement.
  6. Returns the modified model.

Source excerpt:
if self.block_name_to_quantize is None:
    self.block_name_to_quantize = get_block_name_with_pattern(model)
block_name = self.block_name_to_quantize
layers_to_be_replaced = get_layers(model, prefix=block_name)
if self.modules_in_block_to_quantize is not None:
    layers_to_keep = sum(self.modules_in_block_to_quantize, [])
    for name in list(layers_to_be_replaced.keys()):
        if not any(name.endswith(layer) for layer in layers_to_keep):
            del layers_to_be_replaced[name]
self.select_quant_linear(device_map=kwargs.get("device_map", None), pack=False)
self._replace_by_quant_layers(model, layers_to_be_replaced)
return model
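The filtering in step 3 can be sketched in isolation. This is a minimal stand-alone version of the excerpt's suffix-matching logic, using plain dicts in place of real nn.Module objects; the layer names and the filter_layers helper are illustrative, not part of optimum.

```python
def filter_layers(layers_to_be_replaced, modules_in_block_to_quantize):
    # modules_in_block_to_quantize is a list of lists of module-name suffixes;
    # flatten it, then keep only layers whose qualified name ends with one.
    layers_to_keep = sum(modules_in_block_to_quantize, [])
    for name in list(layers_to_be_replaced):
        if not any(name.endswith(layer) for layer in layers_to_keep):
            del layers_to_be_replaced[name]
    return layers_to_be_replaced

layers = {
    "model.layers.0.self_attn.q_proj": "Linear",
    "model.layers.0.self_attn.k_proj": "Linear",
    "model.layers.0.mlp.gate_proj": "Linear",
}
# Keep only the attention projections; the MLP layer is dropped.
kept = filter_layers(layers, [["self_attn.q_proj", "self_attn.k_proj"]])
```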

get_block_name_with_pattern

File: optimum/gptq/utils.py Lines: 62-77

def get_block_name_with_pattern(model: nn.Module) -> str:

Behavior:

  1. Collects all module names via model.named_modules().
  2. Iterates through the BLOCK_PATTERNS list (defined in optimum/gptq/constants.py).
  3. Returns the first pattern where any module name starts with that pattern.
  4. Raises ValueError if no pattern matches.

Known patterns: "transformer.h", "model.decoder.layers", "gpt_neox.layers", "model.layers", "model.language_model.layers", "h", "decoder.layers", "layers".
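The search described above can be sketched as follows, operating on a plain list of module names rather than a real nn.Module. The pattern list mirrors the known patterns above; the find_block_name helper is illustrative, not the actual optimum function.

```python
BLOCK_PATTERNS = [
    "transformer.h", "model.decoder.layers", "gpt_neox.layers",
    "model.layers", "model.language_model.layers", "h",
    "decoder.layers", "layers",
]

def find_block_name(module_names):
    # Return the first pattern that is a prefix of any module name.
    for pattern in BLOCK_PATTERNS:
        if any(name.startswith(pattern) for name in module_names):
            return pattern
    raise ValueError("Block pattern could not be matched.")

names = ["model.embed_tokens", "model.layers.0.self_attn.q_proj", "lm_head"]
block = find_block_name(names)  # "model.layers"
```

Note that pattern order matters: more specific prefixes such as "transformer.h" are tried before generic ones such as "h" or "layers".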

get_layers

File: optimum/gptq/utils.py Lines: 33-59

def get_layers(
    module: nn.Module,
    layers=[Conv1D, nn.Conv2d, nn.Linear],
    prefix: Optional[str] = None,
    name: str = "",
) -> Dict[str, Union[Conv1D, nn.Conv2d, nn.Linear]]:

Behavior:

  1. Recursively traverses the module tree.
  2. Collects all modules that are instances of the specified layer types.
  3. If prefix is provided, only includes layers whose fully qualified name starts with that prefix.
  4. Returns a dictionary mapping layer names to layer objects.
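The traversal above can be sketched with a toy module tree. The tiny Module and Linear classes below are stand-ins for the torch classes, and this get_layers is a simplified illustration of the recursion and prefix filtering, not the optimum implementation.

```python
class Module:
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    pass

def get_layers(module, layer_types=(Linear,), prefix=None, name=""):
    # Collect leaves of the target types, honoring an optional name prefix.
    found = {}
    if isinstance(module, layer_types):
        if prefix is None or name.startswith(prefix):
            found[name] = module
        return found
    for child_name, child in module.children.items():
        full = f"{name}.{child_name}" if name else child_name
        found.update(get_layers(child, layer_types, prefix, full))
    return found

model = Module(
    embed=Module(),
    layers=Module(block0=Module(q_proj=Linear(), mlp=Linear())),
    lm_head=Linear(),
)
# Restricting to the "layers" prefix excludes lm_head.
inside = get_layers(model, prefix="layers")
```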

Import

from optimum.gptq import GPTQQuantizer
from optimum.gptq.utils import get_block_name_with_pattern, get_layers
