
Implementation:Huggingface Optimum GPTQQuantizer Convert Model

From Leeroopedia

Overview

Converts a standard PyTorch model to a GPTQ-ready model by detecting transformer blocks and replacing linear layers with QuantLinear placeholders.

Source

Files: optimum/gptq/quantizer.py and optimum/gptq/utils.py

APIs

GPTQQuantizer.convert_model

File: optimum/gptq/quantizer.py Lines: 253-279

def convert_model(self, model: nn.Module, **kwargs):

Parameters:

  model (nn.Module): The model to convert for GPTQ quantization.
  **kwargs: Additional keyword arguments; accepts device_map for selecting the appropriate QuantLinear implementation.

Behavior:

  1. If self.block_name_to_quantize is None, auto-detects the block name using get_block_name_with_pattern(model).
  2. Calls get_layers(model, prefix=block_name) to find all linear layers within the block prefix.
  3. If modules_in_block_to_quantize is specified, filters the layer list to only include matching module names, logging which layers are excluded.
  4. Calls self.select_quant_linear(device_map=..., pack=False) to choose the appropriate QuantLinear class for the configuration.
  5. Calls self._replace_by_quant_layers(model, layers_to_be_replaced) to perform the actual layer replacement.
  6. Returns the modified model.

Source excerpt:
if self.block_name_to_quantize is None:
    self.block_name_to_quantize = get_block_name_with_pattern(model)
block_name = self.block_name_to_quantize
layers_to_be_replaced = get_layers(model, prefix=block_name)
if self.modules_in_block_to_quantize is not None:
    layers_to_keep = sum(self.modules_in_block_to_quantize, [])
    for name in list(layers_to_be_replaced.keys()):
        if not any(name.endswith(layer) for layer in layers_to_keep):
            del layers_to_be_replaced[name]
self.select_quant_linear(device_map=kwargs.get("device_map", None), pack=False)
self._replace_by_quant_layers(model, layers_to_be_replaced)
return model
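The filtering in step 3 can be sketched in isolation. This is a minimal stand-alone version of the excerpt's suffix-matching logic, using plain dicts in place of real nn.Module objects; the layer names and the filter_layers helper are illustrative, not part of optimum.

```python
def filter_layers(layers_to_be_replaced, modules_in_block_to_quantize):
    # modules_in_block_to_quantize is a list of lists of module-name suffixes;
    # flatten it, then keep only layers whose qualified name ends with one.
    layers_to_keep = sum(modules_in_block_to_quantize, [])
    for name in list(layers_to_be_replaced):
        if not any(name.endswith(layer) for layer in layers_to_keep):
            del layers_to_be_replaced[name]
    return layers_to_be_replaced

layers = {
    "model.layers.0.self_attn.q_proj": "Linear",
    "model.layers.0.self_attn.k_proj": "Linear",
    "model.layers.0.mlp.gate_proj": "Linear",
}
# Keep only the attention projections; the MLP layer is dropped.
kept = filter_layers(layers, [["self_attn.q_proj", "self_attn.k_proj"]])
```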

get_block_name_with_pattern

File: optimum/gptq/utils.py Lines: 62-77

def get_block_name_with_pattern(model: nn.Module) -> str:

Behavior:

  1. Collects all module names via model.named_modules().
  2. Iterates through the BLOCK_PATTERNS list (defined in optimum/gptq/constants.py).
  3. Returns the first pattern where any module name starts with that pattern.
  4. Raises ValueError if no pattern matches.

Known patterns: "transformer.h", "model.decoder.layers", "gpt_neox.layers", "model.layers", "model.language_model.layers", "h", "decoder.layers", "layers".
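The search described above can be sketched as follows, operating on a plain list of module names rather than a real nn.Module. The pattern list mirrors the known patterns above; the find_block_name helper is illustrative, not the actual optimum function.

```python
BLOCK_PATTERNS = [
    "transformer.h", "model.decoder.layers", "gpt_neox.layers",
    "model.layers", "model.language_model.layers", "h",
    "decoder.layers", "layers",
]

def find_block_name(module_names):
    # Return the first pattern that is a prefix of any module name.
    for pattern in BLOCK_PATTERNS:
        if any(name.startswith(pattern) for name in module_names):
            return pattern
    raise ValueError("Block pattern could not be matched.")

names = ["model.embed_tokens", "model.layers.0.self_attn.q_proj", "lm_head"]
block = find_block_name(names)  # "model.layers"
```

Note that pattern order matters: more specific prefixes such as "transformer.h" are tried before generic ones such as "h" or "layers".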

get_layers

File: optimum/gptq/utils.py Lines: 33-59

def get_layers(
    module: nn.Module,
    layers=[Conv1D, nn.Conv2d, nn.Linear],
    prefix: Optional[str] = None,
    name: str = "",
) -> Dict[str, Union[Conv1D, nn.Conv2d, nn.Linear]]:

Behavior:

  1. Recursively traverses the module tree.
  2. Collects all modules that are instances of the specified layer types.
  3. If prefix is provided, only includes layers whose fully qualified name starts with that prefix.
  4. Returns a dictionary mapping layer names to layer objects.
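The traversal above can be sketched with a toy module tree. The tiny Module and Linear classes below are stand-ins for the torch classes, and this get_layers is a simplified illustration of the recursion and prefix filtering, not the optimum implementation.

```python
class Module:
    def __init__(self, **children):
        self.children = children

class Linear(Module):
    pass

def get_layers(module, layer_types=(Linear,), prefix=None, name=""):
    # Collect leaves of the target types, honoring an optional name prefix.
    found = {}
    if isinstance(module, layer_types):
        if prefix is None or name.startswith(prefix):
            found[name] = module
        return found
    for child_name, child in module.children.items():
        full = f"{name}.{child_name}" if name else child_name
        found.update(get_layers(child, layer_types, prefix, full))
    return found

model = Module(
    embed=Module(),
    layers=Module(block0=Module(q_proj=Linear(), mlp=Linear())),
    lm_head=Linear(),
)
# Restricting to the "layers" prefix excludes lm_head.
inside = get_layers(model, prefix="layers")
```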

Import

from optimum.gptq import GPTQQuantizer
from optimum.gptq.utils import get_block_name_with_pattern, get_layers
