Implementation: Hugging Face Optimum GPTQQuantizer Convert Model
Overview
Converts a standard PyTorch model to a GPTQ-ready model by detecting transformer blocks and replacing linear layers with QuantLinear placeholders.
Source
Files: optimum/gptq/quantizer.py and optimum/gptq/utils.py
APIs
GPTQQuantizer.convert_model
File: optimum/gptq/quantizer.py Lines: 253-279
```python
def convert_model(self, model: nn.Module, **kwargs):
```
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `model` | `nn.Module` | The model to convert for GPTQ quantization. |
| `**kwargs` | — | Additional keyword arguments. Accepts `device_map` for selecting the appropriate `QuantLinear` implementation. |
Behavior:
- If `self.block_name_to_quantize` is `None`, auto-detects the block name using `get_block_name_with_pattern(model)`.
- Calls `get_layers(model, prefix=block_name)` to find all linear layers within the block prefix.
- If `modules_in_block_to_quantize` is specified, filters the layer list to only include matching module names, logging which layers are excluded.
- Calls `self.select_quant_linear(device_map=..., pack=False)` to choose the appropriate `QuantLinear` class for the configuration.
- Calls `self._replace_by_quant_layers(model, layers_to_be_replaced)` to perform the actual layer replacement.
- Returns the modified model.
```python
if self.block_name_to_quantize is None:
    self.block_name_to_quantize = get_block_name_with_pattern(model)
block_name = self.block_name_to_quantize
layers_to_be_replaced = get_layers(model, prefix=block_name)
if self.modules_in_block_to_quantize is not None:
    layers_to_keep = sum(self.modules_in_block_to_quantize, [])
    for name in list(layers_to_be_replaced.keys()):
        if not any(name.endswith(layer) for layer in layers_to_keep):
            del layers_to_be_replaced[name]
self.select_quant_linear(device_map=kwargs.get("device_map", None), pack=False)
self._replace_by_quant_layers(model, layers_to_be_replaced)
return model
```
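The `modules_in_block_to_quantize` filter above can be isolated into a small sketch. This is not the actual Optimum code, just an illustration of the flatten-then-suffix-match step; `filter_layers` is a hypothetical helper name:

```python
def filter_layers(layers_to_be_replaced, modules_in_block_to_quantize):
    # modules_in_block_to_quantize is a list of lists (groups of module
    # names quantized together); sum(..., []) flattens it into one list
    # of module-name suffixes to keep.
    layers_to_keep = sum(modules_in_block_to_quantize, [])
    return {
        name: layer
        for name, layer in layers_to_be_replaced.items()
        if any(name.endswith(suffix) for suffix in layers_to_keep)
    }
```

Because matching is done with `str.endswith`, a suffix such as `"self_attn.q_proj"` selects that projection in every block found under the block prefix.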
get_block_name_with_pattern
File: optimum/gptq/utils.py Lines: 62-77
```python
def get_block_name_with_pattern(model: nn.Module) -> str:
```
Behavior:
- Collects all module names via `model.named_modules()`.
- Iterates through the `BLOCK_PATTERNS` list (defined in `optimum/gptq/constants.py`).
- Returns the first pattern where any module name starts with that pattern.
- Raises `ValueError` if no pattern matches.
Known patterns: `"transformer.h"`, `"model.decoder.layers"`, `"gpt_neox.layers"`, `"model.layers"`, `"model.language_model.layers"`, `"h"`, `"decoder.layers"`, `"layers"`.
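The first-match search over `BLOCK_PATTERNS` can be sketched without any torch dependency. This is a simplified illustration that takes plain module-name strings instead of a model, and `find_block_name` is a hypothetical stand-in for the real function:

```python
# Simplified sketch: the patterns below mirror those listed above.
BLOCK_PATTERNS = [
    "transformer.h",
    "model.decoder.layers",
    "gpt_neox.layers",
    "model.layers",
    "model.language_model.layers",
    "h",
    "decoder.layers",
    "layers",
]

def find_block_name(module_names):
    # Return the first pattern that prefixes any module name,
    # mirroring the "first pattern wins" behavior described above.
    for pattern in BLOCK_PATTERNS:
        if any(name.startswith(pattern) for name in module_names):
            return pattern
    raise ValueError("Block pattern could not be matched.")
```

Note that pattern order matters: the short fallback patterns (`"h"`, `"layers"`) come last so that more specific prefixes like `"transformer.h"` or `"model.layers"` win first.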
get_layers
File: optimum/gptq/utils.py Lines: 33-59
```python
def get_layers(
    module: nn.Module,
    layers=[Conv1D, nn.Conv2d, nn.Linear],
    prefix: Optional[str] = None,
    name: str = "",
) -> Dict[str, Union[Conv1D, nn.Conv2d, nn.Linear]]:
```
Behavior:
- Recursively traverses the module tree.
- Collects all modules that are instances of the specified layer types.
- If `prefix` is provided, only includes layers whose fully qualified name starts with that prefix.
- Returns a dictionary mapping layer names to layer objects.
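The recursive traversal can be sketched with stand-in classes in place of torch modules. This is an illustrative assumption about the structure, not the real Optimum implementation; `Linear` and `Block` here are toy types that only expose `named_children()`:

```python
class Linear:
    """Toy stand-in for nn.Linear."""

class Block:
    """Toy container exposing named_children() like nn.Module."""
    def __init__(self, **children):
        self._children = children

    def named_children(self):
        return self._children.items()

def get_layers(module, layer_types=(Linear,), prefix=None, name=""):
    # If the module itself is a target layer, record it under its
    # fully qualified name (subject to the optional prefix filter).
    if isinstance(module, layer_types):
        if prefix is None or name.startswith(prefix):
            return {name: module}
        return {}
    # Otherwise recurse into children, extending the dotted name.
    result = {}
    for child_name, child in module.named_children():
        full_name = f"{name}.{child_name}" if name else child_name
        result.update(get_layers(child, layer_types, prefix, full_name))
    return result
```

Passing `prefix="model.layers"` (the detected block name) is how `convert_model` restricts replacement to linear layers inside the transformer blocks, leaving modules such as embeddings untouched.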
Import
```python
from optimum.gptq import GPTQQuantizer
from optimum.gptq.utils import get_block_name_with_pattern, get_layers
```