Implementation:Huggingface Transformers Quantization Backend Selection Pattern
| Knowledge Sources | |
|---|---|
| Domains | Model_Optimization, Quantization, Backend_Selection |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
A concrete pattern, provided by Hugging Face Transformers, for selecting a quantization backend and dispatching to it.
Description
This pattern comprises the QuantizationMethod enum, the AUTO_QUANTIZER_MAPPING registry, and the AutoHfQuantizer dispatcher class. Together they implement the strategy pattern: the user creates a configuration object (a subclass of QuantizationConfigMixin), and the AutoHfQuantizer.from_config class method resolves it to the appropriate HfQuantizer subclass. The companion AUTO_QUANTIZATION_CONFIG_MAPPING dictionary maps the same string method names to config classes, which enables deserialization of saved quantization configurations.
The registry currently supports more than 18 quantization backends. New backends can be added with the register_quantizer() and register_quantization_config() decorators, without modifying existing code.
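The registration idea can be illustrated with a small, library-independent sketch. All names here (`TOY_QUANTIZER_MAPPING`, `register_toy_quantizer`, `ToyInt8Quantizer`, `dispatch`) are hypothetical and are not Transformers APIs; the sketch only shows how a decorator can add a backend to a registry while the dispatcher stays unchanged.

```python
from typing import Callable, Dict, Type


class ToyQuantizer:
    """Hypothetical base class standing in for a quantizer backend."""

    def __init__(self, config: dict):
        self.config = config


# Hypothetical registry: method name -> quantizer class.
TOY_QUANTIZER_MAPPING: Dict[str, Type[ToyQuantizer]] = {}


def register_toy_quantizer(name: str) -> Callable[[type], type]:
    """Return a class decorator that records the class under `name`."""

    def decorator(cls: type) -> type:
        if name in TOY_QUANTIZER_MAPPING:
            raise ValueError(f"Quantizer '{name}' is already registered")
        if not issubclass(cls, ToyQuantizer):
            raise TypeError("Registered class must subclass ToyQuantizer")
        TOY_QUANTIZER_MAPPING[name] = cls
        return cls

    return decorator


@register_toy_quantizer("toy_int8")
class ToyInt8Quantizer(ToyQuantizer):
    """A backend added purely by registration; dispatch() is untouched."""


def dispatch(config: dict) -> ToyQuantizer:
    """Look up the backend named by config['quant_method'] and instantiate it."""
    method = config.get("quant_method")
    if method not in TOY_QUANTIZER_MAPPING:
        raise ValueError(f"Unknown quantization method: {method!r}")
    return TOY_QUANTIZER_MAPPING[method](config)


quantizer = dispatch({"quant_method": "toy_int8", "bits": 8})
print(type(quantizer).__name__)  # ToyInt8Quantizer
```

Rejecting duplicate names at registration time is a design choice of this sketch; it surfaces naming collisions early instead of silently replacing a backend.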
Usage
Use this pattern when you need to:
- Instantiate a quantizer from a configuration object or dictionary.
- Load a pre-quantized model and resolve the correct quantizer backend automatically.
- Extend the framework with a custom quantization backend.
Code Reference
Source Location
- Repository: transformers
- File: src/transformers/utils/quantization_config.py (QuantizationMethod enum, config classes)
- File: src/transformers/quantizers/auto.py (AutoHfQuantizer, AUTO_QUANTIZER_MAPPING)
Signature
# QuantizationMethod enum (quantization_config.py:L45-64)
class QuantizationMethod(str, Enum):
    BITS_AND_BYTES = "bitsandbytes"
    GPTQ = "gptq"
    AWQ = "awq"
    AQLM = "aqlm"
    VPTQ = "vptq"
    QUANTO = "quanto"
    EETQ = "eetq"
    HIGGS = "higgs"
    HQQ = "hqq"
    COMPRESSED_TENSORS = "compressed-tensors"
    FBGEMM_FP8 = "fbgemm_fp8"
    TORCHAO = "torchao"
    BITNET = "bitnet"
    SPQR = "spqr"
    FP8 = "fp8"
    QUARK = "quark"
    FPQUANT = "fp_quant"
    AUTOROUND = "auto-round"
    MXFP4 = "mxfp4"

# AutoHfQuantizer dispatcher (auto.py:L155-184)
class AutoHfQuantizer:
    @classmethod
    def from_config(
        cls,
        quantization_config: QuantizationConfigMixin | dict,
        **kwargs
    ) -> HfQuantizer: ...

    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: str,
        **kwargs
    ) -> HfQuantizer: ...
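Because QuantizationMethod subclasses both str and Enum, its members compare equal to their string values and can be recovered from a serialized string. The sketch below demonstrates this with a three-member stand-in enum (`ToyMethod`, hypothetical) instead of importing the library. The `registry_key` helper is also hypothetical: it assumes that the single "bitsandbytes" method fans out to separate 4-bit and 8-bit registry entries, as the Bnb4BitHfQuantizer class name suggests.

```python
from enum import Enum


class ToyMethod(str, Enum):
    """Stand-in for a str-based method enum (subset, for illustration only)."""

    BITS_AND_BYTES = "bitsandbytes"
    GPTQ = "gptq"
    AWQ = "awq"


def registry_key(method: ToyMethod, load_in_8bit: bool = False) -> str:
    """Hypothetical helper: derive a registry key from a method.

    Assumption: bitsandbytes is split into 4-bit and 8-bit entries,
    while every other method maps to its own value unchanged.
    """
    if method == ToyMethod.BITS_AND_BYTES:
        return method.value + ("_8bit" if load_in_8bit else "_4bit")
    return method.value


# Lookup by value returns the canonical member.
assert ToyMethod("gptq") is ToyMethod.GPTQ
# The str mixin makes members compare equal to plain strings.
assert ToyMethod.AWQ == "awq"

print(registry_key(ToyMethod.BITS_AND_BYTES))        # bitsandbytes_4bit
print(registry_key(ToyMethod.BITS_AND_BYTES, True))  # bitsandbytes_8bit
print(registry_key(ToyMethod.GPTQ))                  # gptq
```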
Import
from transformers.utils.quantization_config import QuantizationMethod
from transformers.quantizers.auto import AutoHfQuantizer, AutoQuantizationConfig
from transformers import BitsAndBytesConfig, GPTQConfig, AwqConfig
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| quantization_config | QuantizationConfigMixin or dict | Yes | A quantization configuration object or dictionary specifying the method and parameters. |
| pre_quantized | bool | No | Whether the model is already quantized. Passed to the quantizer constructor. |
| **kwargs | dict | No | Additional keyword arguments forwarded to the quantizer constructor. |
Outputs
| Name | Type | Description |
|---|---|---|
| quantizer | HfQuantizer | An instantiated quantizer object for the selected backend (e.g., Bnb4BitHfQuantizer, GptqHfQuantizer). |
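The contract above can be sketched without the library: accept either a config object or a plain dictionary, normalize to an object, and forward the remaining keyword arguments to the quantizer. The names `SketchConfig`, `SketchQuantizer`, and `sketch_from_config` are hypothetical, and the default of `True` for `pre_quantized` is an assumption made for this illustration only.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class SketchConfig:
    """Hypothetical config object; `quant_method` names the backend."""

    quant_method: str
    bits: int = 4


@dataclass
class SketchQuantizer:
    """Hypothetical quantizer that records what it was constructed with."""

    config: SketchConfig
    pre_quantized: bool = True  # assumed default, for illustration
    extra: Dict[str, Any] = field(default_factory=dict)


def sketch_from_config(
    config: Union[SketchConfig, dict], **kwargs: Any
) -> SketchQuantizer:
    """Normalize dict input to a config object, then build the quantizer."""
    if isinstance(config, dict):
        config = SketchConfig(**config)
    pre_quantized = kwargs.pop("pre_quantized", True)
    # Whatever is left in kwargs is forwarded untouched.
    return SketchQuantizer(config=config, pre_quantized=pre_quantized, extra=kwargs)


q1 = sketch_from_config({"quant_method": "gptq", "bits": 8}, pre_quantized=False)
q2 = sketch_from_config(SketchConfig(quant_method="gptq"), device="cpu")
print(q1.config.bits, q1.pre_quantized)  # 8 False
print(q2.config.bits, q2.pre_quantized, q2.extra)  # 4 True {'device': 'cpu'}
```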
Usage Examples
Basic Usage: Selecting BitsAndBytes 4-bit
from transformers import BitsAndBytesConfig
from transformers.quantizers.auto import AutoHfQuantizer
# Create a BitsAndBytes 4-bit configuration
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
# The dispatcher resolves it to Bnb4BitHfQuantizer
quantizer = AutoHfQuantizer.from_config(bnb_config)
print(type(quantizer)) # <class 'transformers.quantizers.quantizer_bnb_4bit.Bnb4BitHfQuantizer'>
Selecting GPTQ Backend
from transformers import GPTQConfig
from transformers.quantizers.auto import AutoHfQuantizer
# Note: constructing the GPTQ quantizer requires the optimum package to be installed
gptq_config = GPTQConfig(bits=4, dataset="c4")
quantizer = AutoHfQuantizer.from_config(gptq_config)
print(type(quantizer)) # <class 'transformers.quantizers.quantizer_gptq.GptqHfQuantizer'>
Loading Pre-quantized Model Config
from transformers.quantizers.auto import AutoQuantizationConfig
# Resolve quantization config from a pre-quantized model on the Hub
quant_config = AutoQuantizationConfig.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
print(quant_config.quant_method) # the GPTQ method; a str-based enum member that compares equal to "gptq"
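To make the lookup above more concrete, the following sketch shows where such a configuration typically lives: a `quantization_config` entry inside the checkpoint's config.json. The JSON excerpt and the `extract_quant_method` helper are hypothetical and use made-up values; the sketch parses a local string rather than contacting the Hub.

```python
import json

# Hypothetical excerpt of a checkpoint's config.json (values are made up).
CONFIG_JSON = """
{
  "model_type": "llama",
  "quantization_config": {"quant_method": "gptq", "bits": 4, "group_size": 128}
}
"""


def extract_quant_method(config_text: str) -> str:
    """Return the quant_method recorded in a model config, or raise if absent."""
    config = json.loads(config_text)
    quant_config = config.get("quantization_config")
    if quant_config is None:
        raise ValueError("No quantization_config found; model is not pre-quantized")
    return quant_config["quant_method"]


print(extract_quant_method(CONFIG_JSON))  # gptq
```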