Implementation:Huggingface Transformers Quantization Backend Selection Pattern

From Leeroopedia
Knowledge Sources
Domains: Model_Optimization, Quantization, Backend_Selection
Last Updated: 2026-02-13 00:00 GMT

Overview

A concrete pattern, provided by Hugging Face Transformers, for selecting a quantization backend and dispatching to it.

Description

This pattern consists of the QuantizationMethod enum, the AUTO_QUANTIZER_MAPPING registry, and the AutoHfQuantizer dispatcher class. Together they implement the strategy pattern: the user creates a configuration object (a subclass of QuantizationConfigMixin), and the AutoHfQuantizer.from_config class method resolves it to the appropriate HfQuantizer subclass. The AUTO_QUANTIZATION_CONFIG_MAPPING dictionary is the companion registry: it maps the same string method names to config classes, which allows saved quantization configurations to be deserialized.
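The dispatch flow described above can be sketched without the library. This is a minimal, self-contained illustration of the idea only: every name in it (ToyMethod, ToyConfig, ToyAutoQuantizer, QUANTIZER_MAPPING, CONFIG_MAPPING) is hypothetical, and the real Transformers classes may differ in detail.

```python
from dataclasses import dataclass
from enum import Enum


class ToyMethod(str, Enum):
    """Hypothetical stand-in for QuantizationMethod."""
    INT8 = "toy_int8"
    INT4 = "toy_int4"


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a QuantizationConfigMixin subclass."""
    quant_method: ToyMethod
    bits: int


class ToyQuantizer:
    """Hypothetical stand-in for HfQuantizer."""
    def __init__(self, config, pre_quantized=False):
        self.config = config
        self.pre_quantized = pre_quantized


class Int8Quantizer(ToyQuantizer):
    pass


class Int4Quantizer(ToyQuantizer):
    pass


# Two registries keyed by the same method-name strings.
QUANTIZER_MAPPING = {"toy_int8": Int8Quantizer, "toy_int4": Int4Quantizer}
CONFIG_MAPPING = {"toy_int8": ToyConfig, "toy_int4": ToyConfig}


class ToyAutoQuantizer:
    @classmethod
    def from_config(cls, config, **kwargs):
        # A plain dict is first deserialized into a config object.
        if isinstance(config, dict):
            method = config["quant_method"]
            if method not in CONFIG_MAPPING:
                raise ValueError(f"Unknown quantization method: {method!r}")
            params = {k: v for k, v in config.items() if k != "quant_method"}
            config = CONFIG_MAPPING[method](quant_method=ToyMethod(method), **params)

        # The config's method name selects the quantizer class.
        method = config.quant_method.value
        if method not in QUANTIZER_MAPPING:
            raise ValueError(f"No quantizer registered for: {method!r}")
        return QUANTIZER_MAPPING[method](config, **kwargs)


# Dispatch from a dict and from a config object.
from_dict = ToyAutoQuantizer.from_config(
    {"quant_method": "toy_int4", "bits": 4}, pre_quantized=True
)
from_obj = ToyAutoQuantizer.from_config(ToyConfig(ToyMethod.INT8, bits=8))
```

The caller never names a quantizer class; the configuration alone determines which strategy is instantiated, and extra keyword arguments are forwarded to its constructor.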

The QuantizationMethod enum listed under Signature covers 19 quantization methods. New backends can be added with the register_quantizer() and register_quantization_config() decorators, without modifying existing code.
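Decorator-based registration of this kind can be sketched as follows. This is an illustration of the general technique, not the library's implementation: REGISTRY, register_backend, and MyInt3Quantizer are hypothetical names, and rejecting duplicate names is a design choice made for this sketch.

```python
# Hypothetical registry mapping a method name to a backend class.
REGISTRY = {}


def register_backend(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in REGISTRY:
            # Refuse to silently overwrite an existing backend.
            raise ValueError(f"Backend {name!r} is already registered")
        REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_backend("my_int3")
class MyInt3Quantizer:
    """Hypothetical custom backend added without touching the registry code."""
    def __init__(self, config):
        self.config = config


# The dispatcher only needs the name to find the class.
backend_cls = REGISTRY["my_int3"]
quantizer = backend_cls(config={"bits": 3})
```

Because registration happens as a side effect of defining the class, a third-party package can add a backend simply by being imported.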

Usage

Use this pattern when you need to:

  • Instantiate a quantizer from a configuration object or dictionary.
  • Load a pre-quantized model and resolve the correct quantizer backend automatically.
  • Extend the framework with a custom quantization backend.

Code Reference

Source Location

  • Repository: transformers
  • File: src/transformers/utils/quantization_config.py (QuantizationMethod enum, config classes)
  • File: src/transformers/quantizers/auto.py (AutoHfQuantizer, AUTO_QUANTIZER_MAPPING)

Signature

# QuantizationMethod enum (quantization_config.py:L45-64)
class QuantizationMethod(str, Enum):
    BITS_AND_BYTES = "bitsandbytes"
    GPTQ = "gptq"
    AWQ = "awq"
    AQLM = "aqlm"
    VPTQ = "vptq"
    QUANTO = "quanto"
    EETQ = "eetq"
    HIGGS = "higgs"
    HQQ = "hqq"
    COMPRESSED_TENSORS = "compressed-tensors"
    FBGEMM_FP8 = "fbgemm_fp8"
    TORCHAO = "torchao"
    BITNET = "bitnet"
    SPQR = "spqr"
    FP8 = "fp8"
    QUARK = "quark"
    FPQUANT = "fp_quant"
    AUTOROUND = "auto-round"
    MXFP4 = "mxfp4"

# AutoHfQuantizer dispatcher (auto.py:L155-184)
class AutoHfQuantizer:
    @classmethod
    def from_config(
        cls,
        quantization_config: QuantizationConfigMixin | dict,
        **kwargs
    ) -> HfQuantizer: ...

    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: str,
        **kwargs
    ) -> HfQuantizer: ...
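Because QuantizationMethod derives from both str and Enum, its members can be compared directly with the plain strings stored in saved configurations. The following standalone sketch uses a hypothetical two-member enum, Method, to show that behavior with the standard library only.

```python
from enum import Enum


class Method(str, Enum):
    """Hypothetical two-member enum built the same way as QuantizationMethod."""
    GPTQ = "gptq"
    AWQ = "awq"


# Members are real str instances and compare equal to their values.
is_string = isinstance(Method.GPTQ, str)
equals_value = Method.GPTQ == "gptq"

# Calling the enum with a value returns the matching member.
looked_up = Method("awq")

# An unknown value raises ValueError, which is useful for validation.
try:
    Method("not-a-method")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True
```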

Import

from transformers.utils.quantization_config import QuantizationMethod
from transformers.quantizers.auto import AutoHfQuantizer, AutoQuantizationConfig
from transformers import BitsAndBytesConfig, GPTQConfig, AwqConfig

I/O Contract

Inputs

Name | Type | Required | Description
quantization_config | QuantizationConfigMixin or dict | Yes | A quantization configuration object or dictionary specifying the method and parameters.
pre_quantized | bool | No | Whether the model is already quantized. Passed to the quantizer constructor.
**kwargs | dict | No | Additional keyword arguments forwarded to the quantizer constructor.

Outputs

Name | Type | Description
quantizer | HfQuantizer | An instantiated quantizer object for the selected backend (e.g., Bnb4BitHfQuantizer, GptqHfQuantizer).

Usage Examples

Basic Usage: Selecting BitsAndBytes 4-bit

from transformers import BitsAndBytesConfig
from transformers.quantizers.auto import AutoHfQuantizer

# Create a BitsAndBytes 4-bit configuration
bnb_config = BitsAndBytesConfig(load_in_4bit=True)

# The dispatcher resolves it to Bnb4BitHfQuantizer
quantizer = AutoHfQuantizer.from_config(bnb_config)
print(type(quantizer))  # <class 'transformers.quantizers.quantizer_bnb_4bit.Bnb4BitHfQuantizer'>

Selecting GPTQ Backend

from transformers import GPTQConfig
from transformers.quantizers.auto import AutoHfQuantizer

gptq_config = GPTQConfig(bits=4, dataset="c4")
quantizer = AutoHfQuantizer.from_config(gptq_config)
print(type(quantizer))  # <class 'transformers.quantizers.quantizer_gptq.GptqHfQuantizer'>

Loading Pre-quantized Model Config

from transformers.quantizers.auto import AutoQuantizationConfig

# Resolve quantization config from a pre-quantized model on the Hub
quant_config = AutoQuantizationConfig.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
print(quant_config.quant_method)  # QuantizationMethod.GPTQ (a str enum member; compares equal to "gptq")
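Pre-quantized checkpoints record their quantization settings under the quantization_config key of the model's config.json, and the quant_method field there is what selects the backend. The sketch below reads that field from a local directory using only the standard library. The helper read_quant_method and the sample file contents are hypothetical, written for illustration.

```python
import json
import tempfile
from pathlib import Path


def read_quant_method(model_dir):
    """Return the quant_method string from config.json, or None if absent."""
    config = json.loads((Path(model_dir) / "config.json").read_text())
    quant_cfg = config.get("quantization_config")
    if quant_cfg is None:
        return None  # the checkpoint is not pre-quantized
    return quant_cfg.get("quant_method")


# Write a made-up config.json to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "model_type": "llama",
        "quantization_config": {"quant_method": "gptq", "bits": 4},
    }
    Path(tmp, "config.json").write_text(json.dumps(sample))
    method = read_quant_method(tmp)

print(method)
```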

Related Pages

Implements Principle
