Implementation:Huggingface Diffusers PipelineQuantizationConfig

Metadata

Property	Value
API	`PipelineQuantizationConfig(quant_backend=None, quant_kwargs=None, components_to_quantize=None, quant_mapping=None)`
Module	`src/diffusers/quantizers/pipe_quant_config.py`
Lines	L34-L207
Import	`from diffusers.quantizers import PipelineQuantizationConfig`
Type	API Doc
Principle	Huggingface_Diffusers_Pipeline_Level_Quantization
Implements	Principle:Huggingface_Diffusers_Pipeline_Level_Quantization

Purpose

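PipelineQuantizationConfig is the configuration class for applying quantization at the pipeline level during DiffusionPipeline.from_pretrained. It supports two mutually exclusive modes: a global mode, in which one backend (quant_backend plus quant_kwargs) applies either to all components or to those listed in components_to_quantize, and a granular mode, in which quant_mapping assigns a config object to each named component. At load time it resolves the quantization config for each component, or None for components that are left unquantized.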

I/O Contract

Constructor Parameters

Parameter	Type	Default	Description
`quant_backend`	`str \| None`	`None`	Quantization backend name (e.g., `"bitsandbytes_4bit"`, `"torchao"`). Used for global mode.
`quant_kwargs`	`dict \| None`	`None`	Keyword arguments used to initialize the backend's config class. `None` is stored as `{}`.
`components_to_quantize`	`list[str] \| str \| None`	`None`	Component names to quantize in global mode. A single string is wrapped in a list. If `None`, all nn.Module components are quantized.
`quant_mapping`	`dict \| None`	`None`	Per-component mapping from component name to a diffusers or transformers quantization config object. Used for granular mode.

Validation Rules

Rule	Error
Both `quant_backend` and `quant_mapping` provided	`ValueError`
Neither `quant_backend` nor `quant_mapping` provided	`ValueError`
Neither `quant_kwargs` nor `quant_mapping` provided	`ValueError`
`quant_backend` not found in diffusers or transformers backends	`ValueError`
Config classes in `quant_mapping` not found in any backend	`ValueError`
Init signatures mismatch between diffusers and transformers config classes (global mode)	`ValueError`
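
The first four rules can be approximated with a small standalone function. This is a simplified sketch rather than the library's code: `check_init_args`, `raises_value_error`, and the `KNOWN_BACKENDS` set are hypothetical names introduced here, and the backend lookup is reduced to a membership test (the signature comparison and the `quant_mapping` class check are omitted).

```python
# Hypothetical stand-in for the backend registries; in the library the names
# come from the diffusers and transformers AUTO_QUANTIZATION_CONFIG_MAPPING tables.
KNOWN_BACKENDS = {"bitsandbytes_4bit", "torchao"}


def check_init_args(quant_backend=None, quant_kwargs=None, quant_mapping=None):
    """Return the selected mode ("global" or "granular") or raise ValueError."""
    if quant_backend and quant_mapping:
        raise ValueError("Pass either quant_backend or quant_mapping, not both.")
    if not quant_backend and not quant_mapping:
        raise ValueError("Pass one of quant_backend or quant_mapping.")
    if not quant_kwargs and not quant_mapping:
        raise ValueError("quant_backend requires non-empty quant_kwargs.")
    if quant_backend and quant_backend not in KNOWN_BACKENDS:
        raise ValueError(f"Unknown backend: {quant_backend!r}")
    return "granular" if quant_mapping else "global"


def raises_value_error(**kwargs):
    """Demonstration helper: True if the argument combination is rejected."""
    try:
        check_init_args(**kwargs)
    except ValueError:
        return True
    return False
```

Under these assumptions, passing only `quant_backend` without `quant_kwargs` is rejected, as is passing `quant_backend` together with `quant_mapping`.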

Constructor and Initialization

class PipelineQuantizationConfig:
    def __init__(
        self,
        quant_backend: str = None,
        quant_kwargs: dict[str, str | float | int | dict] = None,
        components_to_quantize: list[str] | str | None = None,
        quant_mapping: dict[str, DiffQuantConfigMixin | TransformersQuantConfigMixin] = None,
    ):
        self.quant_backend = quant_backend
        self.quant_kwargs = quant_kwargs or {}
        if components_to_quantize:
            if isinstance(components_to_quantize, str):
                components_to_quantize = [components_to_quantize]
        self.components_to_quantize = components_to_quantize
        self.quant_mapping = quant_mapping
        self.config_mapping = {}  # book-keeping: {module_name: quant_config}
        self.post_init()

    def post_init(self):
        self.is_granular = True if self.quant_mapping is not None else False
        self._validate_init_args()

Key Methods

_resolve_quant_config

Called by load_sub_model() for each pipeline component that is a torch.nn.Module subclass. Returns the quantization config to use for that component, or None if the component should not be quantized.

def _resolve_quant_config(self, is_diffusers: bool = True, module_name: str = None):
    quant_config_mapping_transformers, quant_config_mapping_diffusers = self._get_quant_config_list()

    # Granular case: look up component in quant_mapping
    if self.is_granular and module_name in self.quant_mapping:
        config = self.quant_mapping[module_name]
        self.config_mapping.update({module_name: config})
        return config

    # Global config case
    else:
        should_quantize = False
        if self.components_to_quantize and module_name in self.components_to_quantize:
            should_quantize = True
        elif not self.is_granular and not self.components_to_quantize:
            should_quantize = True  # quantize all components

        if should_quantize:
            mapping_to_use = quant_config_mapping_diffusers if is_diffusers else quant_config_mapping_transformers
            quant_config_cls = mapping_to_use[self.quant_backend]
            quant_obj = quant_config_cls(**self.quant_kwargs)
            self.config_mapping.update({module_name: quant_obj})
            return quant_obj

    return None  # no quantization for this component

Control flow:

Granular mode: If quant_mapping contains the component name, return its config directly.
Global mode with filter: If components_to_quantize is set and the component is in the list, create a new config instance from quant_backend + quant_kwargs.
Global mode without filter: If no filter is set, quantize all components.
Fallback: Return None -- component is loaded without quantization.
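
The branching above can be isolated from config construction to make the decision order easier to test. The following is a minimal sketch under that assumption; `resolution_mode` is a hypothetical helper that only classifies a component and does not build any config object.

```python
def resolution_mode(module_name, quant_mapping=None, components_to_quantize=None):
    """Classify how a component would be handled: "granular", "global", or None."""
    is_granular = quant_mapping is not None

    # Granular mode: the component has an explicit entry in quant_mapping.
    if is_granular and module_name in quant_mapping:
        return "granular"

    # Global mode with filter: the component is listed explicitly.
    if components_to_quantize and module_name in components_to_quantize:
        return "global"

    # Global mode without filter: every component is quantized.
    if not is_granular and not components_to_quantize:
        return "global"

    # Fallback: the component is loaded without quantization.
    return None
```

For example, with `quant_mapping={"transformer": ...}` a `"vae"` component falls through to the fallback, while with no mapping and no filter every component is classified as `"global"`.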

_get_quant_config_list

Returns the backend config mappings from both diffusers and transformers:

def _get_quant_config_list(self):
    if is_transformers_available():
        from transformers.quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_transformers
    else:
        quant_config_mapping_transformers = None

    from ..quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_diffusers

    return quant_config_mapping_transformers, quant_config_mapping_diffusers
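
The method relies on a guarded import so that a missing optional dependency yields None instead of an ImportError. A generic version of that pattern might look like the sketch below. It uses the standard library's `importlib.util.find_spec` as a stand-in for `is_transformers_available()`, and `load_optional_mapping` is a hypothetical helper name.

```python
import importlib
import importlib.util


def load_optional_mapping(module_path, attr_name):
    """Return getattr(module, attr_name), or None if the top-level package is missing."""
    top_level = module_path.split(".")[0]
    # find_spec returns None when the top-level package cannot be located.
    if importlib.util.find_spec(top_level) is None:
        return None
    module = importlib.import_module(module_path)
    return getattr(module, attr_name, None)
```

Callers then need to handle the None case before indexing into the returned mapping.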

Integration Point: load_sub_model

In pipeline_loading_utils.py:L869-L878, the pipeline quantization config is resolved per-component:

# Inside load_sub_model():
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config

The resolved config is passed as quantization_config to the component's from_pretrained method, which then follows the standard model-level quantization loading flow.
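
The effect of this check across a whole pipeline can be illustrated with stand-in classes. This is a sketch only: `FakeModule`, `FakeScheduler`, `FakeTransformer`, `FakeVAE`, and `build_loading_kwargs` are hypothetical names, `FakeModule` plays the role of torch.nn.Module, and the `resolve` callable plays the role of `_resolve_quant_config`.

```python
class FakeModule:
    """Hypothetical stand-in for torch.nn.Module."""


class FakeScheduler:
    """Hypothetical stand-in for a non-module component such as a scheduler."""


class FakeTransformer(FakeModule):
    pass


class FakeVAE(FakeModule):
    pass


def build_loading_kwargs(components, resolve):
    """Map component name -> loading kwargs, adding quantization_config when resolved."""
    result = {}
    for name, class_obj in components.items():
        kwargs = {}
        # Only module subclasses are considered for quantization.
        if issubclass(class_obj, FakeModule):
            config = resolve(name)
            if config is not None:
                kwargs["quantization_config"] = config
        result[name] = kwargs
    return result


components = {"transformer": FakeTransformer, "vae": FakeVAE, "scheduler": FakeScheduler}
kwargs_by_name = build_loading_kwargs(
    components,
    resolve=lambda name: {"bits": 4} if name == "transformer" else None,
)
```

Here only the `"transformer"` entry receives a `quantization_config` key; the `"vae"` resolves to None and the scheduler is never passed to `resolve`.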

Usage Examples

Global Mode: Same Backend for All Components

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Global Mode: Selective Components

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="torchao",
    quant_kwargs={"quant_type": "int8wo"},
    components_to_quantize=["transformer", "text_encoder"],  # skip VAE
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Granular Mode: Per-Component Configs

import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline, TorchAoConfig
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_mapping={
        "transformer": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
        # text_encoder is loaded through transformers, so the transformers
        # TorchAoConfig may be the appropriate class to pass here instead.
        "text_encoder": TorchAoConfig("int8wo"),
        # text_encoder_2 and vae will not be quantized
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Implementation Notes

Signature validation in global mode: When using quant_backend, the system compares the __init__ signatures of the diffusers and transformers config classes for that backend. If they differ, it raises an error directing the user to use quant_mapping instead.
Config objects vs. new instances: In granular mode, the user provides pre-constructed config objects. In global mode, new config instances are created on each _resolve_quant_config call using quant_config_cls(**quant_kwargs).
Only nn.Module components: The issubclass(class_obj, torch.nn.Module) check in load_sub_model ensures that only actual model components (not schedulers, tokenizers, etc.) are considered for quantization.
config_mapping for inspection: After pipeline loading, pipeline_quant_config.config_mapping contains the resolved config for each quantized component, enabling post-hoc inspection.
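
The signature comparison described in the first note can be sketched with the standard library's `inspect` module. This page does not show the exact criteria the library applies, so the version below is an assumption that compares `__init__` parameter names only; `init_params`, `signatures_match`, and the `ConfigA`/`ConfigB`/`ConfigC` classes are hypothetical.

```python
import inspect


def init_params(cls):
    """Return the names of __init__ parameters, excluding self."""
    params = inspect.signature(cls.__init__).parameters
    return [name for name in params if name != "self"]


def signatures_match(cls_a, cls_b):
    """True when both classes accept the same __init__ parameter names in the same order."""
    return init_params(cls_a) == init_params(cls_b)


class ConfigA:
    def __init__(self, load_in_4bit=False, quant_type="fp4"):
        self.load_in_4bit = load_in_4bit
        self.quant_type = quant_type


class ConfigB:
    def __init__(self, load_in_4bit=False, quant_type="fp4"):
        self.load_in_4bit = load_in_4bit
        self.quant_type = quant_type


class ConfigC:
    def __init__(self, load_in_4bit=False, compute_dtype=None):
        self.load_in_4bit = load_in_4bit
        self.compute_dtype = compute_dtype
```

A mismatch such as `ConfigA` versus `ConfigC` corresponds to the case where a single `quant_kwargs` dict cannot initialize both classes, which is why granular mode is the suggested alternative.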

Related Pages

Huggingface_Diffusers_Pipeline_Level_Quantization - Principle of heterogeneous per-component quantization
Huggingface_Diffusers_Quantization_Config_Classes - The config objects used in quant_mapping
Huggingface_Diffusers_ModelMixin_From_Pretrained_Quantized - How individual components consume the resolved config
Huggingface_Diffusers_Quantized_Pipeline_Call - Running inference with the quantized pipeline

Requires Environment

Environment:Huggingface_Diffusers_Quantization_Environment

Source References

src/diffusers/quantizers/pipe_quant_config.py:L34-L64 - Constructor and post_init
src/diffusers/quantizers/pipe_quant_config.py:L66-L86 - Validation logic
src/diffusers/quantizers/pipe_quant_config.py:L154-L187 - _resolve_quant_config method
src/diffusers/quantizers/pipe_quant_config.py:L189-L199 - _get_quant_config_list method
src/diffusers/pipelines/pipeline_loading_utils.py:L869-L878 - Integration in load_sub_model
