Implementation:Huggingface Diffusers PipelineQuantizationConfig

From Leeroopedia

Metadata

Property Value
API PipelineQuantizationConfig(quant_backend=None, quant_kwargs=None, components_to_quantize=None, quant_mapping=None)
Module src/diffusers/quantizers/pipe_quant_config.py
Lines L34-L207
Import from diffusers.quantizers import PipelineQuantizationConfig
Type API Doc
Principle Huggingface_Diffusers_Pipeline_Level_Quantization
Implements Principle:Huggingface_Diffusers_Pipeline_Level_Quantization

Purpose

PipelineQuantizationConfig configures quantization at the pipeline level when calling DiffusionPipeline.from_pretrained. It supports two mutually exclusive modes: a global mode, which applies one backend to all components or to a selected subset, and a granular mode, which maps each component name to its own config object. At loading time it resolves the quantization config (or None) for each component.

I/O Contract

Constructor Parameters

Parameter Type Default Description
quant_backend str | None None Quantization backend name (e.g., "bitsandbytes_4bit", "torchao"). Selects global mode.
quant_kwargs dict | None None Keyword arguments passed to the backend's config class in global mode. Stored as {} when omitted.
components_to_quantize list[str] | str | None None Component names to quantize in global mode. A single string is wrapped in a list. If None, every nn.Module component is quantized.
quant_mapping dict[str, DiffQuantConfigMixin | TransformersQuantConfigMixin] | None None Mapping from component name to a pre-built diffusers or transformers config object. Selects granular mode.

Validation Rules

Rule Error
Both quant_backend and quant_mapping provided ValueError
Neither quant_backend nor quant_mapping provided ValueError
Neither quant_kwargs nor quant_mapping provided ValueError
quant_backend not found in diffusers or transformers backends ValueError
Config classes in quant_mapping not found in any backend ValueError
Init signatures mismatch between diffusers and transformers config classes (global mode) ValueError
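The rules above can be condensed into a short mode-selection check. The sketch below is a hypothetical, dependency-free restatement, not the library's own validation code: the two backend sets are placeholders for the real AUTO_QUANTIZATION_CONFIG_MAPPING registries, and the signature and quant_mapping class checks are left out.

```python
# Hypothetical placeholders for the backend registries; the real ones are the
# AUTO_QUANTIZATION_CONFIG_MAPPING dicts in diffusers and transformers.
DIFFUSERS_BACKENDS = {"bitsandbytes_4bit", "bitsandbytes_8bit", "torchao"}
TRANSFORMERS_BACKENDS = {"bitsandbytes_4bit", "bitsandbytes_8bit", "torchao"}


def validate_init_args(quant_backend=None, quant_kwargs=None, quant_mapping=None):
    """Apply the mode-selection rules and return "global" or "granular"."""
    if quant_backend and quant_mapping:
        raise ValueError("`quant_backend` and `quant_mapping` are mutually exclusive.")
    if not quant_backend and not quant_mapping:
        raise ValueError("Provide `quant_backend` when `quant_mapping` is not given.")
    if not quant_kwargs and not quant_mapping:
        raise ValueError("Global mode requires non-empty `quant_kwargs`.")
    known = DIFFUSERS_BACKENDS | TRANSFORMERS_BACKENDS
    if quant_backend and quant_backend not in known:
        raise ValueError(f"Unknown quant_backend {quant_backend!r}; known: {sorted(known)}")
    return "granular" if quant_mapping else "global"


if __name__ == "__main__":
    print(validate_init_args(quant_backend="torchao", quant_kwargs={"quant_type": "int8wo"}))
    try:
        validate_init_args(quant_backend="torchao")  # missing quant_kwargs
    except ValueError as err:
        print("rejected:", err)
```

The order of the checks matters only for which message the user sees; any input that breaks one rule is rejected before a config is ever built.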

Constructor and Initialization

class PipelineQuantizationConfig:
    def __init__(
        self,
        quant_backend: str = None,
        quant_kwargs: dict[str, str | float | int | dict] = None,
        components_to_quantize: list[str] | str | None = None,
        quant_mapping: dict[str, DiffQuantConfigMixin | TransformersQuantConfigMixin] = None,
    ):
        self.quant_backend = quant_backend
        self.quant_kwargs = quant_kwargs or {}
        if components_to_quantize:
            if isinstance(components_to_quantize, str):
                components_to_quantize = [components_to_quantize]
        self.components_to_quantize = components_to_quantize
        self.quant_mapping = quant_mapping
        self.config_mapping = {}  # book-keeping: {module_name: quant_config}
        self.post_init()

    def post_init(self):
        self.is_granular = True if self.quant_mapping is not None else False
        self._validate_init_args()
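The constructor's normalization steps can be exercised without diffusers installed. SketchPipelineQuantConfig below is a hypothetical, trimmed copy of the constructor shown above, with validation omitted.

```python
class SketchPipelineQuantConfig:
    """Hypothetical, trimmed copy of the constructor shown above (validation omitted)."""

    def __init__(self, quant_backend=None, quant_kwargs=None,
                 components_to_quantize=None, quant_mapping=None):
        self.quant_backend = quant_backend
        # `or {}` turns a missing quant_kwargs into an empty dict so it can be
        # unpacked with ** later without a None check.
        self.quant_kwargs = quant_kwargs or {}
        # A single (non-empty) component name is wrapped in a list.
        if components_to_quantize and isinstance(components_to_quantize, str):
            components_to_quantize = [components_to_quantize]
        self.components_to_quantize = components_to_quantize
        self.quant_mapping = quant_mapping
        self.config_mapping = {}  # filled in as components are resolved
        # Granular mode is selected solely by the presence of quant_mapping.
        self.is_granular = quant_mapping is not None


cfg = SketchPipelineQuantConfig(
    quant_backend="torchao",
    quant_kwargs={"quant_type": "int8wo"},
    components_to_quantize="text_encoder",
)
print(cfg.components_to_quantize, cfg.is_granular)
# Membership against a string is a substring test; against a list it is exact.
print("text_encoder" in "text_encoder_2", "text_encoder" in ["text_encoder_2"])
```

One plausible reason for wrapping a lone string: with a plain string, `in` performs a substring test, so "text_encoder" in "text_encoder_2" is True and a filter of "text_encoder_2" would also match text_encoder. Against a one-element list the test is an exact name match.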

Key Methods

_resolve_quant_config

Called by load_sub_model() for each pipeline component that is a torch.nn.Module. Returns the quantization config to load that component with, or None if it should be loaded without quantization.

def _resolve_quant_config(self, is_diffusers: bool = True, module_name: str = None):
    quant_config_mapping_transformers, quant_config_mapping_diffusers = self._get_quant_config_list()

    # Granular case: look up component in quant_mapping
    if self.is_granular and module_name in self.quant_mapping:
        config = self.quant_mapping[module_name]
        self.config_mapping.update({module_name: config})
        return config

    # Global config case
    else:
        should_quantize = False
        if self.components_to_quantize and module_name in self.components_to_quantize:
            should_quantize = True
        elif not self.is_granular and not self.components_to_quantize:
            should_quantize = True  # quantize all components

        if should_quantize:
            mapping_to_use = quant_config_mapping_diffusers if is_diffusers else quant_config_mapping_transformers
            quant_config_cls = mapping_to_use[self.quant_backend]
            quant_obj = quant_config_cls(**self.quant_kwargs)
            self.config_mapping.update({module_name: quant_obj})
            return quant_obj

    return None  # no quantization for this component

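Control flow:

  1. Granular mode: if quant_mapping contains the component name, return that config object unchanged.
  2. Global mode with a filter: if components_to_quantize is set and contains the component name, build a new config instance from quant_backend and quant_kwargs, using the diffusers or transformers config class depending on is_diffusers.
  3. Global mode without a filter: if components_to_quantize is not set, quantize every component passed in (load_sub_model only passes nn.Module components).
  4. Fallback: otherwise, including granular-mode components missing from quant_mapping, return None and load the component without quantization.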
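The four branches can be traced with a small pure-Python model of _resolve_quant_config. Everything here is a hypothetical stand-in: FakeConfig replaces real config classes such as BitsAndBytesConfig, and the two dictionaries replace the diffusers and transformers registries. The component names follow the Flux pipeline used in the Usage Examples.

```python
from dataclasses import dataclass, field


@dataclass
class FakeConfig:
    """Hypothetical stand-in for a backend config class such as BitsAndBytesConfig."""
    origin: str  # which library's class built it
    kwargs: dict = field(default_factory=dict)


# Hypothetical registries: backend name -> config factory, one per library.
DIFFUSERS_MAPPING = {"torchao": lambda **kw: FakeConfig("diffusers", kw)}
TRANSFORMERS_MAPPING = {"torchao": lambda **kw: FakeConfig("transformers", kw)}


def resolve(module_name, is_diffusers, *, quant_backend=None, quant_kwargs=None,
            components_to_quantize=None, quant_mapping=None):
    """Return the config for one component, or None to load it unquantized."""
    is_granular = quant_mapping is not None

    # 1. Granular mode: hand back the user's pre-built object as-is.
    if is_granular and module_name in quant_mapping:
        return quant_mapping[module_name]

    # 2. Global mode with a filter: only listed components qualify.
    if components_to_quantize and module_name in components_to_quantize:
        should_quantize = True
    # 3. Otherwise, quantize only if global mode has no filter at all.
    else:
        should_quantize = not is_granular and not components_to_quantize

    if should_quantize:
        # Pick the class from the library the component belongs to and build
        # a fresh instance from the shared kwargs.
        mapping = DIFFUSERS_MAPPING if is_diffusers else TRANSFORMERS_MAPPING
        return mapping[quant_backend](**(quant_kwargs or {}))

    # 4. Fallback: no quantization for this component.
    return None


# Flux-style components: name -> is it a diffusers model?
flux_components = {"transformer": True, "text_encoder": False,
                   "text_encoder_2": False, "vae": True}
for name, is_diff in flux_components.items():
    cfg = resolve(name, is_diff, quant_backend="torchao",
                  quant_kwargs={"quant_type": "int8wo"},
                  components_to_quantize=["transformer", "text_encoder"])
    print(f"{name:15s} -> {cfg}")
```

With this filter, transformer receives a diffusers-side object, text_encoder a transformers-side object, and text_encoder_2 and vae receive None. Two calls for the same component return equal but distinct objects, matching the "new instance per call" behavior of global mode.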

_get_quant_config_list

Returns the backend-name-to-config-class mappings as a (transformers, diffusers) tuple. The transformers entry is None when transformers is not installed:

def _get_quant_config_list(self):
    if is_transformers_available():
        from transformers.quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_transformers
    else:
        quant_config_mapping_transformers = None

    from ..quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_diffusers

    return quant_config_mapping_transformers, quant_config_mapping_diffusers
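The same optional-dependency pattern can be written with only the standard library. In this sketch, importlib.util.find_spec stands in for diffusers' is_transformers_available() helper, and optional_registry is a hypothetical helper name.

```python
import importlib
import importlib.util


def optional_registry(module_path, attr):
    """Return getattr(import_module(module_path), attr), or None if the
    top-level package is not installed.

    Hypothetical helper: importlib.util.find_spec stands in for diffusers'
    is_transformers_available().
    """
    top_level = module_path.split(".")[0]
    if importlib.util.find_spec(top_level) is None:
        return None  # package missing: callers must handle None
    return getattr(importlib.import_module(module_path), attr)


# Demonstrated with a stdlib module and a package that should not exist.
print(optional_registry("json", "dumps"))
print(optional_registry("definitely_not_installed_pkg_12345.sub", "X"))
```

Called as optional_registry("transformers.quantizers.auto", "AUTO_QUANTIZATION_CONFIG_MAPPING"), it would mirror the first half of _get_quant_config_list; any code that consumes the result has to cope with None.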

Integration Point: load_sub_model

In pipeline_loading_utils.py:L869-L878, the pipeline quantization config is resolved per-component:

# Inside load_sub_model():
if (
    quantization_config is not None
    and isinstance(quantization_config, PipelineQuantizationConfig)
    and issubclass(class_obj, torch.nn.Module)
):
    model_quant_config = quantization_config._resolve_quant_config(
        is_diffusers=is_diffusers_model, module_name=name
    )
    if model_quant_config is not None:
        loading_kwargs["quantization_config"] = model_quant_config

The resolved config is passed as quantization_config to the component's from_pretrained method, which then follows the standard model-level quantization loading flow.
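The gating in load_sub_model can be modeled without torch to show why schedulers and tokenizers never reach _resolve_quant_config, even in global mode with no filter. In this sketch Module is a hypothetical placeholder for torch.nn.Module, the component classes are dummies, build_loading_kwargs is a hypothetical name, and the isinstance(quantization_config, PipelineQuantizationConfig) check is omitted.

```python
class Module:
    """Hypothetical placeholder for torch.nn.Module."""


class TransformerModel(Module): ...
class TextEncoderModel(Module): ...
class Scheduler: ...   # not a Module, like a diffusers scheduler
class Tokenizer: ...   # not a Module, like a transformers tokenizer


def build_loading_kwargs(name, class_obj, is_diffusers_model, resolve_fn):
    """Hypothetical reduction of the load_sub_model snippet above."""
    loading_kwargs = {}
    # Only model classes are offered to the resolver.
    if issubclass(class_obj, Module):
        model_quant_config = resolve_fn(is_diffusers=is_diffusers_model, module_name=name)
        if model_quant_config is not None:
            loading_kwargs["quantization_config"] = model_quant_config
    return loading_kwargs


def quantize_everything(is_diffusers, module_name):
    """Resolver that, like global mode without a filter, never returns None."""
    return f"{'diffusers' if is_diffusers else 'transformers'}-config:{module_name}"


components = [
    ("transformer", TransformerModel, True),
    ("text_encoder", TextEncoderModel, False),
    ("scheduler", Scheduler, True),
    ("tokenizer", Tokenizer, False),
]
for name, cls, is_diff in components:
    print(name, build_loading_kwargs(name, cls, is_diff, quantize_everything))
```

Only the two model classes end up with a quantization_config entry; the scheduler and tokenizer get empty kwargs because the resolver is never called for them.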

Usage Examples

Global Mode: Same Backend for All Components

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Global Mode: Selective Components

import torch
from diffusers import DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_8bit",
    quant_kwargs={"load_in_8bit": True},
    components_to_quantize=["transformer", "text_encoder"],  # text_encoder_2 and vae stay unquantized
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Granular Mode: Per-Component Configs

import torch
from diffusers import BitsAndBytesConfig, DiffusionPipeline
from diffusers.quantizers import PipelineQuantizationConfig
from transformers import TorchAoConfig as TransformersTorchAoConfig

pipeline_quant_config = PipelineQuantizationConfig(
    quant_mapping={
        # transformer is a diffusers model, so it takes a diffusers config class
        "transformer": BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.bfloat16,
        ),
        # text_encoder is a transformers model, so it takes a transformers config class
        "text_encoder": TransformersTorchAoConfig("int8_weight_only"),
        # text_encoder_2 and vae will not be quantized
    }
)

pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=pipeline_quant_config,
    torch_dtype=torch.bfloat16,
)

Implementation Notes

  • Signature validation in global mode: When quant_backend is used, the __init__ signatures of the diffusers and transformers config classes for that backend are compared. If they differ, a ValueError is raised that directs the user to quant_mapping instead, because global mode passes the same quant_kwargs to whichever class matches each component.
  • Config objects vs. new instances: In granular mode, the user-supplied config objects are returned as-is. In global mode, each quantized component gets a fresh instance built with quant_config_cls(**self.quant_kwargs).
  • Only nn.Module components: The issubclass(class_obj, torch.nn.Module) check in load_sub_model ensures that only model components (not schedulers, tokenizers, etc.) are considered for quantization.
  • config_mapping for inspection: After the pipeline is loaded, pipeline_quant_config.config_mapping maps each quantized component name to its resolved config, which is useful for post-hoc inspection.
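The global-mode signature check can be illustrated with inspect.signature. The two config classes are hypothetical, and comparing the sets of __init__ parameter names (excluding self) is an assumption about how the check works, based on the note above, not a copy of the diffusers source.

```python
import inspect


class DiffusersSideConfig:
    """Hypothetical diffusers-side config class."""
    def __init__(self, quant_type, modules_to_not_convert=None, **kwargs):
        self.quant_type = quant_type


class TransformersSideConfig:
    """Hypothetical transformers-side class with one extra parameter."""
    def __init__(self, quant_type, modules_to_not_convert=None, extra_option=False, **kwargs):
        self.quant_type = quant_type


def init_param_names(cls):
    """Names of cls.__init__ parameters, excluding `self` (includes 'kwargs')."""
    return {name for name in inspect.signature(cls.__init__).parameters if name != "self"}


def check_global_mode_compatible(diffusers_cls, transformers_cls):
    """Raise ValueError when the two __init__ signatures accept different names."""
    diff_names = init_param_names(diffusers_cls)
    trans_names = init_param_names(transformers_cls)
    if diff_names != trans_names:
        mismatch = sorted(diff_names ^ trans_names)
        raise ValueError(
            f"__init__ signatures differ on {mismatch}; use quant_mapping instead."
        )


try:
    check_global_mode_compatible(DiffusersSideConfig, TransformersSideConfig)
except ValueError as err:
    print("rejected:", err)
```

Comparing parameter names is enough for this purpose: one quant_kwargs dict is sent to both sides, so a keyword accepted by only one class could fail, or be silently absorbed by **kwargs, on the other.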

Source References

  • src/diffusers/quantizers/pipe_quant_config.py:L34-L64 - Constructor and post_init
  • src/diffusers/quantizers/pipe_quant_config.py:L66-L86 - Validation logic
  • src/diffusers/quantizers/pipe_quant_config.py:L154-L187 - _resolve_quant_config method
  • src/diffusers/quantizers/pipe_quant_config.py:L189-L199 - _get_quant_config_list method
  • src/diffusers/pipelines/pipeline_loading_utils.py:L869-L878 - Integration in load_sub_model
