Implementation:Huggingface Diffusers PipelineQuantizationConfig
Metadata
| Property | Value |
|---|---|
| API | PipelineQuantizationConfig(quant_backend=None, quant_kwargs=None, components_to_quantize=None, quant_mapping=None) |
| Module | src/diffusers/quantizers/pipe_quant_config.py |
| Lines | L34-L207 |
| Import | from diffusers.quantizers import PipelineQuantizationConfig |
| Type | API Doc |
| Principle | Huggingface_Diffusers_Pipeline_Level_Quantization |
| Implements | Principle:Huggingface_Diffusers_Pipeline_Level_Quantization |
Purpose
PipelineQuantizationConfig is the configuration class for applying quantization at the pipeline level during DiffusionPipeline.from_pretrained. It supports two modes: a global mode, where one backend is applied to all components or to a selected subset, and a granular mode, where each component is mapped to its own config. At load time it resolves the quantization config for each component, or None when the component should stay unquantized.
I/O Contract
Constructor Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| quant_backend | str | None | Quantization backend name (e.g., "bitsandbytes_4bit", "torchao"). Used for global mode. |
| quant_kwargs | dict | None | Keyword arguments used to initialize the backend's config class. Stored as {} when omitted. |
| components_to_quantize | list[str], str, or None | None | Component names to quantize in global mode. A single string is wrapped in a list. If None, all nn.Module components are quantized. |
| quant_mapping | dict | None | Per-component config mapping for granular mode: component name to a diffusers or transformers quantization config object. |
Validation Rules
| Rule | Error |
|---|---|
| Both quant_backend and quant_mapping provided | ValueError |
| Neither quant_backend nor quant_mapping provided | ValueError |
| Neither quant_kwargs nor quant_mapping provided | ValueError |
| quant_backend not found in diffusers or transformers backends | ValueError |
| Config classes in quant_mapping not found in any backend | ValueError |
| Init signatures mismatch between diffusers and transformers config classes (global mode) | ValueError |
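The first three rules are plain argument checks and can be pictured without the library installed. The following is a minimal sketch, not the library code: `validate_init_args` and `raises_value_error` are hypothetical helpers, and the error messages are illustrative rather than the ones diffusers emits.

```python
def validate_init_args(quant_backend=None, quant_kwargs=None, quant_mapping=None):
    """Hypothetical stand-in mirroring the first three validation rules."""
    quant_kwargs = quant_kwargs or {}
    # Rule 1: the two modes are mutually exclusive.
    if quant_backend and quant_mapping:
        raise ValueError("Pass either quant_backend or quant_mapping, not both.")
    # Rule 2: one of the two modes must be selected.
    if not quant_backend and not quant_mapping:
        raise ValueError("Pass one of quant_backend or quant_mapping.")
    # Rule 3: global mode needs kwargs to build the backend config.
    if not quant_kwargs and not quant_mapping:
        raise ValueError("Global mode needs non-empty quant_kwargs.")
    return "granular" if quant_mapping else "global"


def raises_value_error(**kwargs):
    """Return True when the given arguments are rejected."""
    try:
        validate_init_args(**kwargs)
    except ValueError:
        return True
    return False
```

Note that under rule 3 a bare `quant_backend` with no `quant_kwargs` is rejected, since there would be nothing to pass to the backend's config class.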
Constructor and Initialization
    class PipelineQuantizationConfig:
        def __init__(
            self,
            quant_backend: str = None,
            quant_kwargs: dict[str, str | float | int | dict] = None,
            components_to_quantize: list[str] | str | None = None,
            quant_mapping: dict[str, DiffQuantConfigMixin | TransformersQuantConfigMixin] = None,
        ):
            self.quant_backend = quant_backend
            self.quant_kwargs = quant_kwargs or {}
            if components_to_quantize:
                if isinstance(components_to_quantize, str):
                    components_to_quantize = [components_to_quantize]
            self.components_to_quantize = components_to_quantize
            self.quant_mapping = quant_mapping
            self.config_mapping = {}  # book-keeping: {module_name: quant_config}
            self.post_init()

        def post_init(self):
            self.is_granular = True if self.quant_mapping is not None else False
            self._validate_init_args()
Key Methods
_resolve_quant_config
Called by load_sub_model() for each pipeline component. Returns the appropriate quantization config or None if the component should not be quantized.
    def _resolve_quant_config(self, is_diffusers: bool = True, module_name: str = None):
        quant_config_mapping_transformers, quant_config_mapping_diffusers = self._get_quant_config_list()

        # Granular case: look up component in quant_mapping
        if self.is_granular and module_name in self.quant_mapping:
            config = self.quant_mapping[module_name]
            self.config_mapping.update({module_name: config})
            return config

        # Global config case
        else:
            should_quantize = False
            if self.components_to_quantize and module_name in self.components_to_quantize:
                should_quantize = True
            elif not self.is_granular and not self.components_to_quantize:
                should_quantize = True  # quantize all components

            if should_quantize:
                mapping_to_use = quant_config_mapping_diffusers if is_diffusers else quant_config_mapping_transformers
                quant_config_cls = mapping_to_use[self.quant_backend]
                quant_obj = quant_config_cls(**self.quant_kwargs)
                self.config_mapping.update({module_name: quant_obj})
                return quant_obj

        return None  # no quantization for this component
Control flow:
- Granular mode: If quant_mapping contains the component name, return its config directly.
- Global mode with filter: If components_to_quantize is set and the component is in the list, create a new config instance from quant_backend + quant_kwargs.
- Global mode without filter: If no filter is set, quantize all components.
- Fallback: Return None -- component is loaded without quantization.
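The decision order can be exercised in isolation. The sketch below is a simplified, hypothetical mirror of the control flow above: `resolve` is not a diffusers function, and it returns either the mapped object or a `(backend, kwargs)` tuple in place of a real config instance.

```python
def resolve(module_name, quant_mapping=None, quant_backend=None,
            quant_kwargs=None, components_to_quantize=None):
    """Hypothetical, simplified mirror of the decision order described above."""
    is_granular = quant_mapping is not None
    # 1. Granular mode: an explicit entry for this component wins.
    if is_granular and module_name in quant_mapping:
        return quant_mapping[module_name]
    # 2. Global mode with a filter: only listed components are quantized.
    if components_to_quantize and module_name in components_to_quantize:
        return (quant_backend, dict(quant_kwargs or {}))
    # 3. Global mode without a filter: every component qualifies.
    if not is_granular and not components_to_quantize:
        return (quant_backend, dict(quant_kwargs or {}))
    # 4. Fallback: the component is loaded unquantized.
    return None
```

In granular mode a component missing from the mapping falls through to the last branch, which is why unlisted components are simply left unquantized.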
_get_quant_config_list
Returns the backend config mappings from both diffusers and transformers:
    def _get_quant_config_list(self):
        if is_transformers_available():
            from transformers.quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_transformers
        else:
            quant_config_mapping_transformers = None

        from ..quantizers.auto import AUTO_QUANTIZATION_CONFIG_MAPPING as quant_config_mapping_diffusers

        return quant_config_mapping_transformers, quant_config_mapping_diffusers
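Because the transformers mapping is None when transformers is not installed, callers have to tolerate a missing value. The optional-dependency pattern behind this can be sketched with the standard library alone; `load_optional_attr` is a hypothetical helper, not part of diffusers, and it uses `json` only as a module that is guaranteed to exist.

```python
import importlib
import importlib.util


def load_optional_attr(module_name, attr_name):
    """Return `attr_name` from `module_name`, or None if the module is absent."""
    try:
        spec = importlib.util.find_spec(module_name)
    except ModuleNotFoundError:
        # Raised when a parent package of a dotted name is missing.
        spec = None
    if spec is None:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, attr_name, None)


# Present module: the attribute is returned.
dumps = load_optional_attr("json", "dumps")
# Absent module: None is returned instead of raising.
missing = load_optional_attr("no_such_pkg_for_this_sketch", "MAPPING")
```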
Integration Point: load_sub_model
In pipeline_loading_utils.py:L869-L878, the pipeline quantization config is resolved per-component:
    # Inside load_sub_model():
    if (
        quantization_config is not None
        and isinstance(quantization_config, PipelineQuantizationConfig)
        and issubclass(class_obj, torch.nn.Module)
    ):
        model_quant_config = quantization_config._resolve_quant_config(
            is_diffusers=is_diffusers_model, module_name=name
        )
        if model_quant_config is not None:
            loading_kwargs["quantization_config"] = model_quant_config
The resolved config is passed as quantization_config to the component's from_pretrained method, which then follows the standard model-level quantization loading flow.
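The gating above can be reproduced with stand-in classes to see which components end up with a quantization_config entry. This is a sketch under stated assumptions: `Module` replaces torch.nn.Module so the code runs without torch, and `StubPipelineConfig`, `FakeTransformer`, `FakeScheduler`, and `build_loading_kwargs` are all hypothetical names.

```python
class Module:
    """Stand-in for torch.nn.Module so the sketch runs without torch."""


class FakeTransformer(Module):
    pass


class FakeScheduler:  # not a Module subclass, so it is never considered
    pass


class StubPipelineConfig:
    """Hypothetical stand-in exposing only the resolver used by the loader."""

    def __init__(self, per_component):
        self.per_component = per_component

    def _resolve_quant_config(self, is_diffusers=True, module_name=None):
        return self.per_component.get(module_name)


def build_loading_kwargs(name, class_obj, quantization_config):
    """Attach a per-component config only when all three conditions hold."""
    loading_kwargs = {}
    if (
        quantization_config is not None
        and isinstance(quantization_config, StubPipelineConfig)
        and issubclass(class_obj, Module)
    ):
        resolved = quantization_config._resolve_quant_config(module_name=name)
        if resolved is not None:
            loading_kwargs["quantization_config"] = resolved
    return loading_kwargs
```

A scheduler is skipped even if a config is registered under its name, because the class check runs before the resolver is consulted.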
Usage Examples
Global Mode: Same Backend for All Components
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    # One backend and one set of kwargs for every nn.Module component.
    pipeline_quant_config = PipelineQuantizationConfig(
        quant_backend="bitsandbytes_4bit",
        quant_kwargs={"load_in_4bit": True, "bnb_4bit_quant_type": "nf4"},
    )
    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        quantization_config=pipeline_quant_config,
        torch_dtype=torch.bfloat16,
    )
Global Mode: Selective Components
    import torch
    from diffusers import DiffusionPipeline
    from diffusers.quantizers import PipelineQuantizationConfig

    pipeline_quant_config = PipelineQuantizationConfig(
        quant_backend="torchao",
        quant_kwargs={"quant_type": "int8wo"},
        components_to_quantize=["transformer", "text_encoder"],  # skip VAE
    )
    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        quantization_config=pipeline_quant_config,
        torch_dtype=torch.bfloat16,
    )
Granular Mode: Per-Component Configs
    import torch
    from diffusers import BitsAndBytesConfig, DiffusionPipeline, TorchAoConfig
    from diffusers.quantizers import PipelineQuantizationConfig

    pipeline_quant_config = PipelineQuantizationConfig(
        quant_mapping={
            "transformer": BitsAndBytesConfig(
                load_in_4bit=True,
                bnb_4bit_quant_type="nf4",
                bnb_4bit_compute_dtype=torch.bfloat16,
            ),
            "text_encoder": TorchAoConfig("int8wo"),
            # text_encoder_2 and vae will not be quantized
        }
    )
    pipe = DiffusionPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-dev",
        quantization_config=pipeline_quant_config,
        torch_dtype=torch.bfloat16,
    )
Implementation Notes
- Signature validation in global mode: When using quant_backend, the system compares the __init__ signatures of the diffusers and transformers config classes for that backend. If they differ, it raises an error directing the user to use quant_mapping instead.
- Config objects vs. new instances: In granular mode, the user provides pre-constructed config objects. In global mode, new config instances are created on each _resolve_quant_config call using quant_config_cls(**quant_kwargs).
- Only nn.Module components: The issubclass(class_obj, torch.nn.Module) check in load_sub_model ensures that only actual model components (not schedulers, tokenizers, etc.) are considered for quantization.
- config_mapping for inspection: After pipeline loading, pipeline_quant_config.config_mapping contains the resolved config for each quantized component, enabling post-hoc inspection.
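The signature check from the first note can be approximated with `inspect.signature`. What the library compares in detail is not shown on this page, so the sketch below assumes a names-only comparison; the three config classes and both helper functions are hypothetical.

```python
import inspect


class DiffusersStyleConfig:
    def __init__(self, quant_type, modules_to_not_convert=None):
        self.quant_type = quant_type


class TransformersStyleConfig:
    def __init__(self, quant_type, modules_to_not_convert=None):
        self.quant_type = quant_type


class DivergentConfig:
    def __init__(self, quant_type, group_size=128):
        self.quant_type = quant_type


def init_param_names(cls):
    """Collect the __init__ parameter names of `cls`, excluding `self`."""
    params = inspect.signature(cls.__init__).parameters
    return {name for name in params if name != "self"}


def signatures_match(cls_a, cls_b):
    """Assumed check: both classes accept the same set of parameter names."""
    return init_param_names(cls_a) == init_param_names(cls_b)
```

When the names differ, a single quant_kwargs dict cannot safely initialize both classes, which is the situation where the note above directs users to quant_mapping.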
Related Pages
- Huggingface_Diffusers_Pipeline_Level_Quantization - Principle of heterogeneous per-component quantization
- Huggingface_Diffusers_Quantization_Config_Classes - The config objects used in quant_mapping
- Huggingface_Diffusers_ModelMixin_From_Pretrained_Quantized - How individual components consume the resolved config
- Huggingface_Diffusers_Quantized_Pipeline_Call - Running inference with the quantized pipeline
Source References
- src/diffusers/quantizers/pipe_quant_config.py:L34-L64 - Constructor and post_init
- src/diffusers/quantizers/pipe_quant_config.py:L66-L86 - Validation logic
- src/diffusers/quantizers/pipe_quant_config.py:L154-L187 - _resolve_quant_config method
- src/diffusers/quantizers/pipe_quant_config.py:L189-L199 - _get_quant_config_list method
- src/diffusers/pipelines/pipeline_loading_utils.py:L869-L878 - Integration in load_sub_model