Principle:Huggingface Optimum Model Decomposition
| Field | Value |
|---|---|
| Page Type | Principle |
| Source Repository | https://github.com/huggingface/optimum |
| Domains | NLP, Computer_Vision, Export |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Model Decomposition is the technique for splitting complex multi-component models into individually exportable sub-models, each with its own independent export configuration. This enables the export of architectures that cannot be represented as a single computation graph.
Description
Many modern model architectures are not monolithic. They consist of multiple distinct components that must be exported separately:
- Encoder-decoder models (T5, BART, Marian) have separate encoder and decoder components. The encoder processes input once, while the decoder is called iteratively during generation. These are split into:
  - `encoder` -- The encoder sub-model
  - `decoder` -- The decoder sub-model (without past key-values as input)
  - `decoder_with_past` -- The decoder sub-model (with past key-values as input, for efficient generation)
- Decoder-only models (GPT-2, LLaMA) may need separate past/no-past variants when KV-cache is enabled. These produce:
  - `model` -- The full decoder model with appropriate past key-value handling
- Diffusion pipelines (Stable Diffusion, SDXL, Stable Diffusion 3) consist of multiple independently trained sub-models:
  - `text_encoder` -- Encodes text prompts (e.g., the CLIP text encoder)
  - `text_encoder_2` -- Second text encoder (for SDXL and similar)
  - `text_encoder_3` -- Third text encoder (for SD3 and similar)
  - `unet` -- The denoising UNet (for UNet-based architectures)
  - `transformer` -- The denoising transformer (for DiT-based architectures)
  - `vae_encoder` -- The VAE encoder for image-to-latent conversion
  - `vae_decoder` -- The VAE decoder for latent-to-image conversion
Each component receives its own ExporterConfig instance, tailored to its specific input/output signature. The decomposition function returns a dictionary mapping component names to (model, config) tuples.
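The name-to-(model, config) contract can be sketched as a plain dictionary. Note that `decompose_for_export`, `DummyConfig`, and `FakeSeq2Seq` below are illustrative stand-ins, not Optimum's actual classes:

```python
# Hypothetical sketch of the decomposition contract: component names map
# to (sub_model, export_config) pairs, as described above.
from typing import Any, Dict, Tuple


class DummyConfig:
    """Stand-in for a component-specific ExporterConfig."""

    def __init__(self, use_past: bool = False, use_past_in_inputs: bool = False):
        self.use_past = use_past
        self.use_past_in_inputs = use_past_in_inputs


def decompose_for_export(model: Any) -> Dict[str, Tuple[Any, DummyConfig]]:
    """Split an encoder-decoder model into individually exportable parts."""
    return {
        # The encoder is extracted once; it runs a single forward pass.
        "encoder": (model.get_encoder(), DummyConfig()),
        # The decoder without cached key-values as inputs (first step).
        "decoder": (model, DummyConfig(use_past=True, use_past_in_inputs=False)),
        # The decoder that accepts cached key-values (subsequent steps).
        "decoder_with_past": (model, DummyConfig(use_past=True, use_past_in_inputs=True)),
    }


class FakeSeq2Seq:
    def get_encoder(self):
        return "encoder-module"


components = decompose_for_export(FakeSeq2Seq())
print(sorted(components))  # ['decoder', 'decoder_with_past', 'encoder']
```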
Usage
Use Model Decomposition when exporting:
- Encoder-decoder models for sequence-to-sequence tasks
- Decoder-only models with KV-cache optimization
- Diffusion pipelines for image generation
- Any multi-component architecture (e.g., MusicGen with text encoder, audio encoder, and decoder)
Model Decomposition is invoked after task resolution and export configuration construction, but before the actual export step. The export function iterates over the decomposed components and exports each one individually.
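A minimal sketch of that iteration, with `export_component` and the component dictionary as hypothetical stand-ins for the real export machinery:

```python
# Illustrative only: the real export loop lives in Optimum's export
# utilities. Each (model, config) pair is exported to its own file.
from pathlib import Path


def export_component(name, sub_model, config, output_dir: Path) -> Path:
    """Pretend-export one sub-model; returns the target path."""
    target = output_dir / f"{name}.onnx"
    # A real exporter would trace `sub_model` using `config` here.
    return target


components = {
    "encoder": ("encoder-module", "encoder-config"),
    "decoder": ("decoder-module", "decoder-config"),
}

paths = [
    export_component(name, model, config, Path("exported"))
    for name, (model, config) in components.items()
]
print([p.name for p in paths])  # ['encoder.onnx', 'decoder.onnx']
```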
Theoretical Basis
Model Decomposition uses the Component Extraction Pattern. Each architecture type has a known decomposition strategy:
- Encoder-decoder split -- Uses `model.get_encoder()` to extract the encoder sub-model. The full model serves as the decoder (with the encoder output fed as cross-attention input). The `ExporterConfig.with_behavior()` method creates component-specific configurations.
- Decoder-only decomposition -- Creates a new `ExporterConfig` instance with appropriate `use_past` and `use_past_in_inputs` settings to control KV-cache behavior.
- Diffusion pipeline enumeration -- Iterates over known sub-model attributes (`text_encoder`, `unet`, `vae`, etc.) using the pipeline's registered components. Each sub-model gets its own export config looked up via `TasksManager.get_exporter_config_constructor`.
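A minimal mimic of a `with_behavior()`-style derivation, where a component-specific config is derived from the full-model config. The dataclass and its exact signature are assumptions based on the description above, not Optimum's actual API:

```python
# Hypothetical sketch: deriving per-component export configs from one
# base config, in the spirit of ExporterConfig.with_behavior().
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    behavior: str = "monolith"  # "encoder" or "decoder" after derivation
    use_past: bool = False

    def with_behavior(self, behavior: str, use_past: bool = False) -> "SketchConfig":
        """Return a copy of this config specialized for one component."""
        return replace(self, behavior=behavior, use_past=use_past)


base = SketchConfig()
encoder_cfg = base.with_behavior("encoder")
decoder_cfg = base.with_behavior("decoder", use_past=True)
print(encoder_cfg.behavior, decoder_cfg.use_past)  # encoder True
```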
The decomposition strategy is determined by the model's architecture and the export configuration class, ensuring that each sub-model is exported with the correct input/output specification.
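The diffusion enumeration strategy can be sketched as follows; `KNOWN_SUBMODELS`, `FakePipeline`, and `get_config_for` are hypothetical stand-ins for the pipeline's registered components and the `TasksManager` config lookup:

```python
# Hypothetical sketch: enumerate a pipeline's sub-model attributes,
# skipping components the pipeline does not have, and pair each with
# a looked-up export config.
KNOWN_SUBMODELS = ("text_encoder", "text_encoder_2", "unet", "vae_encoder", "vae_decoder")


class FakePipeline:
    def __init__(self):
        self.text_encoder = "clip"
        self.unet = "unet"
        self.vae_encoder = "vae-enc"
        self.vae_decoder = "vae-dec"
        # Note: no text_encoder_2 on this pipeline (not an SDXL variant).


def get_config_for(name: str) -> str:
    """Stand-in for a TasksManager-style export config lookup."""
    return f"{name}-config"


pipe = FakePipeline()
components = {
    name: (getattr(pipe, name), get_config_for(name))
    for name in KNOWN_SUBMODELS
    if getattr(pipe, name, None) is not None
}
print(sorted(components))  # ['text_encoder', 'unet', 'vae_decoder', 'vae_encoder']
```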
Related Pages
- Implemented by: Implementation:Huggingface_Optimum_Model_Decomposition_Utils