Principle:Huggingface Optimum Model Decomposition
| Field | Value |
|---|---|
| Page Type | Principle |
| Source Repository | https://github.com/huggingface/optimum |
| Domains | NLP, Computer_Vision, Export |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Model Decomposition is the technique for splitting complex multi-component models into individually exportable sub-models, each with its own independent export configuration. This enables the export of architectures that cannot be represented as a single computation graph.
Description
Many modern model architectures are not monolithic. They consist of multiple distinct components that must be exported separately:
- Encoder-decoder models (T5, BART, Marian) have separate encoder and decoder components. The encoder processes input once, while the decoder is called iteratively during generation. These are split into:
  - `encoder` -- The encoder sub-model
  - `decoder` -- The decoder sub-model (without past key-values as input)
  - `decoder_with_past` -- The decoder sub-model (with past key-values as input, for efficient generation)
- Decoder-only models (GPT-2, LLaMA) may need separate past/no-past variants when KV-cache is enabled. These produce:
  - `model` -- The full decoder model with appropriate past key-value handling
- Diffusion pipelines (Stable Diffusion, SDXL, Stable Diffusion 3) consist of multiple independently trained sub-models:
  - `text_encoder` -- Encodes text prompts (e.g., the CLIP text encoder)
  - `text_encoder_2` -- Second text encoder (for SDXL and similar)
  - `text_encoder_3` -- Third text encoder (for SD3 and similar)
  - `unet` -- The denoising UNet (for UNet-based architectures)
  - `transformer` -- The denoising transformer (for DiT-based architectures)
  - `vae_encoder` -- The VAE encoder for image-to-latent conversion
  - `vae_decoder` -- The VAE decoder for latent-to-image conversion
Each component receives its own ExporterConfig instance, tailored to its specific input/output signature. The decomposition function returns a dictionary mapping component names to (model, config) tuples.
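The name-to-(model, config) contract can be sketched as a plain dictionary. Note that `decompose_for_export`, `DummyConfig`, and `FakeSeq2Seq` below are illustrative stand-ins, not Optimum's actual classes:

```python
# Hypothetical sketch of the decomposition contract: component names map
# to (sub_model, export_config) pairs, as described above.
from typing import Any, Dict, Tuple


class DummyConfig:
    """Stand-in for a component-specific ExporterConfig."""

    def __init__(self, use_past: bool = False, use_past_in_inputs: bool = False):
        self.use_past = use_past
        self.use_past_in_inputs = use_past_in_inputs


def decompose_for_export(model: Any) -> Dict[str, Tuple[Any, DummyConfig]]:
    """Split an encoder-decoder model into individually exportable parts."""
    return {
        # The encoder is extracted once; it runs a single forward pass.
        "encoder": (model.get_encoder(), DummyConfig()),
        # The decoder without cached key-values as inputs (first step).
        "decoder": (model, DummyConfig(use_past=True, use_past_in_inputs=False)),
        # The decoder that accepts cached key-values (subsequent steps).
        "decoder_with_past": (model, DummyConfig(use_past=True, use_past_in_inputs=True)),
    }


class FakeSeq2Seq:
    def get_encoder(self):
        return "encoder-module"


components = decompose_for_export(FakeSeq2Seq())
print(sorted(components))  # ['decoder', 'decoder_with_past', 'encoder']
```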
Usage
Use Model Decomposition when exporting:
- Encoder-decoder models for sequence-to-sequence tasks
- Decoder-only models with KV-cache optimization
- Diffusion pipelines for image generation
- Any multi-component architecture (e.g., MusicGen with text encoder, audio encoder, and decoder)
Model Decomposition is invoked after task resolution and export configuration construction, but before the actual export step. The export function iterates over the decomposed components and exports each one individually.
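A minimal sketch of that iteration, with `export_component` and the component dictionary as hypothetical stand-ins for the real export machinery:

```python
# Illustrative only: the real export loop lives in Optimum's export
# utilities. Each (model, config) pair is exported to its own file.
from pathlib import Path


def export_component(name, sub_model, config, output_dir: Path) -> Path:
    """Pretend-export one sub-model; returns the target path."""
    target = output_dir / f"{name}.onnx"
    # A real exporter would trace `sub_model` using `config` here.
    return target


components = {
    "encoder": ("encoder-module", "encoder-config"),
    "decoder": ("decoder-module", "decoder-config"),
}

paths = [
    export_component(name, model, config, Path("exported"))
    for name, (model, config) in components.items()
]
print([p.name for p in paths])  # ['encoder.onnx', 'decoder.onnx']
```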
Theoretical Basis
Model Decomposition uses the Component Extraction Pattern. Each architecture type has a known decomposition strategy:
- Encoder-decoder split -- Uses `model.get_encoder()` to extract the encoder sub-model. The full model serves as the decoder (with the encoder output fed as cross-attention input). The `ExporterConfig.with_behavior()` method creates component-specific configurations.
- Decoder-only decomposition -- Creates a new `ExporterConfig` instance with appropriate `use_past` and `use_past_in_inputs` settings to control KV-cache behavior.
- Diffusion pipeline enumeration -- Iterates over known sub-model attributes (`text_encoder`, `unet`, `vae`, etc.) using the pipeline's registered components. Each sub-model gets its own export config looked up via `TasksManager.get_exporter_config_constructor`.
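A minimal mimic of a `with_behavior()`-style derivation, where a component-specific config is derived from the full-model config. The dataclass and its exact signature are assumptions based on the description above, not Optimum's actual API:

```python
# Hypothetical sketch: deriving per-component export configs from one
# base config, in the spirit of ExporterConfig.with_behavior().
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    behavior: str = "monolith"  # "encoder" or "decoder" after derivation
    use_past: bool = False

    def with_behavior(self, behavior: str, use_past: bool = False) -> "SketchConfig":
        """Return a copy of this config specialized for one component."""
        return replace(self, behavior=behavior, use_past=use_past)


base = SketchConfig()
encoder_cfg = base.with_behavior("encoder")
decoder_cfg = base.with_behavior("decoder", use_past=True)
print(encoder_cfg.behavior, decoder_cfg.use_past)  # encoder True
```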
The decomposition strategy is determined by the model's architecture and the export configuration class, ensuring that each sub-model is exported with the correct input/output specification.
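The diffusion enumeration strategy can be sketched as follows; `KNOWN_SUBMODELS`, `FakePipeline`, and `get_config_for` are hypothetical stand-ins for the pipeline's registered components and the `TasksManager` config lookup:

```python
# Hypothetical sketch: enumerate a pipeline's sub-model attributes,
# skipping components the pipeline does not have, and pair each with
# a looked-up export config.
KNOWN_SUBMODELS = ("text_encoder", "text_encoder_2", "unet", "vae_encoder", "vae_decoder")


class FakePipeline:
    def __init__(self):
        self.text_encoder = "clip"
        self.unet = "unet"
        self.vae_encoder = "vae-enc"
        self.vae_decoder = "vae-dec"
        # Note: no text_encoder_2 on this pipeline (not an SDXL variant).


def get_config_for(name: str) -> str:
    """Stand-in for a TasksManager-style export config lookup."""
    return f"{name}-config"


pipe = FakePipeline()
components = {
    name: (getattr(pipe, name), get_config_for(name))
    for name in KNOWN_SUBMODELS
    if getattr(pipe, name, None) is not None
}
print(sorted(components))  # ['text_encoder', 'unet', 'vae_decoder', 'vae_encoder']
```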
Related Pages
- Implemented by: Implementation:Huggingface_Optimum_Model_Decomposition_Utils