Principle: VainF Torch Pruning LLM Config Update
Metadata
| Field | Value |
|---|---|
| Domains | NLP, Model_Compression, Pruning |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Post-pruning synchronization of HuggingFace model configuration attributes with the physically modified weight dimensions.
Description
After structural pruning of an LLM, the weight tensors have been physically resized (channels removed), but the model's configuration object (model.config) still contains the original dimensions. If saved as-is, loading the model would fail due to shape mismatches.
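A minimal stdlib-only illustration of that failure mode, with plain tuples standing in for tensor shapes (the `check_load` helper and its dict layout are hypothetical, mimicking in spirit the shape check that `load_state_dict` performs):

```python
def check_load(config: dict, checkpoint: dict) -> str:
    """Rebuild the expected lm_head shape from config, compare to checkpoint."""
    expected = (config["vocab_size"], config["hidden_size"])
    actual = checkpoint["lm_head.weight"]
    if expected != actual:
        return (f"size mismatch for lm_head.weight: checkpoint has {actual}, "
                f"model built from config expects {expected}")
    return "ok"

pruned_ckpt = {"lm_head.weight": (32000, 2048)}  # weights physically halved

# Stale config (still the original hidden_size) -> reload fails:
stale = check_load({"hidden_size": 4096, "vocab_size": 32000}, pruned_ckpt)

# Config synced to the pruned dimensions -> reload succeeds:
synced = check_load({"hidden_size": 2048, "vocab_size": 32000}, pruned_ckpt)
```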
The configuration update pattern iterates through all modules to discover the new dimensions from the actual weight shapes and updates model.config accordingly:
- `hidden_size` -- the embedding and output projection dimension
- `num_attention_heads` -- the number of query attention heads
- `num_key_value_heads` -- the number of key/value heads (for Grouped Query Attention)
- `intermediate_size` -- the MLP hidden dimension
This step is mandatory between pruning and saving for HuggingFace-compatible models.
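The pattern can be sketched as follows. The helper name `update_config_after_pruning` is hypothetical, and a Llama-style module layout (`lm_head`, per-layer `self_attn.q_proj`/`k_proj`, `mlp.gate_proj`) is assumed; the demo uses lightweight stand-ins so no real checkpoint is needed:

```python
from types import SimpleNamespace

def update_config_after_pruning(model):
    """Sync HF config fields with the pruned weight shapes (hypothetical helper).

    head_dim must be computed from the *old* config before hidden_size is
    overwritten, since it is preserved by prune_num_heads=True.
    """
    cfg = model.config
    head_dim = cfg.hidden_size // cfg.num_attention_heads

    # hidden_size: input dimension of the output projection
    cfg.hidden_size = model.lm_head.in_features

    layer = model.model.layers[0]
    # num_attention_heads: query projection width / head_dim
    cfg.num_attention_heads = layer.self_attn.q_proj.out_features // head_dim
    # num_key_value_heads (GQA): key projection width / head_dim
    cfg.num_key_value_heads = layer.self_attn.k_proj.out_features // head_dim
    # intermediate_size: MLP gate projection width (separate gate/up layout)
    cfg.intermediate_size = layer.mlp.gate_proj.out_features
    return cfg

# Demo with stand-in modules carrying only the shape attributes we read:
lin = lambda i, o: SimpleNamespace(in_features=i, out_features=o)
model = SimpleNamespace(
    config=SimpleNamespace(hidden_size=4096, num_attention_heads=32,
                           num_key_value_heads=8, intermediate_size=11008),
    lm_head=lin(2048, 32000),  # pruned from in_features=4096
    model=SimpleNamespace(layers=[SimpleNamespace(
        self_attn=SimpleNamespace(q_proj=lin(2048, 2048), k_proj=lin(2048, 512)),
        mlp=SimpleNamespace(gate_proj=lin(2048, 5504)))]),
)
cfg = update_config_after_pruning(model)
```

On a real model the same attribute reads apply to the actual `nn.Linear` modules, and `model.save_pretrained(...)` afterwards writes the corrected `config.json`.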
Usage
Required after pruning any HuggingFace LLM (Llama, Phi, Qwen, etc.) and before calling model.save_pretrained(). Without this step, the saved model cannot be reloaded.
Theoretical Basis
After pruning, the new configuration values are derived directly from the physical weight shapes:
- `hidden_size`: `model.lm_head.in_features` gives the new `hidden_size`.
- `num_attention_heads`: For attention modules, `new_num_heads = hidden_size / head_dim`. The `head_dim` is preserved when `prune_num_heads=True`.
- `intermediate_size`:
  - For separate gate/up projections: `intermediate_size = gate_proj.out_features`
  - For fused gate_up projections: `intermediate_size = gate_up_proj.out_features // 2`
- `num_key_value_heads` (GQA): Updated separately from `num_attention_heads`, derived from `k_proj.out_features // head_dim`.
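As a worked example, assume a Llama-style GQA model with an original `hidden_size` of 4096 and 32 query heads (`head_dim` 128), pruned to half width; all shape values below are illustrative:

```python
head_dim = 4096 // 32          # 128; preserved when prune_num_heads=True

# Shapes read from the pruned weights (illustrative, 50% width pruning):
lm_head_in = 2048              # lm_head.in_features
q_out = 2048                   # q_proj.out_features
k_out = 512                    # k_proj.out_features (GQA: fewer KV heads)
gate_out = 5504                # gate_proj.out_features (separate gate/up)
gate_up_out = 11008            # gate_up_proj.out_features (fused variant)

hidden_size = lm_head_in                    # 2048
num_attention_heads = q_out // head_dim     # 16
num_key_value_heads = k_out // head_dim     # 4
intermediate_size = gate_out                # 5504 (separate projections)
fused_intermediate = gate_up_out // 2       # 5504 (fused projections)
```

Note that both MLP layouts recover the same `intermediate_size`; only the divisor changes for the fused case.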