Implementation: Mit_han_lab_Llm_awq_Pseudo_quantize_model_weight
Overview
A concrete tool from the llm-awq library that applies simulated (pseudo) quantization to all linear-layer weights of a model, in place.
Source
File: awq/quantize/quantizer.py, Lines: 106-123
Signature
@torch.no_grad()
def pseudo_quantize_model_weight(model, w_bit, q_config):
Import
from awq.quantize.quantizer import pseudo_quantize_model_weight
I/O
Inputs:
- model (nn.Module) - the model to apply simulated quantization to
- w_bit (int) - bit width for quantization
- q_config (dict) - quantization configuration
Output:
- None (model weights modified in-place with simulated quantization noise)
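For illustration, a plausible invocation. The `zero_point` and `q_group_size` keys shown here are the configuration knobs used elsewhere in the llm-awq repository; verify them against the version you install. The function itself needs torch and a loaded model, so only the config dict is shown live:

```python
# Hypothetical usage sketch, not verbatim from the repository.
q_config = {
    "zero_point": True,   # asymmetric quantization with a per-group zero point
    "q_group_size": 128,  # weights quantized in groups of 128 input channels
}

# from awq.quantize.quantizer import pseudo_quantize_model_weight
# pseudo_quantize_model_weight(model, w_bit=4, q_config=q_config)  # modifies model in-place, returns None
```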
Notes
Iterates over all transformer blocks and applies pseudo_quantize_tensor to each linear layer's weights.
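The per-tensor step applied to each linear layer can be sketched as grouped round-to-nearest quantize-then-dequantize. The following is a NumPy illustration of that math under assumed semantics for the `group_size` and `zero_point` knobs, not the library's torch implementation:

```python
import numpy as np

def pseudo_quantize(w, n_bit=4, group_size=128, zero_point=True):
    """Sketch of grouped pseudo-quantization: quantize each group of
    `group_size` weights to n_bit integers, then immediately dequantize,
    so the returned tensor carries quantization noise but stays float."""
    orig_shape = w.shape
    w = w.reshape(-1, group_size)  # one scale (and zero point) per group
    if zero_point:
        # Asymmetric: map [min, max] of each group onto [0, 2^n - 1].
        w_max = w.max(axis=1, keepdims=True)
        w_min = w.min(axis=1, keepdims=True)
        qmax = 2 ** n_bit - 1
        scale = np.clip((w_max - w_min) / qmax, 1e-5, None)
        zero = np.clip(-np.round(w_min / scale), 0, qmax)
        q = np.clip(np.round(w / scale) + zero, 0, qmax)
        w_dq = (q - zero) * scale
    else:
        # Symmetric: map [-|w|max, |w|max] onto the signed integer range.
        w_absmax = np.abs(w).max(axis=1, keepdims=True)
        qmax = 2 ** (n_bit - 1) - 1
        scale = np.clip(w_absmax / qmax, 1e-5, None)
        q = np.clip(np.round(w / scale), -qmax - 1, qmax)
        w_dq = q * scale
    return w_dq.reshape(orig_shape)
```

Applying this to every linear layer's weight matrix, block by block, reproduces the shape of the loop described above: the model's forward pass then runs in full precision but with the error a real low-bit deployment would introduce.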
Related Pages
- Principle:Mit_han_lab_Llm_awq_Pseudo_Quantization
- Environment:Mit_han_lab_Llm_awq_Python_Runtime_Environment
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- Quantization
- Evaluation