Implementation:Mit_han_lab_Llm_awq_Auto_clip_block
Overview
A concrete tool from the llm-awq library for finding optimal weight clipping values within a transformer block.
Source
File: awq/quantize/auto_clip.py, Lines: 67-83
Signature
@torch.no_grad()
def auto_clip_block(module, w_bit, q_config, input_feat):
Import
from awq.quantize.auto_clip import auto_clip_block
I/O
Inputs:
- module (nn.Module) - transformer block
- w_bit (int) - bit width for quantization
- q_config (dict) - quantization configuration
- input_feat (dict) - cached activations for each layer
Output:
- list of (name: str, max_val: torch.Tensor) tuples
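The return value is a plain list of name/tensor pairs. A minimal sketch of consuming such a list, with floats standing in for the torch.Tensor values (the names and numbers below are illustrative, not real outputs):

```python
# Hedged sketch: auto_clip_block returns (name, max_val) pairs; here
# plain floats stand in for torch.Tensors. In llm-awq these values are
# later applied back to the corresponding linear layers' weights.
clip_list = [
    ("self_attn.v_proj", 4.2),  # illustrative values
    ("mlp.fc1", 7.9),
]
for name, max_val in clip_list:
    print(f"{name}: clip weights to [-{max_val}, {max_val}]")
```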
Notes
Internally calls auto_clip_layer() for each linear layer, skipping layers whose names contain q_, k_, query, key, or Wqkv.
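The skip rule above can be sketched as a simple substring filter. The substring list is taken from the note; the helper name `should_clip` is ours, not from the library:

```python
# Hedged sketch of the skip rule: layers whose names contain any of
# these substrings (query/key projections) are excluded from clipping.
SKIP_KEYS = ("q_", "k_", "query", "key", "Wqkv")

def should_clip(layer_name):
    """Return True if a linear layer with this name should be clipped."""
    return not any(key in layer_name for key in SKIP_KEYS)

names = ["self_attn.q_proj", "self_attn.k_proj",
         "self_attn.v_proj", "mlp.gate_proj"]
print([n for n in names if should_clip(n)])
# → ['self_attn.v_proj', 'mlp.gate_proj']
```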
Related Pages
- Principle:Mit_han_lab_Llm_awq_Weight_Clipping_Optimization
- Environment:Mit_han_lab_Llm_awq_Python_Runtime_Environment
- Heuristic:Mit_han_lab_Llm_awq_AWQ_Grid_Search_Tuning
- Heuristic:Mit_han_lab_Llm_awq_GPU_Memory_Management_Patterns
- Heuristic:Mit_han_lab_Llm_awq_Skip_QK_Projection_Clipping
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- Quantization
- Optimization