
Implementation: mit-han-lab / llm-awq / pseudo_quantize_model_weight

From Leeroopedia

Overview

Concrete tool, provided by the llm-awq library, for applying simulated quantization to all model weights.

Source

File: awq/quantize/quantizer.py, Lines: 106-123

Signature

@torch.no_grad()
def pseudo_quantize_model_weight(model, w_bit, q_config):

Import

from awq.quantize.quantizer import pseudo_quantize_model_weight

I/O

Inputs:

  • model (nn.Module) - the model to apply simulated quantization to
  • w_bit (int) - bit width for quantization
  • q_config (dict) - quantization configuration

Output:

  • None (model weights modified in-place with simulated quantization noise)
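To make the in-place "simulated quantization noise" concrete, the sketch below performs a group-wise quantize-dequantize round trip on a weight array: each group is mapped to a low-bit integer grid and immediately mapped back to floats. This is an illustrative NumPy reimplementation of the idea, not the llm-awq code; the function name, default group size, and asymmetric (zero-point) scheme here are assumptions.

```python
import numpy as np

def pseudo_quantize_tensor_sketch(w, n_bit=4, group_size=128):
    """Group-wise asymmetric quantize-dequantize round trip.

    Illustrative sketch of simulated quantization; the real
    pseudo_quantize_tensor in llm-awq differs in details.
    """
    orig_shape = w.shape
    g = w.reshape(-1, group_size)          # one row per quantization group
    max_int = 2 ** n_bit - 1               # e.g. 15 levels above zero for 4-bit
    w_min = g.min(axis=1, keepdims=True)
    w_max = g.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / max_int, 1e-8)
    zero = np.round(-w_min / scale)        # integer zero-point per group
    q = np.clip(np.round(g / scale) + zero, 0, max_int)
    # Dequantize: the result is float again, but carries quantization error
    return ((q - zero) * scale).reshape(orig_shape)
```

After the round trip each group contains at most 2^n_bit distinct values, so downstream evaluation sees exactly the accuracy impact of low-bit weights while the model still runs in floating point.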

Notes

Iterates over all transformer blocks and applies pseudo_quantize_tensor to each linear layer's weights.
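The iteration pattern described above can be sketched as follows. This toy version represents the model as a plain dict of weight arrays and uses a per-tensor stand-in quantizer, so it is runnable without torch or llm-awq; the dict structure, the helper name, and the symmetric quantization scheme are assumptions made for illustration, not the library's API.

```python
import numpy as np

def quantize_dequantize(w, n_bit):
    # Toy per-tensor symmetric round trip (stand-in for llm-awq's
    # pseudo_quantize_tensor, whose real signature and scheme differ).
    max_int = 2 ** (n_bit - 1) - 1
    scale = np.abs(w).max() / max_int
    return np.clip(np.round(w / scale), -max_int - 1, max_int) * scale

def pseudo_quantize_model_weight_sketch(weights, w_bit):
    """Apply simulated quantization to every 'linear layer' weight.

    `weights` maps layer name -> 2-D weight array; entries are replaced
    in place, mirroring the real function's None return / in-place
    contract.
    """
    for name, w in weights.items():
        weights[name] = quantize_dequantize(w, w_bit)
    return None
```

In the actual library the loop walks the model's transformer blocks and rewrites each `nn.Linear` weight tensor under `@torch.no_grad()`, but the control flow is the same shape: visit every linear weight, quantize-dequantize it, write it back.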

Related Pages

Knowledge Sources

Domains

  • Quantization
  • Evaluation
