

Implementation:Mit han lab Llm awq WQLinear from linear

From Leeroopedia

Overview

A concrete tool, provided by the llm-awq library, for converting standard linear layers (nn.Linear) into INT4-quantized linear layers.

Source

File: awq/quantize/qmodule.py, Lines 139-199 (from_linear classmethod), Lines 78-235 (full class)

Signature

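import torch
import torch.nn as nn

class WQLinear(nn.Module):
    def __init__(self, w_bit, group_size, in_features, out_features, bias, dev, dtype=torch.float16):
        ...  # body elided (see awq/quantize/qmodule.py)

    @classmethod
    def from_linear(cls, linear, w_bit, group_size, init_only=False, scales=None, zeros=None):
        ...  # body elided (see awq/quantize/qmodule.py)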

Import

from awq.quantize.qmodule import WQLinear

I/O

Inputs (from_linear)

  • linear (nn.Linear) - the standard linear layer to quantize
  • w_bit (int) - weight bit width, must be 4
  • group_size (int) - quantization group size, typically 128
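  • init_only (bool) - if True, return an empty layer with its buffers allocated but unfilled, skipping weight packing (useful when loading an already-quantized checkpoint)
  • scales (torch.Tensor, optional) - pre-computed per-group quantization scales; required unless init_only is True
  • zeros (torch.Tensor, optional) - pre-computed per-group quantization zero points; required unless init_only is True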
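The scales and zeros arguments come from a group-wise quantization step that runs before from_linear. The sketch below illustrates one common min-max (asymmetric) variant in plain Python, assuming one floating-point scale and one integer zero point per group of group_size weights. The helper names quantize_groups and dequantize_groups are hypothetical, widening each group's range to include 0 is a choice made here so the zero point always fits in [0, 15], and AWQ's activation-aware weight scaling is not modeled.

```python
# Hypothetical sketch of group-wise asymmetric (min-max) INT4 quantization.
# Not llm-awq code: it only illustrates what per-group `scales` and `zeros` mean.

def quantize_groups(row, w_bit=4, group_size=128):
    """Quantize one weight row; return (q_values, scales, zeros), one scale/zero per group."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    max_int = 2 ** w_bit - 1  # 15 for INT4
    q_values, scales, zeros = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        # Assumption: include 0 in the range so the zero point stays in [0, max_int].
        lo, hi = min(min(group), 0.0), max(max(group), 0.0)
        scale = max(hi - lo, 1e-5) / max_int  # guard against all-zero groups
        zero = min(max(round(-lo / scale), 0), max_int)  # integer zero point
        scales.append(scale)
        zeros.append(zero)
        # Shift by the zero point and clamp into the unsigned range [0, max_int].
        q_values.extend(min(max(round(w / scale) + zero, 0), max_int) for w in group)
    return q_values, scales, zeros


def dequantize_groups(q_values, scales, zeros, group_size=128):
    """Reconstruct approximate weights: w ~ (q - zero) * scale, per group."""
    return [(q - zeros[i // group_size]) * scales[i // group_size]
            for i, q in enumerate(q_values)]


if __name__ == "__main__":
    row = [-0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3]
    q, scales, zeros = quantize_groups(row, group_size=4)
    print(q, scales, zeros)
    print(dequantize_groups(q, scales, zeros, group_size=4))
```

Each reconstructed weight differs from the original by at most about half of its group's scale, which is why smaller groups (more scales) generally give better accuracy at the cost of more metadata.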

Output

  • WQLinear instance with packed qweight, scales, and scaled_zeros buffers
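"Packed" means several 4-bit values share one integer word; eight INT4 weights fit in 32 bits, which is where the roughly 4× weight-memory saving over FP16 comes from (ignoring the per-group scales and zeros). The sketch below assumes a simple little-endian nibble order in 32-bit words; the real qweight layout (word width, packing order, interleaving) is specific to awq_inference_engine and is not reproduced here. It also assumes, as the buffer name suggests, that scaled_zeros holds each group's zero point pre-multiplied by its scale; the exact sign convention used by the library is not shown. All function names are hypothetical.

```python
# Hypothetical sketch: INT4 packing and dequantization with a pre-scaled zero point.
# The nibble order below is an assumption, not the awq_inference_engine layout.

def pack_int4(values):
    """Pack a multiple-of-8 list of ints in [0, 15] into 32-bit words."""
    if len(values) % 8 != 0:
        raise ValueError("need a multiple of 8 values")
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[start:start + 8]):
            if not 0 <= v <= 15:
                raise ValueError(f"value {v} does not fit in 4 bits")
            word |= v << (4 * i)  # nibble i occupies bits [4*i, 4*i + 4)
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4: extract eight 4-bit values from each word."""
    return [(word >> (4 * i)) & 0xF for word in words for i in range(8)]


def dequantize_with_scaled_zero(q, scale, scaled_zero):
    """(q - zero) * scale rewritten as q * scale - scaled_zero, with scaled_zero = zero * scale."""
    return q * scale - scaled_zero


if __name__ == "__main__":
    vals = [0, 1, 2, 3, 12, 13, 14, 15]
    words = pack_int4(vals)
    print(hex(words[0]), unpack_int4(words))
    print(dequantize_with_scaled_zero(10, 0.5, 8 * 0.5))
```

Precomputing zero × scale turns dequantization into a single multiply-add per weight, which is a plausible reason to store the zeros in scaled form.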

Forward Method

The forward method dispatches to a GEMV kernel when the batch size (the number of input rows) is below 8 and to a GEMM kernel otherwise; both kernels come from the awq_inference_engine CUDA extension.
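The dispatch rule can be sketched as a pure function of the input shape. This assumes "batch size" counts the rows the layer sees after flattening every leading dimension (batch × sequence length); the returned strings are placeholders, not awq_inference_engine function names, and the helper names are hypothetical.

```python
# Hypothetical sketch of the GEMV/GEMM dispatch rule; no real kernels are called.
from math import prod

GEMV_ROW_THRESHOLD = 8  # threshold from the description of the forward method


def count_rows(input_shape):
    """Rows seen by the linear layer: product of every dim except the last (in_features)."""
    return prod(input_shape[:-1])


def choose_kernel(input_shape):
    """'gemv' for few rows (e.g. single-token decoding), 'gemm' otherwise (e.g. prompt prefill)."""
    return "gemv" if count_rows(input_shape) < GEMV_ROW_THRESHOLD else "gemm"


if __name__ == "__main__":
    for shape in [(1, 1, 4096), (2, 4, 4096), (1, 512, 4096)]:
        print(shape, choose_kernel(shape))
```

The split reflects that with very few rows the work is essentially a matrix-vector product limited by reading the packed weights, so a dedicated GEMV kernel typically beats a general GEMM.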

Related Pages

Knowledge Sources

Domains

  • Quantization
  • Inference
