Implementation:Mit han lab Llm awq WQLinear from linear

From Leeroopedia

Overview

A concrete tool, provided by the llm-awq library, for creating INT4-quantized linear layers (WQLinear) from standard nn.Linear layers.

Source

File: awq/quantize/qmodule.py, Lines 139-199 (from_linear classmethod), Lines 78-235 (full class)

Signature

class WQLinear(nn.Module):
    def __init__(self, w_bit, group_size, in_features, out_features, bias, dev, dtype=torch.float16):
        ...

    @classmethod
    def from_linear(cls, linear, w_bit, group_size, init_only=False, scales=None, zeros=None):
        ...

Import

from awq.quantize.qmodule import WQLinear
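
A minimal usage sketch built on the signature above. The helper name make_wqlinear is hypothetical and not part of llm-awq. Its checks encode the w_bit constraint documented on this page, plus an assumption that packing real weights (init_only=False) needs both scales and zeros.

```python
def make_wqlinear(linear, w_bit=4, group_size=128, init_only=False,
                  scales=None, zeros=None):
    """Hypothetical wrapper: validate arguments, then call WQLinear.from_linear."""
    if w_bit != 4:
        # This page documents from_linear for 4-bit weights only.
        raise ValueError("w_bit must be 4")
    if not init_only and (scales is None or zeros is None):
        # Assumption: packing needs pre-computed scales and zeros.
        raise ValueError("scales and zeros are required unless init_only=True")

    # Imported lazily so the argument checks work without llm-awq installed.
    from awq.quantize.qmodule import WQLinear

    return WQLinear.from_linear(
        linear, w_bit, group_size,
        init_only=init_only, scales=scales, zeros=zeros,
    )
```

With init_only=True the call is expected to return an empty shell, which is the usual path when packed weights are loaded from a checkpoint afterwards.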

I/O

Inputs (from_linear)

linear (nn.Linear) - the standard linear layer to quantize
w_bit (int) - weight bit width, must be 4
group_size (int) - quantization group size, typically 128
init_only (bool) - if True, create an empty shell without packing any weights
scales (torch.Tensor, optional) - pre-computed quantization scales
zeros (torch.Tensor, optional) - pre-computed quantization zeros
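
To make the roles of group_size, scales, and zeros concrete, here is a pure-Python sketch of group-wise asymmetric 4-bit quantization. It assumes the common min/max scheme w ≈ (q - zero) * scale with one scale and one zero per group. The function names are illustrative and are not taken from llm-awq.

```python
def quantize_row(row, group_size, n_bits=4):
    """Quantize one weight row group by group; returns (q, scales, zeros)."""
    assert len(row) % group_size == 0, "row length must be a multiple of group_size"
    q_max = 2 ** n_bits - 1  # 15 for INT4
    q, scales, zeros = [], [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        w_min, w_max = min(group), max(group)
        scale = max(w_max - w_min, 1e-5) / q_max      # step size for this group
        zero = min(q_max, max(0, round(-w_min / scale)))  # integer zero point
        scales.append(scale)
        zeros.append(zero)
        for w in group:
            q.append(min(q_max, max(0, round(w / scale) + zero)))
    return q, scales, zeros


def dequantize_row(q, scales, zeros, group_size):
    """Reconstruct approximate float weights from the quantized values."""
    return [
        (v - zeros[i // group_size]) * scales[i // group_size]
        for i, v in enumerate(q)
    ]
```

A smaller group_size gives each scale fewer weights to cover, which usually lowers reconstruction error at the cost of storing more scales and zeros.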

Output

WQLinear instance with packed qweight, scales, and scaled_zeros buffers
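
The idea behind a packed qweight buffer can be sketched as follows. This example stores eight 4-bit values per 32-bit word, lowest nibble first. That layout is an assumption chosen for clarity; the library's actual packing order and word width may differ. The scaled_zero helper likewise assumes that a scaled zero is the precomputed product -scale * zero, so dequantization becomes a multiply and an add.

```python
def pack_int4(values):
    """Pack 4-bit integers into 32-bit words, eight per word (illustrative layout)."""
    assert len(values) % 8 == 0, "need a multiple of 8 values"
    words = []
    for start in range(0, len(values), 8):
        word = 0
        for j, v in enumerate(values[start:start + 8]):
            assert 0 <= v <= 15, "value does not fit in 4 bits"
            word |= v << (4 * j)  # nibble j occupies bits 4j .. 4j+3
        words.append(word)
    return words


def unpack_int4(words):
    """Inverse of pack_int4."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(8)]


def scaled_zero(scale, zero):
    """Assumed definition: fold the zero point into an additive offset."""
    return -scale * zero


# (q - zero) * scale can then be evaluated as q * scale + scaled_zero(scale, zero).
```

Packing cuts weight storage to roughly a quarter of float16, excluding the per-group scales and zeros.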

Forward Method

The forward method dispatches to a GEMV kernel when the batch size is below 8 and to a GEMM kernel otherwise. Both kernels are provided by awq_inference_engine.
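
The dispatch rule can be sketched without the CUDA extension. The sketch assumes that "batch size" means the number of input rows, that is, the product of all dimensions except the last. The function name choose_kernel and the returned labels are hypothetical stand-ins for the real kernel calls.

```python
from math import prod

GEMV_MAX_ROWS = 8  # threshold stated in the text above


def choose_kernel(input_shape):
    """Return 'gemv' for fewer than 8 input rows, 'gemm' otherwise."""
    rows = prod(input_shape[:-1])  # assumption: every leading dim counts as batch
    return "gemv" if rows < GEMV_MAX_ROWS else "gemm"
```

Under this reading, single-token decoding with shape (1, 1, hidden) would take the GEMV path, while prompt prefill with many tokens would take the GEMM path.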

Related Pages

Knowledge Sources

Repo: llm-awq (https://github.com/mit-han-lab/llm-awq)

Domains

Quantization
Inference

Retrieved from "https://leeroopedia.com/index.php?title=Implementation:Mit_han_lab_Llm_awq_WQLinear_from_linear&oldid=8458"