Implementation:Mit han lab Llm awq Make quant norm

From Leeroopedia
Revision as of 13:16, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Mit_han_lab_Llm_awq_Make_quant_norm.md)

Overview

make_quant_norm is a concrete tool from the llm-awq library that replaces LlamaRMSNorm modules in TinyChat models with the CUDA-accelerated FTLlamaRMSNorm.

Source

File: tinychat/modules/fused_norm.py, Lines 24-46

Signature

def make_quant_norm(model):
    ...

Import

from tinychat.modules import make_quant_norm

I/O

Inputs

  • model (nn.Module) - the model to modify

Output

  • None (model is modified in-place)
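The in-place contract described here (the function returns None and mutates the model it receives) can be illustrated with a minimal PyTorch sketch. This is not the code from fused_norm.py. SlowRMSNorm, FastRMSNorm, and replace_norms are hypothetical stand-ins introduced only for illustration, and the "fast" class runs plain PyTorch ops rather than a CUDA kernel so that the example works without a GPU.

```python
import torch
from torch import nn


class SlowRMSNorm(nn.Module):
    """Hypothetical stand-in for the norm layer being replaced."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


class FastRMSNorm(nn.Module):
    """Hypothetical stand-in for the replacement layer.

    It reuses the weight and eps of the module it replaces. A fused
    version would call a CUDA kernel in forward(); this one does not.
    """

    def __init__(self, weight, eps):
        super().__init__()
        self.weight = weight
        self.eps = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.eps)


def replace_norms(model):
    """Swap every SlowRMSNorm submodule for a FastRMSNorm, in place."""
    # Collect targets first so the module tree is not mutated mid-iteration.
    targets = [
        (name, module)
        for name, module in model.named_modules()
        if name and isinstance(module, SlowRMSNorm)
    ]
    for name, module in targets:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, FastRMSNorm(module.weight, module.eps))
    # No return statement: the caller keeps using the same model object.


class Block(nn.Module):
    """Tiny hypothetical model with one nested norm layer."""

    def __init__(self):
        super().__init__()
        self.norm = SlowRMSNorm(4)
        self.proj = nn.Linear(4, 4)

    def forward(self, x):
        return self.proj(self.norm(x))
```

Because the swap happens on the parent module's attribute, any code that already holds a reference to the top-level model sees the new layers without reassignment.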

Details

  • Replaces all LlamaRMSNorm instances with FTLlamaRMSNorm
  • Uses awq_inference_engine.layernorm_forward_cuda for the fused CUDA kernel
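For reference, the computation an RMSNorm layer performs can be written out in plain Python. This is the standard RMSNorm definition, not the CUDA kernel's source; the function name rms_norm and the default eps value are assumptions made for this sketch.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector (standard definition).

    Each element is divided by the root mean square of the vector
    and then multiplied by a per-element learned weight.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [w * v * scale for v, w in zip(x, weight)]
```

A fused kernel performs the reduction and the scaling in a single launch instead of several separate tensor operations, which is where the speedup over an unfused PyTorch implementation typically comes from.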

Related Pages

Domains

  • Inference
  • Optimization
