Implementation:Mit han lab Llm awq Make quant norm
Overview
Concrete tool, provided by the llm-awq library, for replacing every LlamaRMSNorm module in a TinyChat model with the CUDA-accelerated FTLlamaRMSNorm.
Source
File: tinychat/modules/fused_norm.py, Lines 24-46
Signature
def make_quant_norm(model):
...
Import
from tinychat.modules import make_quant_norm
I/O
Inputs
- model (nn.Module) - the model to modify
Output
- None (model is modified in-place)
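The in-place contract can be illustrated with a minimal sketch of the general module-swap pattern in PyTorch. This is not the library's code: ReferenceRMSNorm, FusedRMSNormStandIn, Block, and swap_norms are hypothetical names, and the stand-in "fused" module runs plain PyTorch on any device instead of calling a CUDA kernel. It assumes only that the replacement reuses the original weight and epsilon.

```python
import torch
import torch.nn as nn


class ReferenceRMSNorm(nn.Module):
    """Hypothetical stand-in for LlamaRMSNorm."""

    def __init__(self, hidden_size, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.variance_epsilon = eps

    def forward(self, x):
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.variance_epsilon)


class FusedRMSNormStandIn(nn.Module):
    """Hypothetical stand-in for FTLlamaRMSNorm; shares the original weight."""

    def __init__(self, weight, eps=1e-6):
        super().__init__()
        self.weight = weight  # same Parameter object, no copy
        self.variance_epsilon = eps

    def forward(self, x):
        # A real fused module would dispatch to a CUDA kernel here.
        variance = x.pow(2).mean(-1, keepdim=True)
        return self.weight * x * torch.rsqrt(variance + self.variance_epsilon)


def swap_norms(model):
    """Replace every ReferenceRMSNorm submodule in place; returns None."""
    # Snapshot the module list so the tree is not mutated while iterating.
    for name, module in list(model.named_modules()):
        if not name or not isinstance(module, ReferenceRMSNorm):
            continue
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name,
                FusedRMSNormStandIn(module.weight, module.variance_epsilon))


class Block(nn.Module):
    """Tiny hypothetical layer: a norm followed by a linear projection."""

    def __init__(self):
        super().__init__()
        self.input_layernorm = ReferenceRMSNorm(8)
        self.proj = nn.Linear(8, 8)

    def forward(self, x):
        return self.proj(self.input_layernorm(x))


model = nn.Sequential(Block(), Block(), ReferenceRMSNorm(8))
x = torch.randn(2, 8)
before = model(x)
result = swap_norms(model)  # mutates model, returns None
after = model(x)
```

Because the caller's model object is mutated, nothing needs to be reassigned after the call, which matches the "None (model is modified in-place)" output described above.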
Details
- Replaces all LlamaRMSNorm instances with FTLlamaRMSNorm
- Uses awq_inference_engine.layernorm_forward_cuda for the fused CUDA kernel
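For orientation, the computation an RMSNorm layer performs can be written in a few lines of dependency-free Python. This is a reference sketch of the standard RMSNorm formula, on the assumption that the fused kernel produces the same result as the module it replaces. It says nothing about how the CUDA kernel is implemented, and rms_norm is a hypothetical helper name.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale one row by the reciprocal of its root mean square, then by weight."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * scale for v, w in zip(x, weight)]


row = [3.0, 4.0]
out = rms_norm(row, [1.0, 1.0], eps=0.0)
# With unit weights and eps=0, the normalized row has a mean square of 1.
out_mean_square = sum(v * v for v in out) / len(out)
```

A fused kernel typically performs the reduction and both multiplications in a single pass over the data, rather than as separate elementwise operations.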
Related Pages
- Principle:Mit_han_lab_Llm_awq_Fused_Normalization
- Environment:Mit_han_lab_Llm_awq_CUDA_Build_Environment
Knowledge Sources
- Repo|llm-awq|https://github.com/mit-han-lab/llm-awq
Domains
- Inference
- Optimization