

Implementation:Hiyouga LLaMA Factory V1 NPU RMSNorm

From Leeroopedia


Knowledge Sources
Domains Machine Learning, Hardware Optimization
Last Updated 2026-02-06 19:00 GMT

Overview

NpuRMSNormKernel is a concrete kernel implementation that replaces standard RMSNorm forward methods with NPU-optimized versions using torch_npu.npu_rms_norm.

Description

This module implements the NpuRMSNormKernel class, which extends BaseKernel to provide NPU-native RMS normalization. It defines a standalone npu_rms_norm_forward function that calls torch_npu.npu_rms_norm directly. The kernel's apply method iterates over all modules of the model and uses a case-insensitive regex to match any module whose class name contains "RMSNorm". Each matched module has its forward method replaced: types.MethodType binds the NPU-optimized function to that module instance. The kernel is registered automatically via the @register_kernel decorator.
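The patching step described above can be sketched without any NPU or PyTorch dependency. This is a minimal illustration, not the library's code: ToyLlamaRMSNorm, ToyLinear, toy_fast_forward, and patch_rms_norm_modules are hypothetical stand-ins, and the (name, module) pairs imitate what a model's module iterator would yield.

```python
import re
import types


class ToyLlamaRMSNorm:
    """Hypothetical stand-in for a model's RMSNorm module."""

    def forward(self, hidden_states):
        return ("eager", hidden_states)


class ToyLinear:
    """Hypothetical stand-in for a module that must not be patched."""

    def forward(self, hidden_states):
        return ("linear", hidden_states)


def toy_fast_forward(self, hidden_states):
    # Stand-in for npu_rms_norm_forward; the real function would call
    # torch_npu.npu_rms_norm instead of returning a tuple.
    return ("patched", hidden_states)


def patch_rms_norm_modules(named_modules, new_forward):
    """Rebind `forward` on every module whose class name contains 'RMSNorm'."""
    pattern = re.compile("RMSNorm", re.IGNORECASE)
    patched = []
    for name, module in named_modules:
        if pattern.search(type(module).__name__):
            # MethodType binds the plain function to this one instance,
            # so `self` inside new_forward is the matched module.
            module.forward = types.MethodType(new_forward, module)
            patched.append(name)
    return patched


modules = {"input_norm": ToyLlamaRMSNorm(), "proj": ToyLinear()}
patched_names = patch_rms_norm_modules(modules.items(), toy_fast_forward)
```

Because the replacement is bound per instance, the class itself is left untouched: a ToyLlamaRMSNorm created after patching still uses the original forward.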

Usage

This kernel is applied automatically when the model runs on NPU hardware and the kernel system is configured to use npu_fused_rmsnorm. It requires torch_npu to be installed and an NPU accelerator to be active. No manual invocation is needed in typical workflows.
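A quick way to test the first requirement is to check whether the torch_npu package can be found at all. The helper below is a hypothetical sketch using only the standard library; this page does not show how check_deps is implemented, so this is not a description of it. It also only confirms that the package is installed, not that an NPU device is active.

```python
import importlib.util


def npu_backend_available(module_name: str = "torch_npu") -> bool:
    """Return True if the named top-level package can be located.

    Hypothetical helper: it does not import the package and does not
    verify that an NPU accelerator is actually present.
    """
    return importlib.util.find_spec(module_name) is not None
```

A caller could use such a check to skip the kernel and keep the default RMSNorm forward on machines without torch_npu.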

Code Reference

Source Location

Signature

def npu_rms_norm_forward(self, hidden_states) -> Tensor: ...

@register_kernel
class NpuRMSNormKernel(BaseKernel):
    _kernel_id = "npu_fused_rmsnorm"
    _device = DeviceType.NPU

    @classmethod
    def apply(cls, **kwargs) -> HFModel: ...

Import

from llamafactory.v1.plugins.model_plugins.kernels.ops.rms_norm.npu_rms_norm import NpuRMSNormKernel

I/O Contract

Inputs

| Name | Type | Required | Description |
| hidden_states (npu_rms_norm_forward) | Tensor | Yes | Input hidden states tensor to be normalized |
| self (npu_rms_norm_forward) | Module | Yes | RMSNorm module instance with weight and variance_epsilon attributes |
| model (apply kwarg) | HFModel | Yes | HuggingFace model instance whose RMSNorm modules will be patched |

Outputs

| Name | Type | Description |
| npu_rms_norm_forward | Tensor | Normalized tensor consistent with baseline RMSNorm behavior |
| apply | HFModel | The model with all RMSNorm modules patched to use NPU-optimized forward |
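For reference, "baseline RMSNorm behavior" can be written out in plain Python using the standard RMSNorm definition: each element is divided by the root mean square of the vector (with a small epsilon added under the square root) and then scaled by a learned weight. The function name rms_norm_reference is hypothetical, and the sketch works on a single vector given as a list rather than on a tensor; weight and eps play the roles of the module's weight and variance_epsilon attributes.

```python
import math


def rms_norm_reference(x, weight, eps=1e-6):
    """Reference RMSNorm over one vector: weight * x / sqrt(mean(x^2) + eps)."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


# With unit weights and eps=0, the output has a mean square of 1.
normalized = rms_norm_reference([3.0, 4.0], [1.0, 1.0], eps=0.0)
```

A fused kernel is expected to match this computation up to floating-point differences, which is a useful property to check when validating a patched model.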

Usage Examples

# Automatic registration and application via the kernel system
from llamafactory.v1.plugins.model_plugins.kernels.ops.rms_norm.npu_rms_norm import NpuRMSNormKernel

# The kernel is auto-registered via @register_kernel on import.
# Typical usage is through the kernel interface:
from llamafactory.v1.plugins.model_plugins.kernels.interface import apply_kernel

# `model` is assumed to be an already-loaded HuggingFace model instance.
model = apply_kernel(model=model, kernel_id="npu_fused_rmsnorm")

# Direct application (less common):
if NpuRMSNormKernel.check_deps():
    model = NpuRMSNormKernel.apply(model=model)
