Implementation:Hiyouga LLaMA Factory NPU SwiGLU

From Leeroopedia


Knowledge Sources
Domains: Machine Learning, Hardware Acceleration, NPU
Last Updated: 2026-02-06 19:00 GMT

Overview

NPU-optimized SwiGLU activation kernel that replaces the separate SiLU activation and elementwise multiply on the gate and up projection outputs with a single NPU-native operation (torch_npu.npu_swiglu) in the MLP layers of 20+ supported model architectures.

Description

npu_swiglu.py implements a hardware-accelerated SwiGLU activation replacement for Huawei Ascend NPU devices. The SwiGLU activation, SiLU(gate) * up, is a common pattern in modern LLMs such as LLaMA, Qwen, and Gemma. This module computes it with the fused torch_npu.npu_swiglu operator for improved throughput.
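To make the formula concrete, here is a minimal reference sketch of the math in plain Python, so it runs without torch or NPU hardware. The helper names (silu, swiglu_reference, swiglu_packed_reference) are hypothetical and not part of the module. The packed variant assumes the fused operator splits its input into two equal halves along the last dimension and treats the first half as the gate, which is how the description above reads.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu_reference(gate, up):
    """Unfused SwiGLU: elementwise SiLU(gate) * up."""
    return [silu(g) * u for g, u in zip(gate, up)]


def swiglu_packed_reference(packed):
    """SwiGLU on a concatenated [gate, up] vector.

    Assumption: the first half is the gate, the second half is up.
    """
    half = len(packed) // 2
    return swiglu_reference(packed[:half], packed[half:])


gate = [1.0, -2.0, 0.5]
up = [0.5, 3.0, -1.0]

unfused = swiglu_reference(gate, up)
fused = swiglu_packed_reference(gate + up)  # list "+" plays the role of cat
```

Both paths compute the same values. The fused operator's benefit is performing the split, activation, and multiply in one device kernel rather than several.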

The module provides:

  • npu_swiglu_forward: The default SwiGLU forward function that concatenates gate_proj and up_proj outputs, applies npu_swiglu, then passes through down_proj. Works for most architectures (LLaMA, Qwen2, Qwen3, DeepSeek, etc.).
  • _npu_swiglu_glm4_forward: Specialized variant for GLM4 and Phi3 architectures that use a fused gate_up_proj with chunk-based splitting.
  • _npu_swiglu_gemma3ntext_forward: Specialized variant for Gemma3nText that supports activation sparsity via gaussian_topk before the SwiGLU operation.
  • NpuSwiGluKernel: The registered kernel class (kernel_id: "npu_fused_swiglu") with:
    • expect_modules: A frozenset of 21 supported MLP module class names including Qwen3MLP, LlamaMLP, Glm4MLP, Gemma3MLP, DeepseekV3MLP, and others.
    • apply: Iterates over model modules, matches MLP layers by class name against expect_modules, and monkey-patches their forward methods with the appropriate kernel function using types.MethodType.
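The patching step described for apply can be sketched with the standard library alone. Everything below is a hypothetical stand-in: the toy classes only borrow the real class names, and the forward functions return strings instead of tensors. It illustrates matching by class name and rebinding forward with types.MethodType, not the module's actual code.

```python
import types


class LlamaMLP:  # toy stand-in, not the transformers class
    def forward(self, x):
        return f"original({x})"


class Glm4MLP:  # toy stand-in, not the transformers class
    def forward(self, x):
        return f"original({x})"


class Attention:  # not an MLP, so it must stay untouched
    def forward(self, x):
        return f"attention({x})"


def default_forward(self, x):
    return f"fused_default({x})"


def glm4_forward(self, x):
    return f"fused_glm4({x})"


EXPECT_MODULES = frozenset({"LlamaMLP", "Glm4MLP"})
# Architectures that need a specialized variant; all others use the default.
SPECIAL_FORWARDS = {"Glm4MLP": glm4_forward}


def patch_modules(modules):
    """Rebind forward on every module whose class name is expected."""
    patched = 0
    for module in modules:
        name = type(module).__name__
        if name not in EXPECT_MODULES:
            continue
        fn = SPECIAL_FORWARDS.get(name, default_forward)
        # MethodType binds fn to this instance, so `self` is passed as usual.
        module.forward = types.MethodType(fn, module)
        patched += 1
    return patched


modules = [LlamaMLP(), Glm4MLP(), Attention()]
count = patch_modules(modules)
outputs = [m.forward("x") for m in modules]
```

Because the method is bound on the instance, the class itself is unchanged, and a freshly constructed LlamaMLP in this sketch still uses the original forward.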

Usage

This kernel is automatically discovered and registered by the kernel interface. It is applied when running SwiGLU-based models (LLaMA, Qwen, Gemma, GLM4, DeepSeek, etc.) on NPU hardware. The kernel is only applied to MLP modules whose class names are in the expect_modules set.

Code Reference

Source Location

Signature

def npu_swiglu_forward(self, hidden_state) -> torch.Tensor

@register_kernel
class NpuSwiGluKernel(BaseKernel):
    expect_modules = frozenset({
        "Qwen3MLP", "Qwen2MLP", "LlamaMLP", "Glm4MLP",
        "Gemma3MLP", "DeepseekV3MLP", "Phi3MLP", ...
    })
    _kernel_id = "npu_fused_swiglu"
    _device = DeviceType.NPU

    @classmethod
    def apply(cls, **kwargs) -> HFModel

Import

from llamafactory.v1.plugins.model_plugins.kernels.ops.mlp.npu_swiglu import NpuSwiGluKernel

I/O Contract

Inputs

NpuSwiGluKernel.apply

| Name | Type | Required | Description |
|------|------|----------|-------------|
| model | HFModel (via kwargs) | Yes | The HuggingFace model instance containing MLP modules to patch |

npu_swiglu_forward

| Name | Type | Required | Description |
|------|------|----------|-------------|
| hidden_state | torch.Tensor | Yes | Input hidden state tensor from the attention layer output |

Outputs

NpuSwiGluKernel.apply

| Name | Type | Description |
|------|------|-------------|
| model | HFModel | The model with MLP forward methods monkey-patched to use NPU fused SwiGLU |

npu_swiglu_forward

| Name | Type | Description |
|------|------|-------------|
| output | torch.Tensor | Output of down_proj(npu_swiglu(cat(gate_proj(x), up_proj(x)))) |
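The output expression down_proj(npu_swiglu(cat(gate_proj(x), up_proj(x)))) can be traced end to end with a toy MLP in plain Python. The weights, the linear helper, and the function names are hypothetical illustration values, and list operations stand in for tensor ops. The second path models a single stacked gate_up_proj followed by a half split, assuming the gate rows come first.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def linear(weight, x):
    """Bias-free linear layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def swiglu_packed(packed):
    """Split in half, then SiLU(first half) * second half."""
    half = len(packed) // 2
    return [silu(g) * u for g, u in zip(packed[:half], packed[half:])]


# Toy weights: hidden size 2, intermediate size 3.
gate_w = [[0.5, -1.0], [1.0, 0.25], [0.0, 2.0]]
up_w = [[1.0, 0.0], [0.5, 0.5], [-1.0, 1.0]]
down_w = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]


def mlp_concat(x):
    """Default path: cat(gate_proj(x), up_proj(x)) -> swiglu -> down_proj."""
    packed = linear(gate_w, x) + linear(up_w, x)
    return linear(down_w, swiglu_packed(packed))


def mlp_fused_proj(x):
    """Fused-projection path: one stacked gate_up_proj, then a half split."""
    gate_up_w = gate_w + up_w  # gate rows first (assumption)
    return linear(down_w, swiglu_packed(linear(gate_up_w, x)))


def mlp_unfused(x):
    """Baseline: separate activation and multiply, no packing."""
    gate, up = linear(gate_w, x), linear(up_w, x)
    return linear(down_w, [silu(g) * u for g, u in zip(gate, up)])


x = [1.0, 2.0]
out_concat = mlp_concat(x)
out_fused = mlp_fused_proj(x)
out_unfused = mlp_unfused(x)
```

All three paths should agree up to floating-point rounding, which is the property that makes swapping in the fused forward safe for the patched modules.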

Usage Examples

# Automatic application via kernel interface
from llamafactory.v1.plugins.model_plugins.kernels.interface import apply_kernel

apply_kernel("npu_fused_swiglu", model=model)

# Direct application
from llamafactory.v1.plugins.model_plugins.kernels.ops.mlp.npu_swiglu import NpuSwiGluKernel

NpuSwiGluKernel.apply(model=model)
