Implementation:Hiyouga LLaMA Factory NPU SwiGLU

Knowledge Sources	Hiyouga_LLaMA_Factory
Domains	Machine Learning, Hardware Acceleration, NPU
Last Updated	2026-02-06 19:00 GMT

Overview

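NPU-optimized SwiGLU kernel that replaces the SiLU activation and element-wise gating multiply in MLP layers with a single fused NPU-native operation (torch_npu.npu_swiglu). The gate, up, and down projections remain ordinary linear layers. The kernel covers 21 MLP module classes across the supported model architectures.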

Description

npu_swiglu.py implements a hardware-accelerated SwiGLU activation replacement for Huawei NPU (Ascend) devices. The SwiGLU activation (SiLU(gate) * up) is a common pattern in modern LLMs like LLaMA, Qwen, and Gemma. This module fuses the operation using torch_npu.npu_swiglu for improved throughput.
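The SiLU(gate) * up pattern described above can be illustrated without any NPU dependency. The following is a minimal pure-Python sketch, not the torch_npu kernel: the helper names silu, swiglu_reference, and swiglu_packed are hypothetical, and the packed layout (gate half first, up half second) is an assumption based on the concatenation order given on this page.

```python
import math


def silu(x: float) -> float:
    """SiLU (swish) activation: x * sigmoid(x). Not hardened against overflow."""
    return x / (1.0 + math.exp(-x))


def swiglu_reference(gate: list, up: list) -> list:
    """Unfused SwiGLU: SiLU(gate) * up, computed element by element."""
    return [silu(g) * u for g, u in zip(gate, up)]


def swiglu_packed(packed: list) -> list:
    """Same computation on one packed vector laid out as [gate | up].

    Assumption: the first half holds the gate values and the second half
    the up values, matching cat(gate_proj(x), up_proj(x)).
    """
    half = len(packed) // 2
    return swiglu_reference(packed[:half], packed[half:])


gate = [0.0, 1.0, -2.0]
up = [3.0, 0.5, 4.0]

# The packed form and the unfused form agree on the same inputs.
assert swiglu_packed(gate + up) == swiglu_reference(gate, up)
```

A fused kernel computes the same values in one device operation, which avoids materializing the intermediate SiLU(gate) tensor.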

The module provides:

npu_swiglu_forward: The default SwiGLU forward function that concatenates gate_proj and up_proj outputs, applies npu_swiglu, then passes through down_proj. Works for most architectures (LLaMA, Qwen2, Qwen3, DeepSeek, etc.).
_npu_swiglu_glm4_forward: Specialized variant for GLM4 and Phi3 architectures that use a fused gate_up_proj with chunk-based splitting.
_npu_swiglu_gemma3ntext_forward: Specialized variant for Gemma3nText that supports activation sparsity via gaussian_topk before the SwiGLU operation.
NpuSwiGluKernel: The registered kernel class (kernel_id: "npu_fused_swiglu") with:
- expect_modules: A frozenset of 21 supported MLP module class names including Qwen3MLP, LlamaMLP, Glm4MLP, Gemma3MLP, DeepseekV3MLP, and others.
- apply: Iterates over model modules, matches MLP layers by class name against expect_modules, and monkey-patches their forward methods with the appropriate kernel function using types.MethodType.
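The class-name matching and types.MethodType rebinding that apply performs can be sketched with plain Python objects. Everything here is illustrative: the stand-in classes, the abbreviated EXPECT_MODULES set, fused_forward, and apply_patch are hypothetical names, and the real kernel operates on torch modules inside a HuggingFace model rather than on a list.

```python
import types


# Hypothetical stand-ins; only the class names matter for matching.
class LlamaMLP:
    def forward(self, x):
        return ("original", x)


class Attention:
    def forward(self, x):
        return ("attention", x)


# Abbreviated, illustrative subset of the supported class names.
EXPECT_MODULES = frozenset({"LlamaMLP", "Qwen3MLP"})


def fused_forward(self, x):
    # Placeholder for the NPU fused forward that gets bound to each instance.
    return ("fused", x)


def apply_patch(modules):
    """Rebind forward on every module whose class name is in EXPECT_MODULES."""
    for module in modules:
        if type(module).__name__ in EXPECT_MODULES:
            # MethodType binds the function to this instance, so `self`
            # inside fused_forward refers to the patched module.
            module.forward = types.MethodType(fused_forward, module)
    return modules


mlp, attn = apply_patch([LlamaMLP(), Attention()])
assert mlp.forward(1) == ("fused", 1)          # matched by class name
assert attn.forward(1) == ("attention", 1)     # left untouched
```

Because the rebinding is per instance, modules created after patching keep their original forward, and non-matching modules are never modified.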

Usage

This kernel is automatically discovered and registered by the kernel interface. It is applied when running SwiGLU-based models (LLaMA, Qwen, Gemma, GLM4, DeepSeek, etc.) on NPU hardware. The kernel is only applied to MLP modules whose class names are in the expect_modules set.

Code Reference

Source Location

Repository: Hiyouga_LLaMA_Factory
File: src/llamafactory/v1/plugins/model_plugins/kernels/ops/mlp/npu_swiglu.py
Lines: 1-168

Signature

def npu_swiglu_forward(self, hidden_state) -> torch.Tensor

@register_kernel
class NpuSwiGluKernel(BaseKernel):
    expect_modules = frozenset({
        "Qwen3MLP", "Qwen2MLP", "LlamaMLP", "Glm4MLP",
        "Gemma3MLP", "DeepseekV3MLP", "Phi3MLP", ...
    })
    _kernel_id = "npu_fused_swiglu"
    _device = DeviceType.NPU

    @classmethod
    def apply(cls, **kwargs) -> HFModel

Import

from llamafactory.v1.plugins.model_plugins.kernels.ops.mlp.npu_swiglu import NpuSwiGluKernel

I/O Contract

Inputs

NpuSwiGluKernel.apply

Name	Type	Required	Description
model	HFModel (via kwargs)	Yes	The HuggingFace model instance containing MLP modules to patch

npu_swiglu_forward

Name	Type	Required	Description
hidden_state	torch.Tensor	Yes	Input hidden state tensor passed to the MLP block (the output of the preceding attention sub-layer)

Outputs

NpuSwiGluKernel.apply

Name	Type	Description
model	HFModel	The model with MLP forward methods monkey-patched to use NPU fused SwiGLU

npu_swiglu_forward

Name	Type	Description
output	torch.Tensor	Output of down_proj(npu_swiglu(cat(gate_proj(x), up_proj(x))))

Usage Examples

# Application by kernel ID via the kernel interface
from llamafactory.v1.plugins.model_plugins.kernels.interface import apply_kernel

apply_kernel("npu_fused_swiglu", model=model)

# Direct application
from llamafactory.v1.plugins.model_plugins.kernels.ops.mlp.npu_swiglu import NpuSwiGluKernel

NpuSwiGluKernel.apply(model=model)

Related Pages

Hiyouga_LLaMA_Factory_Kernel_Interface - Kernel discovery and registration interface that manages this kernel
Hiyouga_LLaMA_Factory_NPU_Fused_MoE - Related NPU MoE kernel for MoE-based MLP layers
Hiyouga_LLaMA_Factory_NPU_RoPE - Related NPU RoPE kernel for attention layers
