Principle:LLMBook zh LLMBook zh github io Mixture of Experts Routing

Knowledge Sources	Switch Transformers: Scaling to Trillion Parameter Models; Mixtral of Experts; LLMBook-zh
Domains	Deep_Learning, Model_Architecture
Last Updated	2026-02-08 04:29 GMT

Overview

Sparse architecture pattern that routes each token to a subset of expert networks via a gating function, scaling model capacity without a proportional increase in compute.

Description

Mixture of Experts (MoE) is a conditional computation technique that replaces a single feed-forward network with multiple parallel expert networks plus a routing (gating) mechanism. For each input token, the gating network scores all experts, the top-k scoring experts are selected (typically k=1 or k=2), and the layer outputs a weighted combination of the selected experts' outputs. This gives the model a very large number of parameters (capacity) while activating only a fraction of them for each token (efficiency). MoE is the architecture behind models such as Switch Transformer (top-1 routing), and GShard and Mixtral (both top-2 routing).
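The capacity/efficiency trade-off comes down to simple arithmetic. The sketch below counts expert weights for a hypothetical MoE layer; the sizes (d_model = 1024, d_ff = 4096, 8 experts, k = 2) and the helper names `ffn_params` and `moe_param_counts` are illustrative assumptions, each expert is assumed to be a plain two-matrix FFN without biases, and the small gating projection is ignored.

```python
# Illustrative arithmetic only: layer sizes are hypothetical, not from a specific model.
# Assumption: each expert is a standard FFN (d_model -> d_ff -> d_model) with no biases;
# the gating projection (d_model x n_experts weights) is left out of the count.

def ffn_params(d_model: int, d_ff: int) -> int:
    """Weight count of one expert FFN: an up-projection plus a down-projection."""
    return 2 * d_model * d_ff

def moe_param_counts(d_model: int, d_ff: int, n_experts: int, k: int) -> tuple[int, int]:
    """Return (expert parameters stored, expert parameters used per token)."""
    per_expert = ffn_params(d_model, d_ff)
    total = n_experts * per_expert   # capacity grows with the number of experts
    active = k * per_expert          # per-token compute grows only with k
    return total, active

if __name__ == "__main__":
    total, active = moe_param_counts(d_model=1024, d_ff=4096, n_experts=8, k=2)
    print(f"total expert params: {total:,}")
    print(f"active per token:    {active:,}")
    print(f"active fraction:     {active / total:.2f}")
```

With 8 experts and k = 2, the layer stores four times the expert weights it uses for any single token; the active fraction is simply k / n_experts.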

Usage

Use this principle when studying scalable Transformer architectures that achieve high parameter counts without proportional increases in computation. MoE layers typically replace the dense feed-forward (MLP) layers in Transformer blocks. The routing decision is made per-token, so different tokens in the same batch may be processed by different experts.
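Because routing is decided per token, a batched layer typically groups tokens by the expert they were sent to and runs each expert once on its own group. The following is a minimal sketch assuming NumPy; `route_tokens`, `moe_layer`, and the toy ReLU experts are hypothetical names chosen for illustration, not the LLMBook `MoeLayer` API.

```python
import numpy as np

def route_tokens(x, W_g, k):
    """Per-token top-k routing.
    x: (n_tokens, d_model); W_g: (d_model, n_experts).
    Returns expert indices (n_tokens, k), best first, and their weights (n_tokens, k)."""
    logits = x @ W_g                                    # G(x) for every token in the batch
    idx = np.argsort(logits, axis=-1)[:, ::-1][:, :k]   # top-k experts per token
    top = np.take_along_axis(logits, idx, axis=-1)      # gate logits of the selected experts
    w = np.exp(top - top.max(axis=-1, keepdims=True))   # softmax over the k selected logits only
    w /= w.sum(axis=-1, keepdims=True)
    return idx, w

def moe_layer(x, W_g, experts, k=2):
    """Batched MoE layer: each expert runs once, on just the tokens routed to it."""
    idx, w = route_tokens(x, W_g, k)
    out = np.zeros_like(x)
    for e, expert in enumerate(experts):
        rows, slot = np.nonzero(idx == e)               # tokens that picked expert e, and in which top-k slot
        if rows.size == 0:
            continue                                    # no token in this batch chose expert e
        out[rows] += w[rows, slot][:, None] * expert(x[rows])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_model, n_experts, n_tokens = 8, 4, 5
    W_g = rng.normal(size=(d_model, n_experts))
    # Hypothetical toy experts: small ReLU FFNs with random weights.
    mats = [(rng.normal(size=(d_model, 16)), rng.normal(size=(16, d_model)))
            for _ in range(n_experts)]
    experts = [lambda h, A=A, B=B: np.maximum(h @ A, 0) @ B for A, B in mats]
    x = rng.normal(size=(n_tokens, d_model))
    idx, _ = route_tokens(x, W_g, k=2)
    print(idx)                                          # row t lists the two experts chosen for token t
    print(moe_layer(x, W_g, experts, k=2).shape)        # (5, 8)
```

Grouping tokens by expert lets each expert process its share of the batch as one matrix multiply instead of being called once per token.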

Theoretical Basis

The MoE layer computes:

$\mathrm{MoE}(x) = \sum_{i \in \mathrm{TopK}} w_i \cdot E_i(x)$

Where:

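$E_i$ is the $i$-th expert network (typically a standard FFN)
The routing weights are $w = \mathrm{softmax}(\mathrm{TopK}(G(x)))$; the softmax runs over only the $k$ selected logits, so the selected weights sum to 1 (this is the Mixtral formulation; Switch Transformer instead applies the softmax over all experts before selecting the top-1)
$G(x) = x \cdot W_g$ is the gating network (a linear projection)
TopK keeps the $k$ highest gate logits and sets the rest to $-\infty$, so unselected experts receive zero weight and need not be evaluated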
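As a worked instance of the routing-weight formula, take made-up numbers: four experts, k = 2, and hypothetical gate logits for a single token, with unselected logits treated as $-\infty$ before the softmax.

```latex
\begin{aligned}
G(x) &= (2.0,\ 1.0,\ 0.5,\ -1.0) \\
\mathrm{TopK}(G(x)) &= (2.0,\ 1.0,\ -\infty,\ -\infty) \qquad (k = 2) \\
w = \mathrm{softmax}(\mathrm{TopK}(G(x))) &= \left(\tfrac{e^{2}}{e^{2}+e^{1}},\ \tfrac{e^{1}}{e^{2}+e^{1}},\ 0,\ 0\right) \approx (0.731,\ 0.269,\ 0,\ 0) \\
\mathrm{MoE}(x) &\approx 0.731\, E_1(x) + 0.269\, E_2(x)
\end{aligned}
```

Only $E_1$ and $E_2$ are evaluated for this token; $E_3$ and $E_4$ contribute nothing and cost no compute.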

Pseudo-code Logic:

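```python
import numpy as np

# Minimal single-token sketch (NumPy); not the LLMBook MoeLayer implementation
def moe_forward(x, W_g, experts, k=2):
    gate_logits = x @ W_g                                   # linear projection G(x)
    selected = np.argsort(gate_logits)[-k:]                 # indices of the top-k experts
    weights = np.exp(gate_logits[selected] - gate_logits[selected].max())
    weights /= weights.sum()                                # softmax over the k selected logits
    return sum(w * experts[i](x) for w, i in zip(weights, selected))
```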

Related Pages

Implementation:LLMBook_zh_LLMBook_zh_github_io_MoeLayer
