Implementation:Ollama Ollama Llama Model BailingMoE2

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Model Architecture
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the ggml computation graph builder for the second-generation BailingMoE2 model architecture.

Description

The llm_build_bailingmoe2 constructor builds the computation graph for the BailingMoE2 model. The model uses a fused QKV projection (wqkv) whose output is split into query, key, and value tensors via tensor views, RMS normalization of Q and K, RoPE positional encoding, and MoE feed-forward layers. The builder uses hparams.nextn_predict_layers to separate the regular transformer layers from the next-token prediction layers, which supports speculative decoding architectures.
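
The fused QKV split described above can be illustrated with a small standalone sketch. It does not use ggml; it only computes where the Q, K, and V slices would start inside one fused output row, assuming the row is laid out as [Q | K | V] with n_head query heads and n_head_kv key/value heads of head_dim floats each. The names qkv_layout and fused_qkv_layout are hypothetical and do not appear in llama.cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration type; not part of llama.cpp or ggml.
// All offsets and widths are counted in floats within one token's fused row.
struct qkv_layout {
    std::size_t q_offset;  // start of the Q slice
    std::size_t k_offset;  // start of the K slice
    std::size_t v_offset;  // start of the V slice
    std::size_t row_width; // total width of the fused row
};

// Assumption: the fused projection output is laid out as [Q | K | V].
inline qkv_layout fused_qkv_layout(std::size_t head_dim,
                                   std::size_t n_head,
                                   std::size_t n_head_kv) {
    assert(head_dim > 0 && n_head > 0 && n_head_kv > 0);

    const std::size_t q_width  = head_dim * n_head;    // all query heads
    const std::size_t kv_width = head_dim * n_head_kv; // key heads (same for value heads)

    return qkv_layout{
        0,                      // Q starts at the beginning of the row
        q_width,                // K follows Q
        q_width + kv_width,     // V follows K
        q_width + 2 * kv_width  // total row width
    };
}
```

For example, fused_qkv_layout(128, 16, 4) places K at float offset 2048 and V at float offset 2560 in a 3072-float row. A view over such a row needs only these offsets and widths, which is why splitting by views does not require copying the projection output.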

Usage

Enables Ollama to run BailingMoE2 models through the llama.cpp inference engine, with support for the model's fused QKV attention projection and its speculative decoding architecture.
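
The layer separation behind the speculative decoding support can be sketched as simple arithmetic. This is a minimal illustration, assuming the next-token prediction layers are included in the total layer count and sit at the end of the layer stack; layer_split and split_layers are hypothetical names, not llama.cpp APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration type; not part of llama.cpp.
struct layer_split {
    std::uint32_t n_transformer_layers; // regular transformer layers
    std::uint32_t n_nextn_layers;       // next-token prediction layers
};

// Assumption: n_layer counts every layer in the model file, and the last
// nextn_predict_layers of them are the next-token prediction layers.
inline layer_split split_layers(std::uint32_t n_layer,
                                std::uint32_t nextn_predict_layers) {
    assert(nextn_predict_layers <= n_layer);
    return layer_split{ n_layer - nextn_predict_layers, nextn_predict_layers };
}
```

With a total of 21 layers and one next-token prediction layer, split_layers(21, 1) yields 20 regular transformer layers and 1 prediction layer. The layer counts here are made up for illustration and are not taken from any specific BailingMoE2 checkpoint.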

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/models/bailingmoe2.cpp
  • Lines: 1-135

Signature

llm_build_bailingmoe2::llm_build_bailingmoe2(
    const llama_model & model,
    const llm_graph_params & params) : llm_graph_context(params) { /* graph construction */ }

Import

#include "models.h"

I/O Contract

Inputs

Name    Type                      Required  Description
model   const llama_model &       Yes       Loaded model with BailingMoE2 weights
params  const llm_graph_params &  Yes       Graph construction parameters

Outputs

Name        Type         Description
ggml graph  ggml_cgraph  Complete BailingMoE2 computation graph with nextn prediction

Usage Examples

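// The constructor builds the graph; model and params are the inputs
// described in the I/O Contract.
auto builder = llm_build_bailingmoe2(model, params);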
