Implementation:Ollama Ollama Llama Model GLM4

Knowledge Sources	Ollama
Domains	LLM Inference, Model Architecture
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the ggml computation graph builder for the GLM-4 (ChatGLM4) model architecture.

Description

The llm_build_glm4 constructor builds a graph that supports both separate Q/K/V projections and a fused QKV projection, RMS-normalized self-attention with optional biases, and both standard RoPE and M-RoPE (multi-dimensional rotary position embedding with sections) for multimodal use cases. It also validates that multimodal inputs are accepted only when the GGUF file provides M-RoPE support.
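
To make the fused QKV path more concrete, the following is a minimal standalone sketch of how one fused projection row could be split into Q, K and V slices. It is not taken from glm4.cpp: QkvLayout, fused_qkv_layout and slice_row are hypothetical names, and the sketch assumes the fused row is laid out contiguously as [Q | K | V], with Q spanning n_head heads and K/V each spanning n_head_kv heads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type (not part of llama.cpp): element offsets of the
// Q, K and V slices inside one fused-QKV output row.
struct QkvLayout {
    std::size_t q_off, k_off, v_off; // first element of each slice
    std::size_t q_len, kv_len;       // slice widths in elements
    std::size_t row_len;             // total width of the fused row
};

// Assumption: the fused projection stores [Q | K | V] contiguously per token.
// Q uses n_head heads; K and V use n_head_kv heads (grouped-query attention).
QkvLayout fused_qkv_layout(std::size_t n_embd_head,
                           std::size_t n_head,
                           std::size_t n_head_kv) {
    assert(n_embd_head > 0 && n_head > 0 && n_head_kv > 0);
    assert(n_head % n_head_kv == 0); // query heads share KV heads evenly
    const std::size_t q_len  = n_embd_head * n_head;
    const std::size_t kv_len = n_embd_head * n_head_kv;
    return QkvLayout{0, q_len, q_len + kv_len, q_len, kv_len, q_len + 2 * kv_len};
}

// Copies one slice out of a fused row. A graph builder would typically create
// a view instead of copying; the copy keeps this sketch simple.
std::vector<float> slice_row(const std::vector<float> & row,
                             std::size_t off, std::size_t len) {
    assert(off + len <= row.size());
    const auto first = row.begin() + static_cast<std::ptrdiff_t>(off);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(len));
}
```

With separate projections the same three tensors come from three independent matrix multiplications, so only the fused path needs this offset arithmetic.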

Usage

Enables Ollama to run GLM-4 family models through the llama.cpp inference engine, including multimodal variants that use M-RoPE for vision-language tasks.
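
The multimodal guard can be illustrated with a small standalone sketch. It is an assumption-laden model of the check rather than the code in glm4.cpp: RopeConfig, uses_mrope, RopeKind and select_rope are hypothetical names, and treating "any non-zero section" as the sign of M-RoPE support is an assumed convention that may differ from how the real hyperparameters are interpreted.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the hyperparameters relevant to this check.
struct RopeConfig {
    // M-RoPE section sizes as read from the model file.
    // Assumed convention: all zeros means the file only supports plain RoPE.
    std::array<int, 4> sections;
};

// Returns true when at least one section size is positive.
bool uses_mrope(const RopeConfig & cfg) {
    for (int s : cfg.sections) {
        assert(s >= 0); // section sizes are never negative
        if (s > 0) return true;
    }
    return false;
}

enum class RopeKind { Standard, MultiSection };

// has_embedding_input marks a batch that carries embeddings (for example
// image features) rather than token ids.
RopeKind select_rope(const RopeConfig & cfg, bool has_embedding_input) {
    const bool mrope = uses_mrope(cfg);
    if (has_embedding_input && !mrope) {
        // Refuse to continue instead of silently producing wrong results.
        throw std::runtime_error(
            "model file lacks M-RoPE sections; multimodal input is not supported");
    }
    return mrope ? RopeKind::MultiSection : RopeKind::Standard;
}
```

Failing early in this way means a text-only GGUF file is rejected when given image embeddings, rather than being run with position encodings it was not converted for.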

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/models/glm4.cpp
Lines: 1-150

Signature

llm_build_glm4::llm_build_glm4(
    const llama_model & model,
    const llm_graph_params & params) : llm_graph_context(params);

Import

#include "models.h"

I/O Contract

Inputs

Name	Type	Required	Description
model	const llama_model &	Yes	Loaded model with GLM-4 weights
params	const llm_graph_params &	Yes	Graph construction parameters

Outputs

Name	Type	Description
ggml graph	ggml_cgraph	Complete GLM-4 computation graph with M-RoPE support

Usage Examples

auto builder = llm_build_glm4(model, params);

Related Pages

Principle:Ollama_Ollama_LLM_Inference_Pipeline
