Implementation:Ollama Ollama Llama Model GLM4
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Model Architecture |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the ggml computation graph builder for the GLM-4 (ChatGLM4) model architecture.
Description
The llm_build_glm4 constructor builds a graph that supports either separate Q/K/V projections or a single fused QKV projection, self-attention with RMS normalization and optional biases, and both standard RoPE and M-RoPE (multi-dimensional rotary position embedding with sections) for multimodal use cases. It also validates that the GGUF file provides M-RoPE support whenever multimodal inputs are supplied.
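To make the fused QKV path concrete, the following is a minimal standalone sketch of how one row of a fused projection output could be divided into Q, K and V. It assumes a `[ Q | K | V ]` layout in which K and V cover only the key/value heads; `QkvLayout` and `fused_qkv_layout` are hypothetical names for illustration and are not part of llama.cpp, which works on ggml tensors rather than plain offsets.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type (not from llama.cpp): element offsets and lengths
// of Q, K and V inside one row of a fused QKV projection output.
struct QkvLayout {
    std::size_t q_off, q_len;
    std::size_t k_off, k_len;
    std::size_t v_off, v_len;
};

// Assumed layout: [ Q | K | V ]. Q spans all query heads; K and V span only
// the (possibly fewer) key/value heads used with grouped-query attention.
QkvLayout fused_qkv_layout(std::size_t head_dim, std::size_t n_head, std::size_t n_head_kv) {
    // Query heads must divide evenly into key/value groups.
    assert(n_head_kv > 0 && n_head % n_head_kv == 0);
    const std::size_t n_q  = head_dim * n_head;     // width of the Q slice
    const std::size_t n_kv = head_dim * n_head_kv;  // width of each of K and V
    return QkvLayout{0, n_q, n_q, n_kv, n_q + n_kv, n_kv};
}
```

With illustrative values of `head_dim = 128`, `n_head = 32` and `n_head_kv = 2`, the fused row is 4608 elements wide: Q occupies the first 4096, followed by 256 each for K and V. When a model ships separate Q, K and V weights instead, no such slicing is needed.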
Usage
Enables Ollama to run GLM-4 family models through the llama.cpp inference engine, including multimodal variants that use M-RoPE for vision-language tasks.
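The multimodal guard mentioned in the Description can be pictured with a small standalone sketch. It assumes that M-RoPE is signalled by at least one non-zero section size and that multimodal input arrives as embeddings; `RopeConfig`, `uses_mrope` and `check_multimodal_supported` are hypothetical names, and the real builder may detect and report this condition differently.

```cpp
#include <array>
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the relevant hyperparameters (not the llama.cpp struct).
struct RopeConfig {
    std::array<int, 4> sections{};  // M-RoPE section sizes; all zero means standard RoPE
};

// Assumption: M-RoPE is in use when any section size is non-zero.
bool uses_mrope(const RopeConfig & cfg) {
    for (int s : cfg.sections) {
        assert(s >= 0);  // section sizes are counts, never negative
        if (s > 0) {
            return true;
        }
    }
    return false;
}

// Rejects multimodal (embedding) input when the model file carries no M-RoPE sections.
void check_multimodal_supported(const RopeConfig & cfg, bool has_embedding_input) {
    if (has_embedding_input && !uses_mrope(cfg)) {
        throw std::runtime_error("model file has no M-RoPE sections; multimodal input is not supported");
    }
}
```

Failing early in this way is preferable to silently applying standard RoPE to multimodal positions, which would produce incorrect output with no visible error.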
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/models/glm4.cpp
- Lines: 1-150
Signature
llm_build_glm4::llm_build_glm4(
    const llama_model & model,
    const llm_graph_params & params) : llm_graph_context(params);
Import
#include "models.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const llama_model & | Yes | Loaded model with GLM-4 weights |
| params | const llm_graph_params & | Yes | Graph construction parameters |
Outputs
| Name | Type | Description |
|---|---|---|
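| ggml graph | ggml_cgraph | Complete GLM-4 computation graph, using standard RoPE or M-RoPE depending on the model. A constructor returns no value, so the graph is held by the builder's graph context. |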
Usage Examples
auto builder = llm_build_glm4(model, params);