Implementation:Ollama Ollama Convert GptOss

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the GPT-OSS MoE architecture, handling MXFP4 quantized tensor conversion, interleaved gate-up expert splitting, and dual replacement strategies for HuggingFace vs. native tensor layouts.

Description

The gptossModel struct implements ModelConverter for GPT-OSS models, which use sliding window attention and MoE experts. Its distinctive piece is the mxfp4 type, which handles MXFP4 (Microscaling FP4) quantized tensors by combining a block tensor with its scale tensor and applying a nibble reordering transform. The Tensors method distinguishes three tensor categories: MXFP4 quantized expert tensors (blocks and scales combined, with nibble interleaving), interleaved gate_up_exps tensors (split into separate gate and up expert tensors using stride-2 slicing), and regular tensors. The Replacements method returns one of two replacement sets, depending on whether the model uses HuggingFace or native tensor naming conventions.
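
The stride-2 split of an interleaved gate_up tensor can be illustrated on a single row. This is a minimal sketch, not the converter's code: splitInterleaved is a hypothetical helper, and it assumes gate values sit at even offsets and up values at odd offsets along the interleaved axis. Which axis carries the interleaving in the real tensors is not shown here.

```go
package main

import "fmt"

// splitInterleaved is a hypothetical helper for illustration only.
// It separates a row laid out as gate, up, gate, up, ... into two rows.
// Even offsets go to gate and odd offsets go to up, which corresponds to
// the stride-2 slices row[0::2] and row[1::2].
func splitInterleaved(row []float32) (gate, up []float32) {
	gate = make([]float32, 0, (len(row)+1)/2)
	up = make([]float32, 0, len(row)/2)
	for i, v := range row {
		if i%2 == 0 {
			gate = append(gate, v)
		} else {
			up = append(up, v)
		}
	}
	return gate, up
}

func main() {
	// Assumed layout: g0, u0, g1, u1, g2, u2.
	row := []float32{1, 10, 2, 20, 3, 30}
	gate, up := splitInterleaved(row)
	fmt.Println(gate, up) // prints "[1 2 3] [10 20 30]"
}
```

Applying the same split to every row of every expert yields two tensors, each half the size of the original along the interleaved axis.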

Usage

Invoked automatically when the model's architecture matches GptOssForCausalLM or similar GPT-OSS architecture identifiers.

Code Reference

Source Location

Repository: Ollama
File: convert/convert_gptoss.go
Lines: 1-269

Signature

type gptossModel struct {
    ModelParameters
    HiddenLayers      uint32  `json:"num_hidden_layers"`
    Experts           uint32  `json:"num_experts"`
    ExpertsPerToken   uint32  `json:"experts_per_token"`
    SlidingWindow     uint32  `json:"sliding_window"`
    RopeScalingFactor float32 `json:"rope_scaling_factor"`
}

type mxfp4 struct {
    blocks, scales Tensor
}

func (m *gptossModel) KV(t *Tokenizer) KV
func (m *gptossModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *gptossModel) Replacements() []string
func (m *mxfp4) WriteTo(w io.Writer) (int64, error)
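
Since Replacements returns a flat []string, one way to read it is as alternating old/new substrings of the kind strings.NewReplacer accepts. The sketch below works under that assumption. The helper names (replacementsFor, rename), the hfLayout flag, and every name pair are hypothetical and chosen only to show how two naming conventions could map to the same output names.

```go
package main

import (
	"fmt"
	"strings"
)

// replacementsFor returns old/new substring pairs for one of two naming
// schemes. Every pair here is illustrative and not taken from the converter.
func replacementsFor(hfLayout bool) []string {
	if hfLayout {
		return []string{
			"model.layers", "blk",
			"self_attn.q_proj", "attn_q",
		}
	}
	return []string{
		"block", "blk",
		"attn.qkv", "attn_qkv",
	}
}

// rename applies the selected pair list to a tensor name.
func rename(name string, hfLayout bool) string {
	return strings.NewReplacer(replacementsFor(hfLayout)...).Replace(name)
}

func main() {
	fmt.Println(rename("model.layers.0.self_attn.q_proj.weight", true)) // blk.0.attn_q.weight
	fmt.Println(rename("block.0.attn.qkv.weight", false))               // blk.0.attn_qkv.weight
}
```

Keeping the two pair lists separate avoids a rule written for one layout accidentally matching names from the other.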

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors including MXFP4 blocks/scales and interleaved expert weights

Outputs

Name	Type	Description
KV	KV	GGUF metadata with gptoss.* keys including custom EOS token IDs
Tensors	[]*ggml.Tensor	Converted tensors with MXFP4 encoding and split gate/up experts

Usage Examples

// Converter registered for GPT-OSS architecture
// MXFP4 tensors are assembled from .blocks and .scales components
// with nibble reordering: a1b2c3...x7y8z9 -> 71xa82yb93zc
// gate_up_exps are split via stride-2 slicing
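
The nibble reordering noted above can be sketched for a single block. This is one plausible reading of the a1b2c3...x7y8z9 -> 71xa82yb93zc comment, not a copy of the converter: it assumes 16-byte blocks (32 four-bit values), pairs byte j with byte j+8, and places the scale byte ahead of the reordered nibbles. reorderNibbles, packBlock, and blockSize are hypothetical names.

```go
package main

import "fmt"

// blockSize is an assumption: 16 packed bytes hold 32 four-bit values.
const blockSize = 16

// reorderNibbles regroups one block so that the low nibbles of bytes j and
// j+8 share output byte 2j, and their high nibbles share output byte 2j+1.
func reorderNibbles(in [blockSize]byte) [blockSize]byte {
	var out [blockSize]byte
	for j := 0; j < blockSize/2; j++ {
		a, b := in[j], in[j+blockSize/2]
		out[2*j] = (a & 0x0F) | (b << 4)   // low nibble of a, low nibble of b
		out[2*j+1] = (a >> 4) | (b & 0xF0) // high nibble of a, high nibble of b
	}
	return out
}

// packBlock prepends the shared scale byte to the reordered nibbles.
// Scale-first ordering is an assumption of this sketch.
func packBlock(scale byte, block [blockSize]byte) []byte {
	r := reorderNibbles(block)
	return append([]byte{scale}, r[:]...)
}

func main() {
	var in [blockSize]byte
	for j := 0; j < blockSize/2; j++ {
		in[j] = 0x21             // first half: high nibble 2, low nibble 1
		in[j+blockSize/2] = 0x43 // second half: high nibble 4, low nibble 3
	}
	// Each output pair becomes 0x31 (both low nibbles) then 0x42 (both high nibbles).
	fmt.Printf("% x\n", packBlock(0x7f, in))
}
```

With this input, every pair of output bytes after the scale is 0x31 followed by 0x42, which makes it easy to see that low nibbles and high nibbles have been regrouped.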

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_GptOss
