Implementation:Ollama Ollama Convert GptOss
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the GPT-OSS MoE architecture. It handles MXFP4 quantized tensor conversion, splits interleaved gate-up expert tensors, and supplies separate tensor-name replacement sets for HuggingFace and native tensor layouts.
Description
The gptossModel struct implements ModelConverter for GPT-OSS models, which use sliding window attention and MoE experts. A distinguishing feature is the mxfp4 type, which handles MXFP4 (Microscaling FP4) quantized tensors by combining a blocks tensor and a scales tensor and applying a nibble-reordering transform.
The Tensors method handles three categories of tensor:
- MXFP4 quantized expert tensors: the blocks and scales components are combined, with nibble reordering, into a single tensor.
- Interleaved gate_up_exps tensors: each is split into separate gate and up expert tensors using stride-2 slicing.
- Regular tensors: all remaining tensors.
The Replacements method returns one of two sets of tensor-name replacements, depending on whether the model uses HuggingFace or native naming conventions.
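The two naming layouts can be pictured as two lists of old/new string pairs. The sketch below is illustrative only: the function names, the huggingFaceLayout flag, and every pair in the lists are hypothetical and are not taken from convert_gptoss.go. It also assumes that the returned []string is consumed as alternating old/new pairs by strings.NewReplacer from the Go standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// hypotheticalReplacements returns old/new pairs in the flat form that
// strings.NewReplacer expects. The pairs are invented for illustration;
// they are NOT the pairs used by the real converter.
func hypotheticalReplacements(huggingFaceLayout bool) []string {
	if huggingFaceLayout {
		return []string{
			"model.layers", "blk",
			"self_attn.q_proj", "attn_q",
		}
	}
	return []string{
		"block", "blk",
		"attn.qkv", "attn_qkv",
	}
}

// renameTensor applies one replacement set to a source tensor name.
func renameTensor(name string, pairs []string) string {
	return strings.NewReplacer(pairs...).Replace(name)
}

func main() {
	fmt.Println(renameTensor("model.layers.0.self_attn.q_proj.weight", hypotheticalReplacements(true)))
	fmt.Println(renameTensor("block.3.attn.qkv.weight", hypotheticalReplacements(false)))
}
```

Keeping each layout as its own list means a name is only ever matched against the conventions of the layout it came from, so patterns from one layout cannot accidentally rewrite names from the other.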
Usage
Invoked automatically by the conversion pipeline when the model's declared architecture matches GptOssForCausalLM or a similar GPT-OSS architecture identifier.
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_gptoss.go
- Lines: 1-269
Signature
type gptossModel struct {
	ModelParameters
	HiddenLayers      uint32  `json:"num_hidden_layers"`
	Experts           uint32  `json:"num_experts"`
	ExpertsPerToken   uint32  `json:"experts_per_token"`
	SlidingWindow     uint32  `json:"sliding_window"`
	RopeScalingFactor float32 `json:"rope_scaling_factor"`
}
type mxfp4 struct {
	blocks, scales Tensor
}
func (m *gptossModel) KV(t *Tokenizer) KV
func (m *gptossModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *gptossModel) Replacements() []string
func (m *mxfp4) WriteTo(w io.Writer) (int64, error)
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors including MXFP4 blocks/scales and interleaved expert weights |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with gptoss.* keys including custom EOS token IDs |
| Tensors | []*ggml.Tensor | Converted tensors with MXFP4 encoding and split gate/up experts |
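The "split gate/up experts" in the output come from de-interleaving along one axis. As a minimal sketch, assuming gate values sit at even positions and up values at odd positions of that axis (the function name splitStride2 and the flat []float32 input are hypothetical; the real converter operates on a tensor dimension rather than a plain slice):

```go
package main

import "fmt"

// splitStride2 de-interleaves a slice laid out as gate, up, gate, up, ...
// It is the flat equivalent of taking [0::2] and [1::2] along one axis.
func splitStride2(interleaved []float32) (gate, up []float32) {
	gate = make([]float32, 0, (len(interleaved)+1)/2)
	up = make([]float32, 0, len(interleaved)/2)
	for i, v := range interleaved {
		if i%2 == 0 {
			gate = append(gate, v) // even positions: assumed gate
		} else {
			up = append(up, v) // odd positions: assumed up
		}
	}
	return gate, up
}

func main() {
	gate, up := splitStride2([]float32{0, 1, 2, 3, 4, 5})
	fmt.Println(gate, up) // prints "[0 2 4] [1 3 5]"
}
```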
Usage Examples
// Converter registered for GPT-OSS architecture
// MXFP4 tensors are assembled from .blocks and .scales components
// with nibble reordering: a1b2c3...x7y8z9 -> 71xa82yb93zc
// (each two-character group is one byte, written high nibble first)
// gate_up_exps are split via stride-2 slicing
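One way to read the nibble-reordering pattern above is as a transform on a group of packed 4-bit values: byte j is paired with a byte later in the group, their low nibbles form one output byte, and their high nibbles form the next. The sketch below reproduces the a1...x7 -> 71xa pattern under the assumption of a 16-byte group with a pairing offset of 8; the function name reorderBlock and those sizes are assumptions, not confirmed details of mxfp4.WriteTo.

```go
package main

import "fmt"

// reorderBlock rearranges one 16-byte group of packed 4-bit values.
// Byte j is paired with byte j+8: the two low nibbles go to out[2j],
// the two high nibbles go to out[2j+1].
func reorderBlock(in [16]byte) [16]byte {
	var out [16]byte
	for j := 0; j < 8; j++ {
		a, b := in[j], in[j+8]
		out[2*j] = (a & 0x0F) | (b << 4)   // low(a) in low nibble, low(b) in high nibble
		out[2*j+1] = (a >> 4) | (b & 0xF0) // high(a) in low nibble, high(b) in high nibble
	}
	return out
}

func main() {
	var in [16]byte
	in[0], in[8] = 0xA1, 0xF7
	in[1], in[9] = 0xB2, 0xE8
	out := reorderBlock(in)
	fmt.Printf("%02x %02x %02x %02x\n", out[0], out[1], out[2], out[3]) // prints "71 fa 82 eb"
}
```

With 0xF standing in for x and 0xE for y, the input bytes a1 and f7 become 71 and fa, matching the 71xa prefix of the pattern.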