Principle:Ollama Ollama GGUF Model Conversion GptOss
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GPT |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
The GPT-OSS converter transforms an open-source GPT variant into GGUF. The architecture features SwiGLU activations, YaRN RoPE scaling, Mixture-of-Experts layers with MXFP4-quantized expert weights, and sliding window attention. Both HuggingFace-flavored and native checkpoints are supported.
Core Concepts
Tensor Name Mapping
The converter supports two naming schemes depending on the model flavor:
HuggingFace flavor (when max_position_embeddings > 0):
- lm_head -> output
- model.embed_tokens -> token_embd
- model.layers -> blk
- model.norm -> output_norm
- self_attn.{q,k,v}_proj -> attn_{q,k,v}
- self_attn.o_proj -> attn_out
- self_attn.sinks -> attn_sinks
- mlp.router -> ffn_gate_inp
- mlp.experts.gate_up_proj_ -> ffn_gate_up_exps.
- mlp.experts.down_proj_ -> ffn_down_exps.
Native flavor:
- block -> blk
- embedding -> token_embd
- unembedding -> output
- attn.qkv -> attn_qkv
- mlp.gate -> ffn_gate_inp
- mlp.mlp1_ -> ffn_gate_up_exps.
- mlp.mlp2_ -> ffn_down_exps.
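The two mappings can be sketched with the standard library's strings.NewReplacer. This is a minimal illustration, not the converter's actual code: renameTensor, hfPairs, and nativePairs are hypothetical names, and the {q,k,v} pattern is expanded by hand.

```go
package main

import (
	"fmt"
	"strings"
)

// hfPairs lists old/new substrings for the HuggingFace flavor.
var hfPairs = []string{
	"lm_head", "output",
	"model.embed_tokens", "token_embd",
	"model.layers", "blk",
	"model.norm", "output_norm",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_out",
	"self_attn.sinks", "attn_sinks",
	"mlp.router", "ffn_gate_inp",
	"mlp.experts.gate_up_proj_", "ffn_gate_up_exps.",
	"mlp.experts.down_proj_", "ffn_down_exps.",
}

// nativePairs lists old/new substrings for the native flavor.
var nativePairs = []string{
	"block", "blk",
	"embedding", "token_embd",
	"unembedding", "output",
	"attn.qkv", "attn_qkv",
	"mlp.gate", "ffn_gate_inp",
	"mlp.mlp1_", "ffn_gate_up_exps.",
	"mlp.mlp2_", "ffn_down_exps.",
}

// renameTensor (hypothetical helper) picks the mapping by flavor:
// a positive max_position_embeddings selects the HuggingFace pairs.
func renameTensor(name string, maxPositionEmbeddings uint32) string {
	pairs := nativePairs
	if maxPositionEmbeddings > 0 {
		pairs = hfPairs
	}
	return strings.NewReplacer(pairs...).Replace(name)
}

func main() {
	fmt.Println(renameTensor("model.layers.0.self_attn.q_proj.weight", 131072))
	fmt.Println(renameTensor("block.0.mlp.mlp1_weight", 0))
}
```

The trailing underscore in the expert tensor sources and the trailing dot in their targets mean that a suffix such as blocks, scales, or bias becomes its own dotted name component after renaming.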
Architecture-Specific Hyperparameters
The GGUF metadata is written under the gptoss.* namespace:
- gptoss.context_length -- derived from max_position_embeddings or rope_scaling_factor * initial_context_length
- gptoss.expert_count -- from num_experts or num_local_experts
- gptoss.expert_used_count -- experts per token
- gptoss.attention.key_length / value_length -- explicit head dimension
- gptoss.attention.sliding_window -- sliding window size
- gptoss.rope.freq_base -- RoPE theta
- gptoss.rope.scaling.factor / original_context_length -- YaRN scaling
- general.file_type -- set to 4
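The fallback rules for the derived keys can be sketched as follows. The gptossConfig struct and its field names are hypothetical stand-ins for the parsed model config, and the rule of preferring the first non-zero field is an assumption about how "or" is resolved. The values in main are illustrative only.

```go
package main

import "fmt"

// gptossConfig is a hypothetical subset of the model configuration.
type gptossConfig struct {
	MaxPositionEmbeddings uint32
	RopeScalingFactor     float32
	InitialContextLength  uint32
	NumExperts            uint32
	NumLocalExperts       uint32
}

// contextLength uses max_position_embeddings when set, and otherwise
// falls back to rope_scaling_factor * initial_context_length.
func contextLength(c gptossConfig) uint32 {
	if c.MaxPositionEmbeddings > 0 {
		return c.MaxPositionEmbeddings
	}
	return uint32(c.RopeScalingFactor * float32(c.InitialContextLength))
}

// expertCount returns the first non-zero of num_experts and
// num_local_experts (assumed precedence).
func expertCount(c gptossConfig) uint32 {
	if c.NumExperts > 0 {
		return c.NumExperts
	}
	return c.NumLocalExperts
}

// metadata builds a few of the keys listed above.
func metadata(c gptossConfig) map[string]any {
	return map[string]any{
		"gptoss.context_length": contextLength(c),
		"gptoss.expert_count":   expertCount(c),
		"general.file_type":     uint32(4),
	}
}

func main() {
	c := gptossConfig{RopeScalingFactor: 32, InitialContextLength: 4096, NumLocalExperts: 8}
	for k, v := range metadata(c) {
		fmt.Println(k, v)
	}
}
```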
Special Handling
MXFP4 Expert Weight Handling
Expert weights may arrive in MXFP4 (Microscaling FP4) format as separate .blocks and .scales tensors. The converter pairs them, applies a byte-level transformation that rearranges nibbles (interleaving high and low 4-bit values), concatenates the scales with the blocks along dimension 3, and writes the result as TensorTypeMXFP4.
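One way such a nibble rearrangement might look is sketched below. The exact layout is an assumption made for illustration: each block is taken to be 16 bytes, byte j is paired with byte j+8, and the two low nibbles and two high nibbles are regrouped into adjacent output bytes. repackNibbles is a hypothetical name.

```go
package main

import "fmt"

// repackNibbles rewrites buf in place, one 16-byte block at a time.
// Assumed layout: input bytes j and j+8 form a pair; their low nibbles
// go to output byte 2j and their high nibbles to output byte 2j+1.
// Trailing bytes that do not fill a whole block are left untouched.
func repackNibbles(buf []byte) {
	var tmp [16]byte
	for i := 0; i+16 <= len(buf); i += 16 {
		for j := 0; j < 8; j++ {
			a, b := buf[i+j], buf[i+j+8]
			tmp[2*j] = (a & 0x0F) | (b << 4)   // low nibble of a, low nibble of b
			tmp[2*j+1] = (a >> 4) | (b & 0xF0) // high nibble of a, high nibble of b
		}
		copy(buf[i:i+16], tmp[:])
	}
}

func main() {
	block := make([]byte, 16)
	block[0], block[8] = 0x21, 0x43
	repackNibbles(block)
	fmt.Printf("% x\n", block[:2])
}
```

The transformation only moves 4-bit values around, so no precision is lost; the block keeps its size and its set of nibbles.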
Gate-Up Expert Splitting
Interleaved gate_up_exps tensors are split into separate gate_exps and up_exps tensors by striding along the interleaved dimension (even indices for gate, odd for up). The same split applies to MXFP4 tensors, regular float tensors, and bias tensors.
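The even/odd split can be sketched on a plain slice. This is a simplified illustration: splitGateUp is a hypothetical helper that works on entries along the interleaved axis of a single expert, whereas the converter operates on full tensors.

```go
package main

import "fmt"

// splitGateUp de-interleaves entries: even indices go to gate,
// odd indices go to up. T can be a weight row or a single bias value.
func splitGateUp[T any](entries []T) (gate, up []T) {
	for i, e := range entries {
		if i%2 == 0 {
			gate = append(gate, e)
		} else {
			up = append(up, e)
		}
	}
	return gate, up
}

func main() {
	// Four interleaved rows: gate0, up0, gate1, up1.
	rows := [][]float32{{1, 1}, {2, 2}, {3, 3}, {4, 4}}
	gate, up := splitGateUp(rows)
	fmt.Println(gate, up)

	// The same helper handles a flat bias vector.
	gb, ub := splitGateUp([]float32{0.1, 0.2, 0.3, 0.4})
	fmt.Println(gb, ub)
}
```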
Custom Token IDs
The tokenizer sets specific token IDs:
- BOS: 199998 (<|startoftext|>)
- EOS: 199999 (<|endoftext|>)
- Additional EOS tokens: 200002 (<|return|>), 200012 (<|call|>)
- Both add_bos_token and add_eos_token are set to false.
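A consumer of the converted model would treat all three end-of-sequence IDs as stop tokens. The sketch below shows that check; the constant and function names are hypothetical, while the IDs come from the list above.

```go
package main

import "fmt"

const (
	bosTokenID = 199998 // <|startoftext|>
	eosTokenID = 199999 // <|endoftext|>
)

// extraEOSTokenIDs holds <|return|> and <|call|>.
var extraEOSTokenIDs = []int{200002, 200012}

// isStopToken reports whether id is the primary EOS token or one of
// the additional EOS tokens.
func isStopToken(id int) bool {
	if id == eosTokenID {
		return true
	}
	for _, extra := range extraEOSTokenIDs {
		if id == extra {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isStopToken(200012))     // <|call|> ends generation
	fmt.Println(isStopToken(bosTokenID)) // BOS is not a stop token
}
```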
Implementation Notes
The conversion is implemented in convert/convert_gptoss.go via the gptossModel struct. The mxfp4 struct implements io.WriterTo for custom serialization of MXFP4 tensors. The dual-flavor support uses a conditional in Replacements() that selects the HuggingFace mapping when max_position_embeddings is greater than zero and the native mapping otherwise.
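The io.WriterTo pattern can be sketched with a simplified stand-in type. pairedMXFP4 and encode are hypothetical names, and the layout (one scale byte followed by its 16 block bytes) assumes that concatenating along the last axis places each scale directly before its block. The sketch omits the nibble rearrangement described under MXFP4 Expert Weight Handling.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// pairedMXFP4 is a hypothetical stand-in for a paired blocks/scales tensor.
type pairedMXFP4 struct {
	scales []byte // one scale byte per block
	blocks []byte // 16 packed bytes per block
}

// Compile-time check that the type satisfies io.WriterTo.
var _ io.WriterTo = pairedMXFP4{}

// WriteTo emits each scale byte followed by its 16-byte block and
// returns the total number of bytes written.
func (p pairedMXFP4) WriteTo(w io.Writer) (int64, error) {
	if len(p.blocks) != 16*len(p.scales) {
		return 0, fmt.Errorf("got %d scales for %d block bytes", len(p.scales), len(p.blocks))
	}
	var total int64
	for i, s := range p.scales {
		n, err := w.Write(append([]byte{s}, p.blocks[16*i:16*i+16]...))
		total += int64(n)
		if err != nil {
			return total, err
		}
	}
	return total, nil
}

// encode is a small convenience wrapper that serializes into memory.
func encode(p pairedMXFP4) ([]byte, error) {
	var buf bytes.Buffer
	if _, err := p.WriteTo(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	out, err := encode(pairedMXFP4{scales: []byte{0x7f}, blocks: make([]byte, 16)})
	fmt.Println(len(out), err)
}
```

Implementing io.WriterTo lets the writer stream a tensor's bytes lazily at serialization time instead of holding a fully transformed copy of every tensor in memory.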