Implementation:Ollama Ollama Imagegen GPT OSS
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, LLM Inference |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GPT-OSS model architecture for MLX inference with custom SwiGLU activation, YaRN RoPE scaling, and optional Mixture of Experts.
Description
The gpt_oss.go file implements the GPT-OSS transformer. Its custom SwiGLU activation clips the gate to [0, limit] and the up projection to [-limit, limit], then applies a sigmoid scaled by a fixed alpha = 1.702. The SwiGLU function is compiled once and held as a singleton CompiledFunc, so the same shapeless (shape-agnostic) compiled function is reused by every layer. For extended context, the model applies YaRN RoPE frequency scaling, using the yarn_find_correction_dim/range helpers. It also supports attention sinks for sliding-window models and optional MoE layers selected through the layer_types config field. Config exposes sliding_window, num_local_experts, and a per-layer type list for hybrid dense/MoE architectures.
Usage
Used for text generation with GPT-OSS models in the MLX engine, supporting YaRN extended context and hybrid MoE architectures.
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/models/gpt_oss/gpt_oss.go
- Lines: 1-487
Signature
type Config struct {
	HiddenSize       int32        `json:"hidden_size"`
	NumHiddenLayers  int32        `json:"num_hidden_layers"`
	NumLocalExperts  int32        `json:"num_local_experts"`
	NumExpertsPerTok int32        `json:"num_experts_per_tok"`
	LayerTypes       []string     `json:"layer_types"`
	SwiGLULimit      float32      `json:"swiglu_limit"`
	RopeScaling      *RopeScaling `json:"rope_scaling"`
}
func swiGLU(gate, up *mlx.Array, alpha, limit float32) *mlx.Array
func ComputeYarnFreqs(dims int32, base, scalingFactor float32, origMaxPos int32, betaFast, betaSlow float32) (*mlx.Array, float32)
func getCompiledSwiGLU() *mlx.CompiledFunc
Import
import "github.com/ollama/ollama/x/imagegen/models/gpt_oss"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| modelPath | string | Yes | Directory with model weights and config |
| tokens | *mlx.Array | Yes | Input token IDs [B, L] |
| caches | []cache.Cache | Yes | KV caches per layer |
Outputs
| Name | Type | Description |
|---|---|---|
| logits | *mlx.Array | Logits [B, L, vocab_size] |
Usage Examples
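// Load weights and config from the model directory.
model, err := gpt_oss.Load("/path/to/gpt-oss-model")
if err != nil {
	return err
}
// Create the per-layer KV caches, then run a forward pass.
// tokens is an *mlx.Array of token IDs with shape [B, L].
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // logits: [B, L, vocab_size]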