Implementation:Ollama Ollama Imagegen GPT OSS

Knowledge Sources	Ollama
Domains	Image Generation, LLM Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GPT-OSS model architecture for MLX inference with custom SwiGLU activation, YaRN RoPE scaling, and optional Mixture of Experts.

Description

The gpt_oss.go file implements the GPT-OSS transformer. Its custom SwiGLU activation clips its inputs (gate to [0, limit], up to [-limit, limit]) and scales the sigmoid argument by a fixed alpha = 1.702. The activation is compiled once into a singleton CompiledFunc in shapeless mode, so every layer reuses the same compiled function regardless of input shape. For extended context, the model applies YaRN RoPE frequency scaling, using yarn_find_correction_dim and yarn_find_correction_range to pick the band of rotary dimensions that is rescaled. It also supports attention sinks for sliding-window models and optional MoE layers selected through the layer_types config. The Config exposes sliding_window, num_local_experts, and a per-layer type list for hybrid dense/MoE architectures.
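
As a rough illustration of the clipped SwiGLU described above, the following sketch computes gate * sigmoid(alpha * gate) * (up + 1) on plain float32 slices instead of MLX arrays. The function name clippedSwiGLU and the sample values are hypothetical. The sketch assumes the published GPT-OSS reference formulation, in which the gate is clipped only from above at limit; if gpt_oss.go also enforces a lower bound of 0 on the gate, an extra max(g, 0) step would be needed.

```go
package main

import (
	"fmt"
	"math"
)

// clippedSwiGLU applies gate * sigmoid(alpha*gate) * (up + 1) element-wise.
// Assumptions (not taken from gpt_oss.go): gate is clipped from above only,
// up is clipped to [-limit, limit], and up has at least len(gate) elements.
func clippedSwiGLU(gate, up []float32, alpha, limit float32) []float32 {
	out := make([]float32, len(gate))
	for i := range gate {
		// Clip the gate from above and the linear branch on both sides.
		g := math.Min(float64(gate[i]), float64(limit))
		u := math.Max(-float64(limit), math.Min(float64(up[i]), float64(limit)))
		// Sigmoid with the alpha-scaled argument.
		sig := 1 / (1 + math.Exp(-float64(alpha)*g))
		out[i] = float32(g * sig * (u + 1))
	}
	return out
}

func main() {
	// Illustrative inputs; 1.702 is the fixed alpha, 7 is a sample limit.
	gate := []float32{-2, 0, 3, 100}
	up := []float32{0.5, 1, -100, 100}
	fmt.Println(clippedSwiGLU(gate, up, 1.702, 7))
}
```

Because both clips happen before the multiplication, an out-of-range pair such as (100, 100) produces the same value as (limit, limit), which bounds the activation's magnitude.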

Usage

Used for text generation with GPT-OSS models in the MLX engine, supporting YaRN extended context and hybrid MoE architectures.

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/models/gpt_oss/gpt_oss.go
Lines: 1-487

Signature

type Config struct {
	HiddenSize       int32        `json:"hidden_size"`
	NumHiddenLayers  int32        `json:"num_hidden_layers"`
	NumLocalExperts  int32        `json:"num_local_experts"`
	NumExpertsPerTok int32        `json:"num_experts_per_tok"`
	LayerTypes       []string     `json:"layer_types"`
	SwiGLULimit      float32      `json:"swiglu_limit"`
	RopeScaling      *RopeScaling `json:"rope_scaling"`
}

func swiGLU(gate, up *mlx.Array, alpha, limit float32) *mlx.Array
func ComputeYarnFreqs(dims int32, base, scalingFactor float32, origMaxPos int32, betaFast, betaSlow float32) (*mlx.Array, float32)
func getCompiledSwiGLU() *mlx.CompiledFunc

Import

import "github.com/ollama/ollama/x/imagegen/models/gpt_oss"

I/O Contract

Inputs

Name	Type	Required	Description
modelPath	string	Yes	Directory with model weights and config
tokens	*mlx.Array	Yes	Input token IDs [B, L]
caches	[]cache.Cache	Yes	KV caches per layer

Outputs

Name	Type	Description
logits	*mlx.Array	Logits [B, L, vocab_size]

Usage Examples

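// Load the model from a directory containing weights and config.
model, err := gpt_oss.Load("/path/to/gpt-oss-model")
if err != nil {
    return err
}

// Create the per-layer KV caches, then run a forward pass.
// tokens is an *mlx.Array of token IDs with shape [B, L].
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // logits: [B, L, vocab_size]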

Related Pages

Principle:Ollama_Ollama_ImageGeneration
