Implementation:Ollama Ollama MLXRunner Ops Extra

From Leeroopedia
Knowledge Sources
Domains MLX Runtime, Tensor Operations
Last Updated 2025-02-15 00:00 GMT

Overview

Extended MLX operations including quantization, convenience wrappers, attention primitives, and additional tensor operations not covered in the core ops.go file.

Description

Provides quantization operations (Quantize, Dequantize, QuantizedMatmul, GatherQMM) supporting multiple modes (affine, nvfp4, mxfp8) with configurable group size and bits. Includes function-style wrappers (Add, Sub, Mul, Div, Matmul, Reshape, Transpose), neural network primitives (SiLU, RoPEWithBase, ScaledDotProductAttentionCausal, RMSNormFn), scalar helpers, array constructors, and a reflection-based Collect utility for gathering all Array pointers from nested structures.
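The reflection-based gathering described above can be sketched in plain Go. This is an illustration only, not the package's implementation: `Array` here is a hypothetical stand-in type, `collect`, `Layer`, and `Model` are names invented for the example, and the real `Collect` may handle field kinds, duplicates, or ordering differently.

```go
package main

import (
	"fmt"
	"reflect"
)

// Array is a hypothetical stand-in for the package's Array type.
type Array struct{ Name string }

// visit identifies a pointer by type and address so that shared or cyclic
// pointers are walked only once.
type visit struct {
	typ  reflect.Type
	addr uintptr
}

var arrayPtrType = reflect.TypeOf((*Array)(nil))

// collect returns every non-nil *Array reachable from v through pointers,
// interfaces, structs, slices, arrays, and maps. Each distinct *Array is
// returned once; *Array values in unexported fields are skipped.
func collect(v any) []*Array {
	var out []*Array
	seen := map[visit]bool{}

	var walk func(rv reflect.Value)
	walk = func(rv reflect.Value) {
		if !rv.IsValid() {
			return
		}
		switch rv.Kind() {
		case reflect.Pointer:
			if rv.IsNil() {
				return
			}
			key := visit{rv.Type(), rv.Pointer()}
			if seen[key] {
				return
			}
			seen[key] = true
			if rv.Type() == arrayPtrType {
				if rv.CanInterface() {
					out = append(out, rv.Interface().(*Array))
				}
				return
			}
			walk(rv.Elem())
		case reflect.Interface:
			if !rv.IsNil() {
				walk(rv.Elem())
			}
		case reflect.Struct:
			for i := 0; i < rv.NumField(); i++ {
				walk(rv.Field(i))
			}
		case reflect.Slice, reflect.Array:
			for i := 0; i < rv.Len(); i++ {
				walk(rv.Index(i))
			}
		case reflect.Map:
			// Map iteration order is random, so result order is too.
			iter := rv.MapRange()
			for iter.Next() {
				walk(iter.Value())
			}
		}
	}
	walk(reflect.ValueOf(v))
	return out
}

// Layer and Model are hypothetical model structs used only for this demo.
type Layer struct {
	Weight *Array
	Bias   *Array // may be nil
}

type Model struct {
	Embed  *Array
	Layers []*Layer
}

func main() {
	m := &Model{
		Embed: &Array{Name: "embed"},
		Layers: []*Layer{
			{Weight: &Array{Name: "l0.w"}},
			{Weight: &Array{Name: "l1.w"}, Bias: &Array{Name: "l1.b"}},
		},
	}
	for _, a := range collect(m) {
		fmt.Println(a.Name)
	}
}
```

Walking struct fields in declaration order keeps the result deterministic for structs and slices; only map contents come back in arbitrary order.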

Usage

Model implementations call these functions for quantized inference, attention computation, and tensor manipulation. Together they form the higher-level API that model code is written against.

Code Reference

Source Location

  • Repository: Ollama
  • File: x/mlxrunner/mlx/ops_extra.go
  • Lines: 1-450

Signature

func Quantize(w *Array, groupSize, bits int, mode string) (weights, scales, biases *Array)
func Dequantize(w, scales, biases *Array, groupSize, bits int, mode string) *Array
func QuantizedMatmul(x, w, scales, biases *Array, transpose bool, groupSize, bits int, mode string) *Array
func GatherQMM(x, w, scales *Array, biases, lhsIndices, rhsIndices *Array, transpose bool, groupSize, bits int, mode string, sortedIndices bool) *Array
func SiLU(a *Array) *Array
func RoPEWithBase(x *Array, dims int, traditional bool, base, scale float32, offset int) *Array
func ScaledDotProductAttentionCausal(q, k, v *Array, scale float32, causalMask bool) *Array
func RMSNormFn(x, weight *Array, eps float32) *Array
func Collect(v any) []*Array
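For orientation, the math behind two of these primitives can be written as a scalar reference in plain Go. This sketch assumes SiLU and RMSNormFn follow the standard definitions (SiLU(x) = x · sigmoid(x); RMS normalization over the last axis, scaled by a learned weight). The lowercase helper names are hypothetical and operate on float64 slices rather than on *Array values.

```go
package main

import (
	"fmt"
	"math"
)

// silu computes x * sigmoid(x), written as x / (1 + e^-x).
func silu(x float64) float64 {
	return x / (1 + math.Exp(-x))
}

// rmsNorm scales x by the reciprocal of its root mean square (stabilized
// by eps) and then multiplies elementwise by weight.
// Assumes len(weight) == len(x) and len(x) > 0.
func rmsNorm(x, weight []float64, eps float64) []float64 {
	var sumSq float64
	for _, v := range x {
		sumSq += v * v
	}
	inv := 1 / math.Sqrt(sumSq/float64(len(x))+eps)

	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = v * inv * weight[i]
	}
	return out
}

func main() {
	fmt.Println(silu(0), silu(1))
	fmt.Println(rmsNorm([]float64{3, 4}, []float64{1, 1}, 1e-6))
}
```

With unit weights and a negligible eps, the output of `rmsNorm` has a mean square of approximately 1, which is a quick sanity check for any implementation.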

Import

import "github.com/ollama/ollama/x/mlxrunner/mlx"

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| w | *Array | Yes | Weight tensor to quantize |
| groupSize | int | Yes | Group size for quantization (e.g. 32, 64) |
| bits | int | Yes | Bits per weight (4 or 8) |
| mode | string | Yes | Quantization mode: "affine", "nvfp4", "mxfp8" |

Outputs

| Name | Type | Description |
|------|------|-------------|
| weights | *Array | Quantized weight data |
| scales | *Array | Scale factors for dequantization |
| biases | *Array | Quantization biases (nil for nvfp4) |
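To make the roles of the three outputs concrete, here is a minimal sketch of group-wise affine quantization in plain Go. It assumes the usual min/max scheme, in which each group of `groupSize` weights shares one scale and one bias and a weight is reconstructed as `code*scale + bias`. The helper names are hypothetical, codes are left unpacked (one byte each), and the actual MLX kernels pack codes and may differ in rounding details.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeAffine maps each group of groupSize weights onto integer codes in
// [0, 2^bits-1]. It returns one code per weight and one scale and bias per
// group. Assumes len(w) is a multiple of groupSize and bits <= 8.
func quantizeAffine(w []float32, groupSize, bits int) (q []uint8, scales, biases []float32) {
	maxCode := 1<<bits - 1
	q = make([]uint8, len(w))
	for start := 0; start < len(w); start += groupSize {
		g := w[start : start+groupSize]
		lo, hi := g[0], g[0]
		for _, v := range g {
			if v < lo {
				lo = v
			}
			if v > hi {
				hi = v
			}
		}
		scale := (hi - lo) / float32(maxCode)
		if scale == 0 {
			scale = 1 // constant group: every code is 0, bias carries the value
		}
		for i, v := range g {
			q[start+i] = uint8(math.Round(float64((v - lo) / scale)))
		}
		scales = append(scales, scale)
		biases = append(biases, lo)
	}
	return q, scales, biases
}

// dequantizeAffine reconstructs approximate weights as code*scale + bias.
func dequantizeAffine(q []uint8, scales, biases []float32, groupSize int) []float32 {
	out := make([]float32, len(q))
	for i, c := range q {
		g := i / groupSize
		out[i] = float32(c)*scales[g] + biases[g]
	}
	return out
}

func main() {
	w := []float32{-1, -0.5, 0, 0.5, 0, 1, 2, 3}
	q, scales, biases := quantizeAffine(w, 4, 4)
	fmt.Println(q, scales, biases)
	fmt.Println(dequantizeAffine(q, scales, biases, 4))
}
```

The reconstruction error for any weight is at most half a scale step, so smaller groups (tighter min/max ranges) or more bits both reduce error at the cost of extra storage.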

Usage Examples

// Quantize a weight tensor
qw, scales, biases := mlx.Quantize(weight, 32, 4, "affine")

// Quantized matrix multiplication (groupSize, bits, and mode must match the Quantize call)
out := mlx.QuantizedMatmul(input, qw, scales, biases, true, 32, 4, "affine")

// Attention with causal mask
attn := mlx.ScaledDotProductAttentionCausal(q, k, v, scale, true)

// Collect all arrays from a model struct for evaluation
arrays := mlx.Collect(model)
mlx.Eval(arrays...)
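As a reference for what the causal attention call above computes, the following plain-Go sketch evaluates single-head scaled dot-product attention with a causal mask. It assumes equal query and key sequence lengths and the standard definition (softmax over scaled dot products, restricted to keys at or before the query position). `causalAttention` is a hypothetical name, and the real function operates on batched, multi-head *Array tensors.

```go
package main

import (
	"fmt"
	"math"
)

// causalAttention computes, for each query position i,
// softmax_j(scale * q[i]·k[j]) over j <= i, then the weighted sum of v[j].
// q and k are [seq][dim]; v is [seq][vdim]; all share the same seq length.
func causalAttention(q, k, v [][]float64, scale float64) [][]float64 {
	out := make([][]float64, len(q))
	for i := range q {
		// Scores only for keys 0..i: this is the causal mask.
		scores := make([]float64, i+1)
		maxScore := math.Inf(-1)
		for j := 0; j <= i; j++ {
			var dot float64
			for d := range q[i] {
				dot += q[i][d] * k[j][d]
			}
			scores[j] = dot * scale
			if scores[j] > maxScore {
				maxScore = scores[j]
			}
		}

		// Numerically stable softmax: subtract the max before exponentiating.
		var sum float64
		for j := range scores {
			scores[j] = math.Exp(scores[j] - maxScore)
			sum += scores[j]
		}

		out[i] = make([]float64, len(v[0]))
		for j, s := range scores {
			for d := range out[i] {
				out[i][d] += s / sum * v[j][d]
			}
		}
	}
	return out
}

func main() {
	q := [][]float64{{1, 0}, {0, 1}}
	k := [][]float64{{1, 0}, {0, 1}}
	v := [][]float64{{1, 2}, {3, 4}}
	fmt.Println(causalAttention(q, k, v, 1.0))
}
```

The first output row always equals the first value row, because position 0 can attend only to itself; this makes a convenient check when comparing against an accelerated implementation.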
