
Implementation:Ollama Ollama Convert Phi3

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Microsoft Phi-3 architecture. The converter emits the long and short RoPE scaling factors as additional tensors and computes the attention scaling factor from the ratio of the extended context length to the original context length.

Description

The phi3Model struct implements ModelConverter for Phi-3 models and supports three RoPE scaling types: none, su/longrope, and yarn.

The KV method emits the phi3.* metadata, including an attention factor derived from scale, the ratio of max_position_embeddings to original_max_position_embeddings. For su/longrope the factor is sqrt(1 + log(scale) / log(original_max_position_embeddings)); for yarn it is 0.1 * log(scale) + 1.0.

The Tensors method injects two additional tensors, rope_factors_long.weight and rope_factors_short.weight, which hold the long and short RoPE scaling factors from the model configuration. A sync.Once guard ensures they are added exactly once, before the tensors of the first layer.

The ropeFactor type ([]float32) implements io.WriterTo so that the factors can be serialized as binary tensor data.

Usage

The converter is not called directly: phi3Model is unexported, and the convert package invokes it automatically when the model's architecture matches Phi3ForCausalLM.

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_phi3.go
  • Lines: 1-122

Signature

type phi3Model struct {
    ModelParameters
    NumHiddenLayers uint32  `json:"num_hidden_layers"`
    HiddenSize      uint32  `json:"hidden_size"`
    RopeTheta       float32 `json:"rope_theta"`
    RopeScaling     struct {
        Type        string     `json:"type"`
        LongFactor  ropeFactor `json:"long_factor"`
        ShortFactor ropeFactor `json:"short_factor"`
    } `json:"rope_scaling"`
    MaxPositionEmbeddings         uint32 `json:"max_position_embeddings"`
    OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
}

type ropeFactor []float32

func (p *phi3Model) KV(t *Tokenizer) KV
func (p *phi3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (p *phi3Model) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name | Type | Required | Description
t | *Tokenizer | Yes | Tokenizer data for GGUF metadata
ts | []Tensor | Yes | Source tensors (rope factor tensors are injected)

Outputs

Name | Type | Description
KV | KV | GGUF metadata with phi3.* keys, including the computed RoPE attention factor
Tensors | []*ggml.Tensor | Converted tensors plus the injected rope_factors_long.weight and rope_factors_short.weight tensors

Usage Examples

// The converter is registered for the Phi3ForCausalLM architecture.
// RoPE factors are injected as two additional weight tensors:
//   rope_factors_long.weight and rope_factors_short.weight
// The attention factor depends on the scaling type, where
//   scale = max_position_embeddings / original_max_position_embeddings:
//   su/longrope: sqrt(1 + log(scale) / log(original_max_position_embeddings))
//   yarn:        0.1 * log(scale) + 1.0
