Implementation:Ollama Ollama Convert Phi3
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the Microsoft Phi-3 architecture, handling long/short RoPE scaling factors as additional tensor weights and computing attention scaling factors from context length ratios.
Description
The phi3Model struct implements ModelConverter for Phi-3 models. It distinguishes three RoPE scaling configurations: no scaling, su/longrope, and yarn. The KV method emits the GGUF metadata and, when scaling is configured, an attention factor derived from scale, the ratio of max_position_embeddings to original_max_position_embeddings: sqrt(1 + log(scale) / log(original_max_position_embeddings)) for su/longrope and 0.1 * log(scale) + 1.0 for yarn. The Tensors method adds two tensors, rope_factors_long.weight and rope_factors_short.weight, which hold the long and short RoPE scaling factors from the model configuration; a sync.Once guard ensures they are emitted exactly once, ahead of the first layer's tensors. The ropeFactor type implements io.WriterTo so that these factors can be serialized in binary form.
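The injection step described above can be sketched in isolation. This is a minimal illustration, not the converter's code: the helper name injectRopeFactors is hypothetical, tensors are reduced to their names, and detecting the first layer by the "blk.0." name prefix is an assumption of the sketch.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// injectRopeFactors is a hypothetical stand-in for the Tensors method. It
// copies tensor names through unchanged and adds the two rope factor tensors
// once, immediately before the first name with the (assumed) "blk.0." prefix.
func injectRopeFactors(names []string) []string {
	var once sync.Once
	out := make([]string, 0, len(names)+2)
	for _, name := range names {
		if strings.HasPrefix(name, "blk.0.") {
			// Do runs the closure on the first call only, so later
			// "blk.0." tensors do not trigger a second injection.
			once.Do(func() {
				out = append(out, "rope_factors_long.weight", "rope_factors_short.weight")
			})
		}
		out = append(out, name)
	}
	return out
}

func main() {
	in := []string{
		"token_embd.weight",
		"blk.0.attn_norm.weight",
		"blk.0.attn_qkv.weight",
		"blk.1.attn_norm.weight",
	}
	for _, name := range injectRopeFactors(in) {
		fmt.Println(name)
	}
}
```

Using sync.Once keeps the loop body free of an explicit "already added" flag; if no tensor matches the prefix, nothing is injected.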
Usage
Invoked automatically by the converter when the architecture declared in the model's configuration is Phi3ForCausalLM.
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_phi3.go
- Lines: 1-122
Signature
type phi3Model struct {
    ModelParameters
    NumHiddenLayers uint32  `json:"num_hidden_layers"`
    HiddenSize      uint32  `json:"hidden_size"`
    RopeTheta       float32 `json:"rope_theta"`
    RopeScaling     struct {
        Type        string     `json:"type"`
        LongFactor  ropeFactor `json:"long_factor"`
        ShortFactor ropeFactor `json:"short_factor"`
    } `json:"rope_scaling"`
    MaxPositionEmbeddings         uint32 `json:"max_position_embeddings"`
    OriginalMaxPositionEmbeddings uint32 `json:"original_max_position_embeddings"`
}

type ropeFactor []float32

func (p *phi3Model) KV(t *Tokenizer) KV
func (p *phi3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (p *phi3Model) Replacements() []string
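To make the role of ropeFactor concrete, the following sketch decodes a rope_scaling section with the JSON tags shown in the signature and serializes the long factors through io.WriterTo. It is an illustration under stated assumptions: ropeConfig and encodeLongFactor are hypothetical names, the factor values are made up, and little-endian float32 output and the returned byte count are choices of this sketch rather than confirmed details of the source file.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"io"
)

// ropeFactor mirrors the type named in the signature.
type ropeFactor []float32

// WriteTo writes the factors as raw float32 values. Little-endian byte order
// and the returned byte count are assumptions of this sketch.
func (r ropeFactor) WriteTo(w io.Writer) (int64, error) {
	if err := binary.Write(w, binary.LittleEndian, []float32(r)); err != nil {
		return 0, err
	}
	return int64(4 * len(r)), nil
}

// ropeConfig is a hypothetical, trimmed-down view of the model configuration.
type ropeConfig struct {
	RopeScaling struct {
		Type        string     `json:"type"`
		LongFactor  ropeFactor `json:"long_factor"`
		ShortFactor ropeFactor `json:"short_factor"`
	} `json:"rope_scaling"`
}

// encodeLongFactor decodes a configuration document and returns the
// serialized long factors.
func encodeLongFactor(config []byte) ([]byte, error) {
	var c ropeConfig
	if err := json.Unmarshal(config, &c); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	if _, err := c.RopeScaling.LongFactor.WriteTo(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	// Made-up factor values, for illustration only.
	config := []byte(`{"rope_scaling": {"type": "longrope",
		"long_factor": [1.0, 1.5, 2.0], "short_factor": [1.0, 1.0, 1.0]}}`)
	b, err := encodeLongFactor(config)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes: % x\n", len(b), b)
}
```

Three float32 factors serialize to 12 bytes, which matches a one-dimensional tensor whose shape is the number of factors.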
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors to convert; the rope factor tensors are not part of this input and are added to the output |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with phi3.* keys including computed RoPE attention factor |
| Tensors | []*ggml.Tensor | Converted tensors plus the injected rope_factors_long.weight and rope_factors_short.weight tensors |
Usage Examples
// The converter is registered for Phi3ForCausalLM.
// RoPE factors are injected as additional weight tensors:
//   rope_factors_long.weight and rope_factors_short.weight
// The attention factor depends on the scaling type, with
//   scale = max_position_embeddings / original_max_position_embeddings
//   orig_max_pos = original_max_position_embeddings
//   su/longrope: sqrt(1 + log(scale) / log(orig_max_pos))
//   yarn:        0.1 * log(scale) + 1.0
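The two formulas can be evaluated with a short standalone sketch. The function name attnFactor, its boolean result, and the treatment of an empty or unrecognized type are assumptions made for illustration; the sketch reproduces only the formulas listed above and omits any clamping or error handling the source file may apply.

```go
package main

import (
	"fmt"
	"math"
)

// attnFactor evaluates the attention factor formulas for a scaling type.
// ok is false when no factor applies (an assumption of this sketch).
func attnFactor(scalingType string, maxPos, origMaxPos uint32) (factor float64, ok bool) {
	// scale is the ratio of the extended to the original context length.
	scale := float64(maxPos) / float64(origMaxPos)
	switch scalingType {
	case "su", "longrope":
		return math.Sqrt(1 + math.Log(scale)/math.Log(float64(origMaxPos))), true
	case "yarn":
		return 0.1*math.Log(scale) + 1.0, true
	default:
		// No scaling configured, or a type this sketch does not know.
		return 0, false
	}
}

func main() {
	// Illustrative context lengths: 131072 / 4096 gives scale = 32.
	for _, typ := range []string{"longrope", "yarn"} {
		f, _ := attnFactor(typ, 131072, 4096)
		fmt.Printf("%s: %.4f\n", typ, f)
	}
}
```

With these inputs, log(32) / log(4096) = 5/12, so the longrope factor is sqrt(17/12), about 1.1902, and the yarn factor is 0.1 * log(32) + 1, about 1.3466.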