Implementation:Ollama Ollama Convert Lfm2
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the LFM2 hybrid architecture. It builds a per-layer KV head count array from the model's mix of attention and short convolution layers, and squeezes the singleton dimension out of convolution weights.
Description
The lfm2Model struct implements ModelConverter for LFM2 models that combine full attention layers with short convolution layers. The KV method builds a per-layer KV head count array where attention layers have the configured number of KV heads and short convolution layers have zero heads. It also emits the short convolution L-cache size. The Tensors method squeezes 3D convolution weight tensors from shape [D, 1, K] to [D, K] by removing the singleton dimension. Tensor name replacements map the conv.conv, conv.in_proj, and conv.out_proj paths to shortconv.* GGUF names.
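The per-layer head count rule described above can be sketched as follows. This is a minimal illustration, not the code in convert_lfm2.go: the helper name kvHeadCounts is hypothetical, and "full_attention" is an assumed label for attention entries in layer_types; any other label is treated as a short convolution layer.

```go
package main

import "fmt"

// kvHeadCounts is a hypothetical helper, not the function in convert_lfm2.go.
// It returns one entry per layer: numKVHeads for attention layers and 0 for
// every other layer type (treated here as short convolution layers).
func kvHeadCounts(layerTypes []string, numKVHeads uint32) []uint32 {
	counts := make([]uint32, len(layerTypes))
	for i, lt := range layerTypes {
		// "full_attention" is an assumed label for attention layers.
		if lt == "full_attention" {
			counts[i] = numKVHeads
		}
	}
	return counts
}

func main() {
	layers := []string{"conv", "conv", "full_attention", "conv"}
	fmt.Println(kvHeadCounts(layers, 8)) // prints [0 0 8 0]
}
```

A zero entry tells the consumer of the metadata that the layer has no attention KV heads, which is how the short convolution layers are marked in the array.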
Usage
Invoked automatically when the model's architecture matches LFM2ForCausalLM.
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_lfm2.go
- Lines: 1-100
Signature
type lfm2Model struct {
    ModelParameters
    HiddenSize      uint32   `json:"hidden_size"`
    NumHiddenLayers uint32   `json:"num_hidden_layers"`
    ConvLCache      uint32   `json:"conv_L_cache"`
    LayerTypes      []string `json:"layer_types"`
    TieEmbedding    bool     `json:"tie_embedding"`
}
func (p *lfm2Model) KV(t *Tokenizer) KV
func (p *lfm2Model) Tensors(ts []Tensor) []*ggml.Tensor
func (p *lfm2Model) Replacements() []string
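To illustrate how a flat list of replacement strings can rename tensors, here is a small sketch. It assumes the slice holds alternating old/new fragments that are applied in the manner of strings.NewReplacer. The helper name renameTensor, the exact shortconv.* target names, and the sample tensor name are all assumptions made for illustration, not values taken from the source file.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements holds alternating old/new name fragments.
// The right-hand names are assumed for illustration only.
var replacements = []string{
	"conv.conv", "shortconv.conv",
	"conv.in_proj", "shortconv.in_proj",
	"conv.out_proj", "shortconv.out_proj",
}

// renameTensor is a hypothetical helper that applies the pairs to one name.
func renameTensor(name string) string {
	return strings.NewReplacer(replacements...).Replace(name)
}

func main() {
	// The input tensor name is a made-up example.
	fmt.Println(renameTensor("model.layers.0.conv.in_proj.weight"))
	// prints model.layers.0.shortconv.in_proj.weight
}
```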
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors including conv weights to squeeze |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with lfm2.* keys including per-layer KV head counts and shortconv params |
| Tensors | []*ggml.Tensor | Converted tensors with squeezed convolution weights |
Usage Examples
// Converter registered for LFM2 architecture
// Per-layer KV head counts: attention layers get NumKeyValueHeads, conv layers get 0
// Conv weights [D, 1, K] are squeezed to [D, K]
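The squeeze in the last comment can be shown on shapes alone. The sketch below is a simplified stand-in for what Tensors does: squeezeConvShape is a hypothetical helper that works on a plain shape slice rather than on the converter's Tensor type, and the dimensions used in main are example values.

```go
package main

import "fmt"

// squeezeConvShape is a hypothetical helper: it drops the middle dimension of
// a 3D shape when that dimension is 1, and returns other shapes unchanged.
func squeezeConvShape(shape []uint64) []uint64 {
	if len(shape) == 3 && shape[1] == 1 {
		return []uint64{shape[0], shape[2]}
	}
	return shape
}

func main() {
	fmt.Println(squeezeConvShape([]uint64{2048, 1, 3})) // prints [2048 3]
	fmt.Println(squeezeConvShape([]uint64{2048, 2048})) // prints [2048 2048]
}
```

Because the removed dimension has size 1, the element count is unchanged, so only the shape metadata needs to differ between the source and converted tensor.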