Principle:Ollama Ollama GGUF Model Conversion Lfm2
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, LFM |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
LFM-2 (Liquid Foundation Model 2) conversion handles a hybrid architecture that alternates between short convolution layers and full attention layers. The converter transforms the model from HuggingFace SafeTensors to GGUF format, encodes each layer's type in a per-layer KV head count array, and squeezes the convolution weights from 3D to 2D.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
- model.embed_tokens -> token_embd
- model.embedding_norm -> output_norm
- model.layers -> blk
- operator_norm -> attn_norm
- self_attn.q_proj -> attn_q
- self_attn.k_proj -> attn_k
- self_attn.v_proj -> attn_v
- self_attn.out_proj -> attn_output
- self_attn.q_layernorm -> attn_q_norm
- self_attn.k_layernorm -> attn_k_norm
- conv.conv -> shortconv.conv
- conv.in_proj -> shortconv.in_proj
- conv.out_proj -> shortconv.out_proj
- feed_forward.w1 -> ffn_gate
- feed_forward.w2 -> ffn_down
- feed_forward.w3 -> ffn_up
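The mapping above can be sketched with the standard library's strings.NewReplacer. This is a minimal illustration, not the converter's actual code: the names lfm2Replacer and ggufName are hypothetical, and only the replacement pairs come from the list above.

```go
package main

import "strings"

// lfm2Replacer holds the HuggingFace-to-GGUF name pairs listed above.
// The variable name is illustrative; it is not taken from the converter.
var lfm2Replacer = strings.NewReplacer(
	"model.embed_tokens", "token_embd",
	"model.embedding_norm", "output_norm",
	"model.layers", "blk",
	"operator_norm", "attn_norm",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.out_proj", "attn_output",
	"self_attn.q_layernorm", "attn_q_norm",
	"self_attn.k_layernorm", "attn_k_norm",
	"conv.conv", "shortconv.conv",
	"conv.in_proj", "shortconv.in_proj",
	"conv.out_proj", "shortconv.out_proj",
	"feed_forward.w1", "ffn_gate",
	"feed_forward.w2", "ffn_down",
	"feed_forward.w3", "ffn_up",
)

// ggufName rewrites a HuggingFace tensor name in a single pass.
// Replaced text is not rescanned, so "shortconv.conv" is never matched
// again by the "conv.conv" rule.
func ggufName(hfName string) string {
	return lfm2Replacer.Replace(hfName)
}
```

Under this sketch, ggufName("model.layers.0.conv.conv.weight") yields "blk.0.shortconv.conv.weight", and ggufName("model.layers.2.self_attn.q_proj.weight") yields "blk.2.attn_q.weight".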
Architecture-Specific Hyperparameters
The GGUF metadata is written under the lfm2.* namespace:
- lfm2.vocab_size: vocabulary size
- lfm2.block_count: number of hidden layers
- lfm2.embedding_length: hidden size
- lfm2.feed_forward_length: intermediate size
- lfm2.context_length: maximum position embeddings
- lfm2.attention.head_count: number of attention heads
- lfm2.attention.head_count_kv: per-layer array (0 for conv layers, num_kv_heads for attention layers)
- lfm2.attention.key_length / lfm2.attention.value_length: derived from hidden_size / num_attention_heads
- lfm2.attention.layer_norm_rms_epsilon: normalization epsilon
- lfm2.rope.freq_base: RoPE theta
- lfm2.shortconv.l_cache: convolution cache length
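As a rough sketch of how these keys might be assembled, the following builds the metadata as a plain map. The lfm2Params struct, its field names, and the value types are assumptions made for illustration; only the key names and the head-dimension derivation come from the list above.

```go
package main

// lfm2Params is a hypothetical holder for values read from the
// HuggingFace config; the field names are illustrative only.
type lfm2Params struct {
	VocabSize             uint32
	HiddenLayers          uint32
	HiddenSize            uint32
	IntermediateSize      uint32
	MaxPositionEmbeddings uint32
	AttentionHeads        uint32
	NormEpsilon           float32
	RopeTheta             float32
	ConvLCache            uint32
}

// lfm2KV builds the lfm2.* metadata map. headCountKV is the per-layer
// array: 0 for convolution layers, the KV head count for attention layers.
func lfm2KV(p lfm2Params, headCountKV []uint32) map[string]any {
	// Key and value lengths are both hidden_size / num_attention_heads.
	headDim := p.HiddenSize / p.AttentionHeads
	return map[string]any{
		"lfm2.vocab_size":                       p.VocabSize,
		"lfm2.block_count":                      p.HiddenLayers,
		"lfm2.embedding_length":                 p.HiddenSize,
		"lfm2.feed_forward_length":              p.IntermediateSize,
		"lfm2.context_length":                   p.MaxPositionEmbeddings,
		"lfm2.attention.head_count":             p.AttentionHeads,
		"lfm2.attention.head_count_kv":          headCountKV,
		"lfm2.attention.key_length":             headDim,
		"lfm2.attention.value_length":           headDim,
		"lfm2.attention.layer_norm_rms_epsilon": p.NormEpsilon,
		"lfm2.rope.freq_base":                   p.RopeTheta,
		"lfm2.shortconv.l_cache":                p.ConvLCache,
	}
}
```

For example, a hidden size of 2048 with 32 attention heads gives a key and value length of 64.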
Special Handling
Per-Layer KV Head Count Array
The layer_types string array from the config (containing "full_attention" or other types) is converted into a per-layer uint32 array for attention.head_count_kv. Attention layers get the actual num_key_value_heads value while short convolution layers get 0, allowing the runtime to dispatch the correct operator per layer.
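The conversion described here amounts to a single pass over layer_types. A minimal sketch follows, assuming any value other than "full_attention" denotes a short convolution layer; the function name headCountKV is hypothetical.

```go
package main

// headCountKV converts the layer_types strings into the per-layer array
// written to attention.head_count_kv.
func headCountKV(layerTypes []string, numKVHeads uint32) []uint32 {
	out := make([]uint32, len(layerTypes)) // zero-initialized
	for i, t := range layerTypes {
		if t == "full_attention" {
			out[i] = numKVHeads
		}
		// Assumption: every other type is a short convolution layer,
		// so its entry stays 0.
	}
	return out
}
```

A runtime can then treat a zero entry as the signal to run the short convolution operator for that layer and a non-zero entry as the signal to run attention with that many KV heads.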
Convolution Weight Squeezing
Short convolution weights with shape [D, 1, K] (3D with a singleton middle dimension) are squeezed to [D, K] (2D) for GGUF compatibility.
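Because the middle dimension has length 1, squeezing changes only the shape metadata; the elements of a contiguous row-major tensor keep their order. The sketch below shows the shape rewrite alone, using a hypothetical helper named squeezeMiddle.

```go
package main

// squeezeMiddle drops a singleton middle dimension: [D, 1, K] becomes
// [D, K]. Any other shape is returned unchanged.
func squeezeMiddle(shape []uint64) []uint64 {
	if len(shape) == 3 && shape[1] == 1 {
		return []uint64{shape[0], shape[2]}
	}
	return shape
}
```

For instance, a shape of [2048, 1, 3] becomes [2048, 3], while a 2D shape such as [2048, 3] passes through untouched.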
Unique Normalization Naming
LFM-2 uses embedding_norm for the output normalization (instead of the typical model.norm) and operator_norm for the pre-attention normalization.
Implementation Notes
The conversion is implemented in convert/convert_lfm2.go via the lfm2Model struct. The architecture uses SwiGLU-style feed-forward networks with w1/w2/w3 naming (gate/down/up). The hybrid layer design allows the model to use cheap convolution operations for most layers while reserving expensive attention for periodic global context aggregation.
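For readers unfamiliar with the w1/w2/w3 roles, the standard SwiGLU gating step can be sketched as follows. This is the textbook formulation, not code from the converter or runtime: gate stands for the output of the ffn_gate (w1) projection and up for the output of the ffn_up (w3) projection, and the result would then be fed to ffn_down (w2). The function names are hypothetical.

```go
package main

import "math"

// silu is the SiLU (swish) activation: x * sigmoid(x).
func silu(x float64) float64 {
	return x / (1 + math.Exp(-x))
}

// swiGLUGate combines the gate and up projection outputs elementwise:
// silu(gate[i]) * up[i]. The caller applies the down projection afterwards.
// It assumes gate and up have the same length.
func swiGLUGate(gate, up []float64) []float64 {
	out := make([]float64, len(gate))
	for i := range gate {
		out[i] = silu(gate[i]) * up[i]
	}
	return out
}
```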