Principle:Ollama Ollama GGUF Model Conversion NomicBert
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, Embeddings |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Nomic BERT conversion handles Nomic AI's BERT variant, which adds extended context length via rotary position embeddings (RoPE), optional Mixture-of-Experts (MoE) feed-forward layers, and fused QKV attention projections. The converter transforms the embedding model from HuggingFace SafeTensors to GGUF format, carrying over the pooling configuration and applying phantom-space tokenization.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
- encoder.layer / encoder.layers -> blk
- embeddings.word_embeddings -> token_embd
- embeddings.token_type_embeddings -> token_types
- embeddings.LayerNorm -> token_embd_norm
- attention.self.qkv -> attn_qkv (fused QKV)
- attention.output.dense -> attn_output
- attention.output.LayerNorm -> attn_output_norm
- mlp.up -> ffn_up
- mlp.down -> ffn_down
- mlp.router -> ffn_gate_inp (MoE router)
- mlp.experts.up -> ffn_up_exps (MoE expert up projections)
- mlp.experts.down -> ffn_down_exps (MoE expert down projections)
- intermediate.dense -> ffn_up (fallback)
- output.dense -> ffn_down (fallback)
- output.LayerNorm -> layer_output_norm
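The replacements listed above can be sketched with Go's standard `strings.NewReplacer`. This is a minimal illustration, not the converter's actual code: the function name `newTensorRenamer` is hypothetical. One detail worth noting is that `strings.NewReplacer` compares old strings in argument order, so `encoder.layers` is listed before its prefix `encoder.layer`.

```go
package main

import (
	"fmt"
	"strings"
)

// newTensorRenamer (hypothetical name) builds a replacer from the
// HuggingFace-to-GGUF pairs listed in this section. When several old
// strings match at the same position, the earlier pair wins, so the
// longer "encoder.layers" must precede "encoder.layer".
func newTensorRenamer() *strings.Replacer {
	return strings.NewReplacer(
		"encoder.layers", "blk",
		"encoder.layer", "blk",
		"embeddings.word_embeddings", "token_embd",
		"embeddings.token_type_embeddings", "token_types",
		"embeddings.LayerNorm", "token_embd_norm",
		"attention.self.qkv", "attn_qkv",
		"attention.output.dense", "attn_output",
		"attention.output.LayerNorm", "attn_output_norm",
		"mlp.up", "ffn_up",
		"mlp.down", "ffn_down",
		"mlp.router", "ffn_gate_inp",
		"mlp.experts.up", "ffn_up_exps",
		"mlp.experts.down", "ffn_down_exps",
		"intermediate.dense", "ffn_up",
		"output.dense", "ffn_down",
		"output.LayerNorm", "layer_output_norm",
	)
}

func main() {
	r := newTensorRenamer()
	// Example tensor names chosen for illustration only.
	for _, name := range []string{
		"encoder.layers.0.attention.self.qkv.weight",
		"encoder.layer.1.attention.output.dense.weight",
		"encoder.layer.1.output.dense.weight",
	} {
		fmt.Println(name, "->", r.Replace(name))
	}
}
```

Because the replacer scans left to right, `attention.output.dense` is consumed before the shorter fallback `output.dense` can match inside it.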
Architecture-Specific Hyperparameters
The GGUF metadata uses architecture-prefixed keys (either nomic-bert or nomic-bert-moe):
- attention.causal -- set to false (bidirectional)
- pooling_type -- 0 (none), 1 (mean), or 2 (CLS)
- normalize_embeddings -- L2 normalization flag
- block_count -- from n_layers or num_hidden_layers
- context_length -- max position embeddings (extended via RoPE)
- embedding_length, feed_forward_length
- attention.head_count, head_count_kv (GQA support)
- attention.layer_norm_epsilon -- LayerNorm epsilon
- rope.freq_base -- RoPE theta
MoE parameters (when present):
- expert_count -- number of local experts
- expert_used_count -- experts per token
- moe_every_n_layers -- MoE layer frequency
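As a rough sketch of how architecture-prefixed keys might be assembled, the following builds a small metadata map and adds the MoE entries only when they apply. The `hparams` struct, the `buildKV` function, and the use of `moe_every_n_layers > 0` as the "present" test are assumptions made for illustration; only a subset of the keys above is shown.

```go
package main

import (
	"fmt"
	"sort"
)

// hparams is a hypothetical, trimmed-down parameter set.
type hparams struct {
	BlockCount      uint32
	ContextLength   uint32
	PoolingType     uint32
	RopeFreqBase    float32
	ExpertCount     uint32
	ExpertUsedCount uint32
	MoEEveryNLayers uint32
}

// buildKV returns metadata entries whose keys carry the architecture prefix.
func buildKV(arch string, p hparams) map[string]any {
	kv := map[string]any{
		arch + ".attention.causal": false, // bidirectional attention
		arch + ".pooling_type":     p.PoolingType,
		arch + ".block_count":      p.BlockCount,
		arch + ".context_length":   p.ContextLength,
		arch + ".rope.freq_base":   p.RopeFreqBase,
	}
	// Assumption: MoE keys are written only when MoE layers exist.
	if p.MoEEveryNLayers > 0 {
		kv[arch+".expert_count"] = p.ExpertCount
		kv[arch+".expert_used_count"] = p.ExpertUsedCount
		kv[arch+".moe_every_n_layers"] = p.MoEEveryNLayers
	}
	return kv
}

func main() {
	// Illustrative values, not taken from a real checkpoint.
	kv := buildKV("nomic-bert-moe", hparams{
		BlockCount: 12, ContextLength: 2048, PoolingType: 1,
		RopeFreqBase: 10000, ExpertCount: 8, ExpertUsedCount: 2,
		MoEEveryNLayers: 2,
	})
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, kv[k])
	}
}
```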
Special Handling
Dynamic Architecture Selection
The GGUF architecture identifier is dynamically set based on whether MoE parameters are present. If moe_every_n_layers > 0, the architecture is nomic-bert-moe; otherwise it is nomic-bert.
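The selection rule is small enough to state directly in code. The function name `architecture` is hypothetical; the rule itself is the one described above.

```go
package main

import "fmt"

// architecture (hypothetical name) picks the GGUF architecture identifier
// from the MoE layer frequency: any positive value selects the MoE variant.
func architecture(moeEveryNLayers uint32) string {
	if moeEveryNLayers > 0 {
		return "nomic-bert-moe"
	}
	return "nomic-bert"
}

func main() {
	fmt.Println(architecture(0))
	fmt.Println(architecture(2))
}
```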
RoPE-Based Extended Context
Unlike standard BERT, which uses absolute position embeddings (typically limited to 512 tokens), Nomic BERT uses rotary position embeddings, enabling context lengths of 2048 or 8192 tokens. The rope_theta frequency base is stored in GGUF metadata as rope.freq_base.
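To show what the frequency base controls, here is a minimal sketch of the standard RoPE formulation, in which dimension pair i rotates at frequency base^(-2i/d). The conversion step only records the base; this code is not part of the converter, and the function name and the example values (base 10000, head dimension 8) are assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// ropeInvFreq returns the per-pair rotation frequencies base^(-2i/d)
// for i in [0, d/2), following the standard RoPE definition.
func ropeInvFreq(base float64, headDim int) []float64 {
	freqs := make([]float64, headDim/2)
	for i := range freqs {
		freqs[i] = math.Pow(base, -2*float64(i)/float64(headDim))
	}
	return freqs
}

func main() {
	// Illustrative values only.
	fmt.Println(ropeInvFreq(10000, 8))
}
```

A larger base lowers the frequencies of the later dimension pairs, so their rotation angles grow more slowly with position.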
Fused QKV Attention
Nomic BERT uses a fused attention.self.qkv projection instead of separate Q, K, V projections, mapping to the attn_qkv GGUF tensor name.
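The converter only renames the fused tensor, but a short sketch may help clarify what "fused" means for its shape. It assumes the Q, K, and V blocks are stacked in that order along the output dimension, which the source does not state; the type and function names are hypothetical.

```go
package main

import "fmt"

// rowRange is a half-open [Start, End) range of output rows.
type rowRange struct{ Start, End int }

// splitFusedQKV computes which rows of a fused QKV weight belong to
// Q, K, and V. Assumption: blocks are stacked in Q, K, V order.
// With GQA, K and V use headCountKV heads instead of headCount.
func splitFusedQKV(headCount, headCountKV, headDim int) (q, k, v rowRange) {
	qRows := headCount * headDim
	kvRows := headCountKV * headDim
	q = rowRange{0, qRows}
	k = rowRange{qRows, qRows + kvRows}
	v = rowRange{qRows + kvRows, qRows + 2*kvRows}
	return q, k, v
}

func main() {
	// Illustrative sizes: 12 heads of dimension 64, no GQA.
	q, k, v := splitFusedQKV(12, 12, 64)
	fmt.Println(q, k, v)
}
```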
Pooling Configuration
As with standard BERT, the converter reads modules.json to determine the Sentence Transformers pooling mode and normalization settings.
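A minimal sketch of this lookup, assuming the usual Sentence Transformers layout: modules.json lists modules by type and path, and the pooling module's directory holds a config.json with `pooling_mode_mean_tokens` and `pooling_mode_cls_token` flags. The names `poolingInfo`, `readPooling`, and `demoFS`, and the choice to prefer mean over CLS when both flags are set, are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io/fs"
	"path"
	"testing/fstest"
)

// poolingInfo holds the two settings derived from modules.json.
type poolingInfo struct {
	PoolingType uint32 // 0 none, 1 mean, 2 CLS
	Normalize   bool
}

// readPooling scans modules.json for Pooling and Normalize modules.
func readPooling(fsys fs.FS) (poolingInfo, error) {
	var info poolingInfo
	bts, err := fs.ReadFile(fsys, "modules.json")
	if err != nil {
		return info, err
	}
	var modules []struct {
		Type string `json:"type"`
		Path string `json:"path"`
	}
	if err := json.Unmarshal(bts, &modules); err != nil {
		return info, err
	}
	var poolingPath string
	for _, m := range modules {
		switch m.Type {
		case "sentence_transformers.models.Pooling":
			poolingPath = m.Path
		case "sentence_transformers.models.Normalize":
			info.Normalize = true
		}
	}
	if poolingPath == "" {
		return info, nil // no pooling module: type stays 0 (none)
	}
	bts, err = fs.ReadFile(fsys, path.Join(poolingPath, "config.json"))
	if err != nil {
		return info, err
	}
	var cfg struct {
		CLS  bool `json:"pooling_mode_cls_token"`
		Mean bool `json:"pooling_mode_mean_tokens"`
	}
	if err := json.Unmarshal(bts, &cfg); err != nil {
		return info, err
	}
	switch {
	case cfg.Mean:
		info.PoolingType = 1
	case cfg.CLS:
		info.PoolingType = 2
	}
	return info, nil
}

// demoFS returns an in-memory model directory used only for this example.
func demoFS() fs.FS {
	return fstest.MapFS{
		"modules.json": {Data: []byte(`[
			{"idx": 1, "path": "1_Pooling", "type": "sentence_transformers.models.Pooling"},
			{"idx": 2, "path": "2_Normalize", "type": "sentence_transformers.models.Normalize"}
		]`)},
		"1_Pooling/config.json": {Data: []byte(`{"pooling_mode_mean_tokens": true}`)},
	}
}

func main() {
	info, err := readPooling(demoFS())
	fmt.Println(info, err)
}
```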
Phantom Space Tokenization
The same WordPiece-to-phantom-space conversion as standard BERT applies: special tokens are kept as-is, the ## prefix is stripped from continuation pieces, and all other tokens get a U+2581 prefix.
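The three rules can be sketched as a single pass over the vocabulary. How special tokens are recognized is not stated in this section; the sketch assumes they are the bracketed entries such as `[CLS]`, and the function name `toPhantomSpace` is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// toPhantomSpace rewrites WordPiece tokens into phantom-space form.
// Assumption: special tokens are those wrapped in square brackets.
func toPhantomSpace(tokens []string) []string {
	out := make([]string, len(tokens))
	for i, t := range tokens {
		switch {
		case strings.HasPrefix(t, "[") && strings.HasSuffix(t, "]"):
			out[i] = t // special token: keep as-is
		case strings.HasPrefix(t, "##"):
			out[i] = strings.TrimPrefix(t, "##") // continuation piece
		default:
			out[i] = "\u2581" + t // word-initial piece
		}
	}
	return out
}

func main() {
	fmt.Printf("%q\n", toPhantomSpace([]string{"[CLS]", "play", "##ing", "[SEP]"}))
}
```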
Skipped Tensors
As with BERT, embeddings.position_ids, pooler.dense.weight, and pooler.dense.bias are excluded from the converted output.
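A filter for these names could look like the following. It assumes exact name matches; a real checkpoint may carry a model prefix on tensor names, in which case the comparison would need adjusting. The function name `skipTensor` is hypothetical.

```go
package main

import "fmt"

// skipTensor reports whether a tensor should be left out of the output.
// Assumption: names are compared exactly, without any model prefix.
func skipTensor(name string) bool {
	switch name {
	case "embeddings.position_ids",
		"pooler.dense.weight",
		"pooler.dense.bias":
		return true
	}
	return false
}

func main() {
	for _, name := range []string{
		"embeddings.position_ids",
		"embeddings.word_embeddings.weight",
		"pooler.dense.bias",
	} {
		fmt.Println(name, "skip:", skipTensor(name))
	}
}
```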
Implementation Notes
The conversion is implemented in convert/convert_nomicbert.go via the nomicbertModel struct, which satisfies both the ModelConverter and moreParser interfaces. The struct supports both the v1 (dense FFN) and v2 (MoE FFN) Nomic BERT variants through conditional parameter handling.