Principle: Ollama GGUF Model Conversion -- BERT
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, BERT |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
BERT (Bidirectional Encoder Representations from Transformers) model conversion transforms HuggingFace SafeTensors into GGUF format, handling bidirectional encoder architecture, CLS/SEP token semantics, pooling configuration, and phantom-space tokenizer adjustments required by the GGML runtime.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
- encoder.layer / encoder.layers -> blk
- embeddings.word_embeddings -> token_embd
- embeddings.token_type_embeddings -> token_types
- embeddings.LayerNorm -> token_embd_norm
- embeddings.position_embeddings -> position_embd
- attention.self.query -> attn_q
- attention.self.key -> attn_k
- attention.self.value -> attn_v
- attention.output.dense -> attn_output
- attention.output.LayerNorm -> attn_output_norm
- intermediate.dense -> ffn_up
- output.dense -> ffn_down
- output.LayerNorm -> layer_output_norm
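The mapping above can be sketched as an ordered substring-replacement table. This is a minimal illustration, not Ollama's actual code: the replacements slice and mapTensorName are hypothetical names, and ordering matters (attention.output.dense must be rewritten before the more general output.dense).

```go
package main

import (
	"fmt"
	"strings"
)

// replacements mirrors the HuggingFace-to-GGUF table above. More specific
// patterns (attention.output.*) precede the general output.* patterns so
// they are not shadowed.
var replacements = []struct{ from, to string }{
	{"encoder.layers", "blk"},
	{"encoder.layer", "blk"},
	{"embeddings.word_embeddings", "token_embd"},
	{"embeddings.token_type_embeddings", "token_types"},
	{"embeddings.LayerNorm", "token_embd_norm"},
	{"embeddings.position_embeddings", "position_embd"},
	{"attention.self.query", "attn_q"},
	{"attention.self.key", "attn_k"},
	{"attention.self.value", "attn_v"},
	{"attention.output.dense", "attn_output"},
	{"attention.output.LayerNorm", "attn_output_norm"},
	{"intermediate.dense", "ffn_up"},
	{"output.dense", "ffn_down"},
	{"output.LayerNorm", "layer_output_norm"},
}

// mapTensorName applies every replacement in order to one tensor name.
func mapTensorName(name string) string {
	for _, r := range replacements {
		name = strings.ReplaceAll(name, r.from, r.to)
	}
	return name
}

func main() {
	// encoder.layer.0.attention.self.query.weight -> blk.0.attn_q.weight
	fmt.Println(mapTensorName("encoder.layer.0.attention.self.query.weight"))
}
```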
Architecture-Specific Hyperparameters
The GGUF metadata is written under the bert.* namespace:
- bert.block_count -- number of transformer layers (from num_hidden_layers, n_layers, or n_layer)
- bert.context_length -- maximum position embeddings
- bert.embedding_length -- hidden size
- bert.feed_forward_length -- intermediate size
- bert.attention.head_count -- number of attention heads
- bert.attention.layer_norm_epsilon -- LayerNorm epsilon
- bert.attention.causal -- always set to false (bidirectional)
- bert.pooling_type -- 0 (none), 1 (mean), or 2 (CLS)
- bert.normalize_embeddings -- whether to L2-normalize output embeddings
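A sketch of how these keys could be populated from a parsed config.json. The Config struct and bertKV helper are illustrative, with only the primary HuggingFace field names shown (the num_hidden_layers fallbacks are omitted); they are not Ollama's actual types.

```go
package main

import "fmt"

// Config holds the relevant HuggingFace config.json fields (illustrative).
type Config struct {
	NumHiddenLayers       int     `json:"num_hidden_layers"`
	MaxPositionEmbeddings int     `json:"max_position_embeddings"`
	HiddenSize            int     `json:"hidden_size"`
	IntermediateSize      int     `json:"intermediate_size"`
	NumAttentionHeads     int     `json:"num_attention_heads"`
	LayerNormEps          float64 `json:"layer_norm_eps"`
}

// bertKV builds the bert.* metadata map listed above.
func bertKV(c Config, poolingType uint32, normalize bool) map[string]any {
	return map[string]any{
		"bert.block_count":                  uint32(c.NumHiddenLayers),
		"bert.context_length":               uint32(c.MaxPositionEmbeddings),
		"bert.embedding_length":             uint32(c.HiddenSize),
		"bert.feed_forward_length":          uint32(c.IntermediateSize),
		"bert.attention.head_count":         uint32(c.NumAttentionHeads),
		"bert.attention.layer_norm_epsilon": float32(c.LayerNormEps),
		"bert.attention.causal":             false, // BERT is bidirectional
		"bert.pooling_type":                 poolingType,
		"bert.normalize_embeddings":         normalize,
	}
}

func main() {
	// Typical bert-base-uncased shape: 12 layers, 768 hidden, 12 heads.
	kv := bertKV(Config{NumHiddenLayers: 12, MaxPositionEmbeddings: 512,
		HiddenSize: 768, IntermediateSize: 3072, NumAttentionHeads: 12,
		LayerNormEps: 1e-12}, 1, true)
	fmt.Println(kv["bert.block_count"], kv["bert.attention.causal"])
}
```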
Special Handling
Pooling Configuration
The converter reads modules.json from the Sentence Transformers model directory to determine the pooling strategy. If sentence_transformers.models.Pooling is present, it reads pooling_mode_mean_tokens or pooling_mode_cls_token from the pooling config. The sentence_transformers.models.Normalize module enables embedding normalization.
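The detection logic can be sketched as follows. The module/poolingConfig types and the poolingType function are hypothetical names; the JSON keys and module type strings are the ones named above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// module mirrors one entry of a Sentence Transformers modules.json.
type module struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

// poolingConfig mirrors the Pooling module's config keys named above.
type poolingConfig struct {
	Mean bool `json:"pooling_mode_mean_tokens"`
	CLS  bool `json:"pooling_mode_cls_token"`
}

// poolingType returns 1 (mean) or 2 (CLS) when a Pooling module is present,
// 0 otherwise; normalize reports whether a Normalize module exists.
func poolingType(modulesJSON, poolingJSON []byte) (ptype uint32, normalize bool, err error) {
	var mods []module
	if err = json.Unmarshal(modulesJSON, &mods); err != nil {
		return
	}
	for _, m := range mods {
		switch m.Type {
		case "sentence_transformers.models.Pooling":
			var pc poolingConfig
			if err = json.Unmarshal(poolingJSON, &pc); err != nil {
				return
			}
			if pc.Mean {
				ptype = 1
			} else if pc.CLS {
				ptype = 2
			}
		case "sentence_transformers.models.Normalize":
			normalize = true
		}
	}
	return
}

func main() {
	mods := []byte(`[{"type":"sentence_transformers.models.Pooling","path":"1_Pooling"},
		{"type":"sentence_transformers.models.Normalize"}]`)
	pool := []byte(`{"pooling_mode_mean_tokens":true,"pooling_mode_cls_token":false}`)
	pt, norm, _ := poolingType(mods, pool)
	fmt.Println(pt, norm) // mean pooling with normalization
}
```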
Phantom Space Tokenization
BERT WordPiece tokens are converted to phantom-space format for GGML compatibility. Special tokens enclosed in brackets (e.g., [CLS], [SEP]) are kept as-is. Subword tokens prefixed with ## have the prefix stripped. All other tokens receive a Unicode lower-one-eighth-block prefix (U+2581).
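The three cases above reduce to a small transformation per token. This is a minimal sketch; the function name phantom is illustrative, not the converter's API.

```go
package main

import (
	"fmt"
	"strings"
)

// phantom converts one BERT WordPiece token to phantom-space form:
// bracketed special tokens pass through, ## continuations lose the
// prefix, and all other tokens gain a U+2581 (lower one eighth block).
func phantom(tok string) string {
	switch {
	case strings.HasPrefix(tok, "[") && strings.HasSuffix(tok, "]"):
		return tok // special tokens such as [CLS], [SEP]
	case strings.HasPrefix(tok, "##"):
		return tok[2:] // subword continuation
	default:
		return "\u2581" + tok // word-initial token
	}
}

func main() {
	for _, t := range []string{"[CLS]", "hello", "##ing", "[SEP]"} {
		fmt.Printf("%q -> %q\n", t, phantom(t))
	}
}
```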
Skipped Tensors
The following tensors are excluded from the GGUF output: embeddings.position_ids, pooler.dense.weight, and pooler.dense.bias.
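A simple exclusion check covers this. The shouldSkip helper is illustrative; only the three tensor names come from the source.

```go
package main

import "fmt"

// skipped lists the tensors excluded from the GGUF output.
var skipped = map[string]bool{
	"embeddings.position_ids": true, // derivable at runtime
	"pooler.dense.weight":     true, // pooler head is unused
	"pooler.dense.bias":       true,
}

func shouldSkip(name string) bool { return skipped[name] }

func main() {
	fmt.Println(shouldSkip("pooler.dense.weight"), shouldSkip("embeddings.word_embeddings"))
}
```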
Tokenizer Model
The tokenizer model type is set to bert with a token_type_count of 2, corresponding to segment A and segment B in the BERT input format.
Implementation Notes
The conversion is implemented in convert/convert_bert.go via the bertModel struct, which satisfies both the ModelConverter and moreParser interfaces. The moreParser interface enables reading additional configuration files, such as modules.json, beyond the standard config.json.
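The dual-interface pattern can be illustrated with a hypothetical, heavily simplified pair of interfaces. These are not Ollama's real definitions, whose method sets and signatures differ; the sketch only shows how one struct can satisfy both a general converter contract and an optional extra-parsing contract.

```go
package main

import "fmt"

// ModelConverter is a simplified stand-in for the converter contract.
type ModelConverter interface {
	KV() map[string]any // architecture metadata (bert.* keys)
}

// moreParser is a simplified stand-in for the optional extra-file hook.
type moreParser interface {
	parseMore(path string) error // read extra files such as modules.json
}

// bertModel satisfies both interfaces.
type bertModel struct{ layers int }

func (b *bertModel) KV() map[string]any {
	return map[string]any{"bert.block_count": b.layers}
}

func (b *bertModel) parseMore(path string) error {
	return nil // would read and apply modules.json here
}

func main() {
	// Compile-time checks that bertModel implements both interfaces.
	var _ ModelConverter = (*bertModel)(nil)
	var _ moreParser = (*bertModel)(nil)
	fmt.Println("bertModel satisfies both interfaces")
}
```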