
Principle: Ollama GGUF Model Conversion (BERT)

From Leeroopedia
Domains: Model Conversion, BERT
Last Updated: 2025-02-15 00:00 GMT

Overview

BERT (Bidirectional Encoder Representations from Transformers) model conversion transforms a HuggingFace SafeTensors checkpoint into the GGUF format. The converter handles the bidirectional encoder architecture, CLS/SEP token semantics, pooling configuration, and the phantom-space tokenizer adjustments required by the GGML runtime.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • encoder.layer / encoder.layers -> blk
  • embeddings.word_embeddings -> token_embd
  • embeddings.token_type_embeddings -> token_types
  • embeddings.LayerNorm -> token_embd_norm
  • embeddings.position_embeddings -> position_embd
  • attention.self.query -> attn_q
  • attention.self.key -> attn_k
  • attention.self.value -> attn_v
  • attention.output.dense -> attn_output
  • attention.output.LayerNorm -> attn_output_norm
  • intermediate.dense -> ffn_up
  • output.dense -> ffn_down
  • output.LayerNorm -> layer_output_norm
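The replacement table above can be sketched as an ordered string replacer in Go. The function name `renameTensor` is illustrative, not the identifier used in `convert_bert.go`; note that `encoder.layers` is listed before `encoder.layer` so the longer pattern wins when both could match.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer encodes the HuggingFace-to-GGUF name substitutions from the
// table above. strings.NewReplacer compares patterns in argument order,
// so more specific prefixes are listed first.
var replacer = strings.NewReplacer(
	"encoder.layers", "blk",
	"encoder.layer", "blk",
	"embeddings.word_embeddings", "token_embd",
	"embeddings.token_type_embeddings", "token_types",
	"embeddings.LayerNorm", "token_embd_norm",
	"embeddings.position_embeddings", "position_embd",
	"attention.self.query", "attn_q",
	"attention.self.key", "attn_k",
	"attention.self.value", "attn_v",
	"attention.output.dense", "attn_output",
	"attention.output.LayerNorm", "attn_output_norm",
	"intermediate.dense", "ffn_up",
	"output.dense", "ffn_down",
	"output.LayerNorm", "layer_output_norm",
)

// renameTensor maps one HuggingFace tensor name to its GGUF name.
func renameTensor(name string) string { return replacer.Replace(name) }

func main() {
	// e.g. encoder.layer.0.attention.self.query.weight -> blk.0.attn_q.weight
	fmt.Println(renameTensor("encoder.layer.0.attention.self.query.weight"))
}
```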

Architecture-Specific Hyperparameters

The GGUF metadata is written under the bert.* namespace:

  • bert.block_count -- number of transformer layers (from num_hidden_layers, n_layers, or n_layer)
  • bert.context_length -- maximum position embeddings
  • bert.embedding_length -- hidden size
  • bert.feed_forward_length -- intermediate size
  • bert.attention.head_count -- number of attention heads
  • bert.attention.layer_norm_epsilon -- LayerNorm epsilon
  • bert.attention.causal -- always set to false (bidirectional)
  • bert.pooling_type -- 0 (none), 1 (mean), or 2 (CLS)
  • bert.normalize_embeddings -- whether to L2-normalize output embeddings
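A minimal sketch of how these keys could be assembled, including the `num_hidden_layers` / `n_layers` / `n_layer` fallback chain for the block count. The struct and function names here are illustrative and do not match `convert_bert.go` exactly.

```go
package main

import "fmt"

// modelConfig holds the subset of config.json fields referenced above.
type modelConfig struct {
	NumHiddenLayers       uint32  `json:"num_hidden_layers"`
	NLayers               uint32  `json:"n_layers"`
	NLayer                uint32  `json:"n_layer"`
	MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
	HiddenSize            uint32  `json:"hidden_size"`
	IntermediateSize      uint32  `json:"intermediate_size"`
	NumAttentionHeads     uint32  `json:"num_attention_heads"`
	LayerNormEps          float32 `json:"layer_norm_eps"`
}

// blockCount returns the first non-zero layer-count field, mirroring the
// fallback order described above.
func blockCount(c modelConfig) uint32 {
	for _, n := range []uint32{c.NumHiddenLayers, c.NLayers, c.NLayer} {
		if n > 0 {
			return n
		}
	}
	return 0
}

// kv assembles the bert.* metadata written into the GGUF header.
func kv(c modelConfig) map[string]any {
	return map[string]any{
		"bert.block_count":                  blockCount(c),
		"bert.context_length":               c.MaxPositionEmbeddings,
		"bert.embedding_length":             c.HiddenSize,
		"bert.feed_forward_length":          c.IntermediateSize,
		"bert.attention.head_count":         c.NumAttentionHeads,
		"bert.attention.layer_norm_epsilon": c.LayerNormEps,
		"bert.attention.causal":             false, // BERT is bidirectional
	}
}

func main() {
	// Typical bert-base-uncased values.
	c := modelConfig{NumHiddenLayers: 12, MaxPositionEmbeddings: 512,
		HiddenSize: 768, IntermediateSize: 3072, NumAttentionHeads: 12,
		LayerNormEps: 1e-12}
	fmt.Println(kv(c)["bert.block_count"])
}
```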

Special Handling

Pooling Configuration

The converter reads modules.json from the Sentence Transformers model directory to determine the pooling strategy. If a sentence_transformers.models.Pooling module is present, it reads pooling_mode_mean_tokens or pooling_mode_cls_token from that module's pooling config to select mean or CLS pooling, respectively. A sentence_transformers.models.Normalize module enables embedding normalization.
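The selection logic can be sketched as follows. `poolingType` and the `readPoolingConfig` callback are hypothetical names; the callback stands in for reading the pooling module's config.json from the model directory.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// module mirrors one entry in Sentence Transformers' modules.json.
type module struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

// poolingConfig mirrors the pooling module's config.json flags.
type poolingConfig struct {
	MeanTokens bool `json:"pooling_mode_mean_tokens"`
	CLSToken   bool `json:"pooling_mode_cls_token"`
}

// poolingType maps the module list to the bert.pooling_type codes listed
// above: 0 = none, 1 = mean, 2 = CLS.
func poolingType(modules []module, readPoolingConfig func(path string) poolingConfig) uint32 {
	for _, m := range modules {
		if m.Type != "sentence_transformers.models.Pooling" {
			continue
		}
		pc := readPoolingConfig(m.Path)
		switch {
		case pc.MeanTokens:
			return 1
		case pc.CLSToken:
			return 2
		}
	}
	return 0
}

func main() {
	raw := `[{"type":"sentence_transformers.models.Transformer","path":""},
	         {"type":"sentence_transformers.models.Pooling","path":"1_Pooling"}]`
	var mods []module
	if err := json.Unmarshal([]byte(raw), &mods); err != nil {
		panic(err)
	}
	// Pretend 1_Pooling/config.json enables mean pooling.
	pt := poolingType(mods, func(string) poolingConfig {
		return poolingConfig{MeanTokens: true}
	})
	fmt.Println(pt)
}
```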

Phantom Space Tokenization

BERT WordPiece tokens are converted to phantom-space format for GGML compatibility. Special tokens enclosed in brackets (e.g., [CLS], [SEP]) are kept as-is. Subword continuation tokens prefixed with ## have the prefix stripped. All other (word-initial) tokens receive a leading U+2581 (LOWER ONE EIGHTH BLOCK, ▁), the conventional phantom-space marker.
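These three rules amount to a small per-token transform. The function name `phantomSpace` is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// phantomSpace converts one WordPiece token to the phantom-space form the
// GGML runtime expects, per the rules above.
func phantomSpace(tok string) string {
	switch {
	case strings.HasPrefix(tok, "[") && strings.HasSuffix(tok, "]"):
		return tok // special tokens like [CLS], [SEP] pass through unchanged
	case strings.HasPrefix(tok, "##"):
		return tok[2:] // subword continuation: drop the ## marker
	default:
		return "\u2581" + tok // word-initial token: prepend U+2581 (▁)
	}
}

func main() {
	for _, t := range []string{"[CLS]", "hello", "##ing", "[SEP]"} {
		fmt.Printf("%q\n", phantomSpace(t))
	}
}
```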

Skipped Tensors

The following tensors are excluded from the GGUF output: embeddings.position_ids, pooler.dense.weight, and pooler.dense.bias.
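A simple skip-set captures this filter; the names `skippedTensors` and `keepTensor` are illustrative. Position IDs are implicit at runtime, and the pooler head is unused for embedding output, so none of these tensors is needed in the GGUF file.

```go
package main

import "fmt"

// skippedTensors lists the tensors the converter drops, per the text above.
var skippedTensors = map[string]bool{
	"embeddings.position_ids": true,
	"pooler.dense.weight":     true,
	"pooler.dense.bias":       true,
}

// keepTensor reports whether a tensor should be written to the GGUF output.
func keepTensor(name string) bool { return !skippedTensors[name] }

func main() {
	fmt.Println(keepTensor("pooler.dense.weight"),
		keepTensor("encoder.layer.0.attention.self.query.weight"))
}
```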

Tokenizer Model

The tokenizer model type is set to bert with a token_type_count of 2, corresponding to segment A and segment B in the BERT input format.

Implementation Notes

The conversion is implemented in convert/convert_bert.go via the bertModel struct which satisfies both the ModelConverter and moreParser interfaces. The moreParser interface enables reading additional configuration files such as modules.json beyond the standard config.json.
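The dual-interface shape described above can be approximated as below. These definitions are a sketch only: the real `ModelConverter` and `moreParser` interfaces live in ollama's convert package and their method signatures differ in detail.

```go
package main

import "fmt"

// ModelConverter approximates the interface every architecture-specific
// converter satisfies (method set simplified for illustration).
type ModelConverter interface {
	KV() map[string]any
}

// moreParser is implemented by converters that need extra configuration
// files (e.g. modules.json) beyond the standard config.json.
type moreParser interface {
	parseMore() error
}

// bertModel satisfies both interfaces, as described above.
type bertModel struct{ poolingType uint32 }

func (b *bertModel) KV() map[string]any {
	return map[string]any{"bert.pooling_type": b.poolingType}
}

// parseMore stands in for reading modules.json; here it just flags mean
// pooling (code 1) to show the extra parsing step.
func (b *bertModel) parseMore() error {
	b.poolingType = 1
	return nil
}

func main() {
	var m ModelConverter = &bertModel{}
	// Converters that also implement moreParser get their extra files parsed.
	if p, ok := m.(moreParser); ok {
		_ = p.parseMore()
	}
	fmt.Println(m.KV()["bert.pooling_type"])
}
```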
