
Principle: Ollama GGUF Model Conversion (BERT)

From Leeroopedia
Domains: Model Conversion, BERT
Last Updated: 2025-02-15 00:00 GMT

Overview

BERT (Bidirectional Encoder Representations from Transformers) model conversion transforms a HuggingFace SafeTensors checkpoint into the GGUF format. The converter handles the bidirectional encoder architecture, CLS/SEP token semantics, pooling configuration, and the phantom-space tokenizer adjustments required by the GGML runtime.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • encoder.layer / encoder.layers -> blk
  • embeddings.word_embeddings -> token_embd
  • embeddings.token_type_embeddings -> token_types
  • embeddings.LayerNorm -> token_embd_norm
  • embeddings.position_embeddings -> position_embd
  • attention.self.query -> attn_q
  • attention.self.key -> attn_k
  • attention.self.value -> attn_v
  • attention.output.dense -> attn_output
  • attention.output.LayerNorm -> attn_output_norm
  • intermediate.dense -> ffn_up
  • output.dense -> ffn_down
  • output.LayerNorm -> layer_output_norm
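The replacement table above can be sketched as an ordered string replacer in Go. The function name `renameTensor` is illustrative, not the identifier used in `convert_bert.go`; note that `encoder.layers` is listed before `encoder.layer` so the longer pattern wins when both could match.

```go
package main

import (
	"fmt"
	"strings"
)

// replacer encodes the HuggingFace-to-GGUF name substitutions from the
// table above. strings.NewReplacer compares patterns in argument order,
// so more specific prefixes are listed first.
var replacer = strings.NewReplacer(
	"encoder.layers", "blk",
	"encoder.layer", "blk",
	"embeddings.word_embeddings", "token_embd",
	"embeddings.token_type_embeddings", "token_types",
	"embeddings.LayerNorm", "token_embd_norm",
	"embeddings.position_embeddings", "position_embd",
	"attention.self.query", "attn_q",
	"attention.self.key", "attn_k",
	"attention.self.value", "attn_v",
	"attention.output.dense", "attn_output",
	"attention.output.LayerNorm", "attn_output_norm",
	"intermediate.dense", "ffn_up",
	"output.dense", "ffn_down",
	"output.LayerNorm", "layer_output_norm",
)

// renameTensor maps one HuggingFace tensor name to its GGUF name.
func renameTensor(name string) string { return replacer.Replace(name) }

func main() {
	// e.g. encoder.layer.0.attention.self.query.weight -> blk.0.attn_q.weight
	fmt.Println(renameTensor("encoder.layer.0.attention.self.query.weight"))
}
```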

Architecture-Specific Hyperparameters

The GGUF metadata is written under the bert.* namespace:

  • bert.block_count -- number of transformer layers (from num_hidden_layers, n_layers, or n_layer)
  • bert.context_length -- maximum position embeddings
  • bert.embedding_length -- hidden size
  • bert.feed_forward_length -- intermediate size
  • bert.attention.head_count -- number of attention heads
  • bert.attention.layer_norm_epsilon -- LayerNorm epsilon
  • bert.attention.causal -- always set to false (bidirectional)
  • bert.pooling_type -- 0 (none), 1 (mean), or 2 (CLS)
  • bert.normalize_embeddings -- whether to L2-normalize output embeddings
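A minimal sketch of how these keys could be assembled, including the `num_hidden_layers` / `n_layers` / `n_layer` fallback chain for the block count. The struct and function names here are illustrative and do not match `convert_bert.go` exactly.

```go
package main

import "fmt"

// modelConfig holds the subset of config.json fields referenced above.
type modelConfig struct {
	NumHiddenLayers       uint32  `json:"num_hidden_layers"`
	NLayers               uint32  `json:"n_layers"`
	NLayer                uint32  `json:"n_layer"`
	MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
	HiddenSize            uint32  `json:"hidden_size"`
	IntermediateSize      uint32  `json:"intermediate_size"`
	NumAttentionHeads     uint32  `json:"num_attention_heads"`
	LayerNormEps          float32 `json:"layer_norm_eps"`
}

// blockCount returns the first non-zero layer-count field, mirroring the
// fallback order described above.
func blockCount(c modelConfig) uint32 {
	for _, n := range []uint32{c.NumHiddenLayers, c.NLayers, c.NLayer} {
		if n > 0 {
			return n
		}
	}
	return 0
}

// kv assembles the bert.* metadata written into the GGUF header.
func kv(c modelConfig) map[string]any {
	return map[string]any{
		"bert.block_count":                  blockCount(c),
		"bert.context_length":               c.MaxPositionEmbeddings,
		"bert.embedding_length":             c.HiddenSize,
		"bert.feed_forward_length":          c.IntermediateSize,
		"bert.attention.head_count":         c.NumAttentionHeads,
		"bert.attention.layer_norm_epsilon": c.LayerNormEps,
		"bert.attention.causal":             false, // BERT is bidirectional
	}
}

func main() {
	// Typical bert-base-uncased values.
	c := modelConfig{NumHiddenLayers: 12, MaxPositionEmbeddings: 512,
		HiddenSize: 768, IntermediateSize: 3072, NumAttentionHeads: 12,
		LayerNormEps: 1e-12}
	fmt.Println(kv(c)["bert.block_count"])
}
```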

Special Handling

Pooling Configuration

The converter reads modules.json from the Sentence Transformers model directory to determine the pooling strategy. If a sentence_transformers.models.Pooling module is present, it reads pooling_mode_mean_tokens or pooling_mode_cls_token from that module's pooling config to select mean or CLS pooling, respectively. A sentence_transformers.models.Normalize module enables embedding normalization.
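The selection logic can be sketched as follows. `poolingType` and the `readPoolingConfig` callback are hypothetical names; the callback stands in for reading the pooling module's config.json from the model directory.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// module mirrors one entry in Sentence Transformers' modules.json.
type module struct {
	Type string `json:"type"`
	Path string `json:"path"`
}

// poolingConfig mirrors the pooling module's config.json flags.
type poolingConfig struct {
	MeanTokens bool `json:"pooling_mode_mean_tokens"`
	CLSToken   bool `json:"pooling_mode_cls_token"`
}

// poolingType maps the module list to the bert.pooling_type codes listed
// above: 0 = none, 1 = mean, 2 = CLS.
func poolingType(modules []module, readPoolingConfig func(path string) poolingConfig) uint32 {
	for _, m := range modules {
		if m.Type != "sentence_transformers.models.Pooling" {
			continue
		}
		pc := readPoolingConfig(m.Path)
		switch {
		case pc.MeanTokens:
			return 1
		case pc.CLSToken:
			return 2
		}
	}
	return 0
}

func main() {
	raw := `[{"type":"sentence_transformers.models.Transformer","path":""},
	         {"type":"sentence_transformers.models.Pooling","path":"1_Pooling"}]`
	var mods []module
	if err := json.Unmarshal([]byte(raw), &mods); err != nil {
		panic(err)
	}
	// Pretend 1_Pooling/config.json enables mean pooling.
	pt := poolingType(mods, func(string) poolingConfig {
		return poolingConfig{MeanTokens: true}
	})
	fmt.Println(pt)
}
```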

Phantom Space Tokenization

BERT WordPiece tokens are converted to phantom-space format for GGML compatibility. Special tokens enclosed in brackets (e.g., [CLS], [SEP]) are kept as-is. Subword continuation tokens prefixed with ## have the prefix stripped. All other (word-initial) tokens receive a leading U+2581 (LOWER ONE EIGHTH BLOCK, ▁), the conventional phantom-space marker.
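These three rules amount to a small per-token transform. The function name `phantomSpace` is illustrative:

```go
package main

import (
	"fmt"
	"strings"
)

// phantomSpace converts one WordPiece token to the phantom-space form the
// GGML runtime expects, per the rules above.
func phantomSpace(tok string) string {
	switch {
	case strings.HasPrefix(tok, "[") && strings.HasSuffix(tok, "]"):
		return tok // special tokens like [CLS], [SEP] pass through unchanged
	case strings.HasPrefix(tok, "##"):
		return tok[2:] // subword continuation: drop the ## marker
	default:
		return "\u2581" + tok // word-initial token: prepend U+2581 (▁)
	}
}

func main() {
	for _, t := range []string{"[CLS]", "hello", "##ing", "[SEP]"} {
		fmt.Printf("%q\n", phantomSpace(t))
	}
}
```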

Skipped Tensors

The following tensors are excluded from the GGUF output: embeddings.position_ids, pooler.dense.weight, and pooler.dense.bias.
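A simple skip-set captures this filter; the names `skippedTensors` and `keepTensor` are illustrative. Position IDs are implicit at runtime, and the pooler head is unused for embedding output, so none of these tensors is needed in the GGUF file.

```go
package main

import "fmt"

// skippedTensors lists the tensors the converter drops, per the text above.
var skippedTensors = map[string]bool{
	"embeddings.position_ids": true,
	"pooler.dense.weight":     true,
	"pooler.dense.bias":       true,
}

// keepTensor reports whether a tensor should be written to the GGUF output.
func keepTensor(name string) bool { return !skippedTensors[name] }

func main() {
	fmt.Println(keepTensor("pooler.dense.weight"),
		keepTensor("encoder.layer.0.attention.self.query.weight"))
}
```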

Tokenizer Model

The tokenizer model type is set to bert with a token_type_count of 2, corresponding to segment A and segment B in the BERT input format.

Implementation Notes

The conversion is implemented in convert/convert_bert.go via the bertModel struct which satisfies both the ModelConverter and moreParser interfaces. The moreParser interface enables reading additional configuration files such as modules.json beyond the standard config.json.
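The dual-interface shape described above can be approximated as below. These definitions are a sketch only: the real `ModelConverter` and `moreParser` interfaces live in ollama's convert package and their method signatures differ in detail.

```go
package main

import "fmt"

// ModelConverter approximates the interface every architecture-specific
// converter satisfies (method set simplified for illustration).
type ModelConverter interface {
	KV() map[string]any
}

// moreParser is implemented by converters that need extra configuration
// files (e.g. modules.json) beyond the standard config.json.
type moreParser interface {
	parseMore() error
}

// bertModel satisfies both interfaces, as described above.
type bertModel struct{ poolingType uint32 }

func (b *bertModel) KV() map[string]any {
	return map[string]any{"bert.pooling_type": b.poolingType}
}

// parseMore stands in for reading modules.json; here it just flags mean
// pooling (code 1) to show the extra parsing step.
func (b *bertModel) parseMore() error {
	b.poolingType = 1
	return nil
}

func main() {
	var m ModelConverter = &bertModel{}
	// Converters that also implement moreParser get their extra files parsed.
	if p, ok := m.(moreParser); ok {
		_ = p.parseMore()
	}
	fmt.Println(m.KV()["bert.pooling_type"])
}
```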
