Implementation:Ggml org Llama cpp Semantic Check
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion, Verification |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Performs a detailed semantic similarity comparison between PyTorch and llama.cpp embedding outputs to validate conversion quality.
Description
The script loads the binary embedding files produced by both models and first verifies that the two runs used the same tokens. It then performs a multi-level analysis: a per-token comparison of raw magnitudes, within-model token similarity matrices, cross-model cosine similarities for the same token, and similarity matrix difference metrics (max, mean, RMS). For pooled embeddings, it compares the single sentence-level vectors instead. The report ends with a quality assessment ranging from "EXCELLENT" (>0.95) to "POOR" (<0.70). On failure, the script exits with a warning that includes diagnostics for transformers version mismatches.
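The analysis levels described above can be sketched with NumPy as follows. This is a minimal illustration rather than the script's actual code: the function names `cosine_similarity_matrix` and `compare_embeddings` are hypothetical, and the dictionary keys simply mirror the return fields listed in the Outputs table.

```python
import numpy as np

def cosine_similarity_matrix(a, b=None):
    """Pairwise cosine similarity between the rows of a and the rows of b.

    Hypothetical helper: when b is omitted, a is compared against itself.
    """
    a = np.atleast_2d(np.asarray(a, dtype=np.float64))
    b = a if b is None else np.atleast_2d(np.asarray(b, dtype=np.float64))
    a_norm = np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = np.linalg.norm(b, axis=1, keepdims=True)
    # Replace zero norms with a small constant to avoid division by zero.
    a_unit = a / np.where(a_norm == 0, 1e-8, a_norm)
    b_unit = b / np.where(b_norm == 0, 1e-8, b_norm)
    return a_unit @ b_unit.T

def compare_embeddings(ref_emb, conv_emb):
    """Compare two (n_tokens, n_embd) arrays from the reference and converted models."""
    ref_emb = np.atleast_2d(np.asarray(ref_emb, dtype=np.float64))
    conv_emb = np.atleast_2d(np.asarray(conv_emb, dtype=np.float64))

    # Level 1: raw magnitude (L2 norm) of each token embedding.
    ref_mag = np.linalg.norm(ref_emb, axis=1)
    conv_mag = np.linalg.norm(conv_emb, axis=1)

    # Level 2: token-to-token similarity inside each model.
    ref_sim = cosine_similarity_matrix(ref_emb)
    conv_sim = cosine_similarity_matrix(conv_emb)

    # Level 3: same-token similarity across models (diagonal of the cross matrix).
    cross = np.diag(cosine_similarity_matrix(ref_emb, conv_emb))

    # Level 4: how much the two within-model similarity matrices disagree.
    diff = ref_sim - conv_sim
    return {
        "magnitudes": (ref_mag, conv_mag),
        "cross_model_similarities": cross,
        "similarity_matrix_diff": diff,
        "max_diff": float(np.max(np.abs(diff))),
        "mean_diff": float(np.mean(np.abs(diff))),
        "rms_diff": float(np.sqrt(np.mean(diff ** 2))),
    }
```

Comparing within-model similarity matrices in addition to same-token similarities is useful because it also exposes cases where the relationships between tokens drift, which a per-token check alone can miss.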
Usage
Use this script as the primary semantic validation tool for embedding model conversions. Its detailed diagnostic output helps identify and debug conversion issues.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: examples/model-conversion/scripts/utils/semantic_check.py
- Lines: 1-242
Signature
def cosine_similarity(a, b=None)
def load_embeddings_from_file(filename, n_tokens, n_embd)
def test_single_prompt_similarity(python_emb, cpp_emb, tokens, prompt)
def read_prompt_from_file(prompt_file)
def main()
Import
import numpy as np
import argparse
import os
import importlib
from pathlib import Path
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM, AutoModel
from common import compare_tokens, exit_with_warning
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| -m / --model-path | str | Yes | Path to the model directory |
| MODEL_PATH | env var | Yes | Environment variable pointing to the PyTorch model path |
| CONVERTED_MODEL | env var | Yes | Environment variable pointing to the converted llama.cpp model path |
| pytorch-{name}.bin | file | Yes | Binary file containing PyTorch embedding outputs |
| llamacpp-{name}.bin | file | Yes | Binary file containing llama.cpp embedding outputs |
| tokens | list | Yes (for test_single_prompt_similarity) | Token IDs for alignment verification |
| prompt | str | Yes (for test_single_prompt_similarity) | Original prompt text |
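As a rough illustration of how the two binary input files might be read, the sketch below loads a flat dump of values and reshapes it using `n_tokens` and `n_embd`. It assumes the files contain headerless `float32` data, which the page does not state, and the name `load_embeddings` is hypothetical. A file holding exactly `n_embd` values is treated as a pooled, sentence-level embedding.

```python
import numpy as np

def load_embeddings(filename, n_tokens, n_embd):
    """Load embeddings from a raw binary file into a 2-D array.

    Assumption: the file is a headerless dump of float32 values.
    """
    flat = np.fromfile(filename, dtype=np.float32)
    if flat.size == n_embd:
        # Pooled output: one sentence-level vector.
        return flat.reshape(1, n_embd)
    if flat.size != n_tokens * n_embd:
        raise ValueError(
            f"{filename}: expected {n_tokens * n_embd} values "
            f"({n_tokens} x {n_embd}), found {flat.size}"
        )
    # Per-token output: one row per token.
    return flat.reshape(n_tokens, n_embd)
```

Checking the element count before reshaping gives a clearer error message than a failed reshape when the token count or embedding size does not match the file.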
Outputs
| Name | Type | Description |
|---|---|---|
| stdout | text | Detailed comparison report: magnitude ratios, similarity matrices, cross-model cosine similarities, and quality assessment |
| exit code | int | 0 on success, 1 on failure (similarity below threshold) |
| test_single_prompt_similarity return | dict | Dictionary with cross_model_similarities, similarity_matrix_diff, max_diff, mean_diff, rms_diff |
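The quality assessment and exit code could be derived from the cross-model similarities along the lines below. Only the two endpoint thresholds ("EXCELLENT" above 0.95 and "POOR" below 0.70) come from this page. The `INTERMEDIATE` label is a hypothetical placeholder for whatever tiers lie between them, and `pass_threshold` is left as a required argument because the page does not say which threshold triggers the failure exit.

```python
import numpy as np

EXCELLENT_ABOVE = 0.95  # from the description on this page
POOR_BELOW = 0.70       # from the description on this page

def assess_quality(cross_model_similarities):
    """Return (mean similarity, label) for a set of same-token similarities."""
    mean_sim = float(np.mean(cross_model_similarities))
    if mean_sim > EXCELLENT_ABOVE:
        label = "EXCELLENT"
    elif mean_sim < POOR_BELOW:
        label = "POOR"
    else:
        # Hypothetical placeholder: intermediate tiers are not listed on this page.
        label = "INTERMEDIATE"
    return mean_sim, label

def exit_code(mean_sim, pass_threshold):
    """Map a mean similarity to a process exit code: 0 on success, 1 on failure.

    Assumption: the caller supplies the threshold, since it is not documented here.
    """
    return 0 if mean_sim > pass_threshold else 1
```

A caller would typically pass the `cross_model_similarities` entry of the returned dictionary to `assess_quality` and hand the resulting exit code to `sys.exit`.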
Usage Examples
# Run semantic check from the model conversion scripts directory
export MODEL_PATH=/path/to/pytorch/model
export CONVERTED_MODEL=/path/to/converted/model.gguf
python semantic_check.py -m /path/to/model
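# Programmatic usage (Python); importing semantic_check also imports transformers and common
from semantic_check import cosine_similarity, load_embeddings_from_file
# n_tokens and n_embd must match the shape of the data stored in the file
embeddings = load_embeddings_from_file("embeddings.bin", n_tokens=10, n_embd=768)
# With a single argument, the embeddings are compared against themselves
sim_matrix = cosine_similarity(embeddings)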