Implementation: ggml-org/llama.cpp convert_llama_ggml_to_gguf
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Converts legacy GGML format model files to the modern GGUF format used by llama.cpp.
Description
This script parses the binary GGML file structure (supporting the GGML, GGMF, and GGJT container versions), reads hyperparameters, vocabulary, and tensor data, and then writes them out in GGUF format using the gguf library. It defines the classes GGMLFormat, GGMLFType (quantization types), Hyperparameters, Vocab, Tensor, GGMLModel, and the GGMLToGGUF converter. The converter handles quantization types including F32, F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, and the K-quants.
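These quantization formats differ in how many weights one block packs and how many bytes that block occupies, which is exactly what the converter needs to know to compute each tensor's byte length while walking the file. A minimal sketch of that bookkeeping; the block/byte sizes match ggml's published block layouts, but the table and helper names here are hypothetical, not the script's own:

```python
# Hypothetical sketch: (elements_per_block, bytes_per_block) for the dtypes
# the converter must understand when sizing tensor data in the GGML file.
GGML_QUANT_SIZES = {
    "F32":  (1, 4),
    "F16":  (1, 2),
    "Q4_0": (32, 18),  # per block: f16 scale + 16 nibble-packed bytes
    "Q4_1": (32, 20),  # adds an f16 minimum per block
    "Q5_0": (32, 22),  # adds 4 bytes of high bits
    "Q5_1": (32, 24),
    "Q8_0": (32, 34),  # f16 scale + 32 int8 weights
}

def tensor_nbytes(dtype: str, n_elements: int) -> int:
    """Bytes occupied by a tensor of n_elements values of the given dtype."""
    block, size = GGML_QUANT_SIZES[dtype]
    if n_elements % block:
        raise ValueError("tensor size must be a multiple of the block size")
    return n_elements // block * size
```

For example, a Q4_0 tensor of 64 elements spans two 18-byte blocks, i.e. 36 bytes.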
Usage
Use this script as a migration tool: it lets users with models in the older GGML binary format convert them to the current GGUF standard used by llama.cpp.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: convert_llama_ggml_to_gguf.py
- Lines: 1-450
Signature
class GGMLFormat(IntEnum):
    GGML = 0
    GGMF = 1
    GGJT = 2

class GGMLFType(IntEnum):
    ALL_F32 = 0
    MOSTLY_F16 = 1
    MOSTLY_Q4_0 = 2
    # ... additional quantization types ...

class Hyperparameters:
    def __init__(self): ...
    def load(self, data, offset): ...

class Vocab:
    def __init__(self, load_scores=True): ...
    def load(self, data, offset, ftype): ...

class GGMLModel:
    def load(self, data, offset, ftype): ...

class GGMLToGGUF:
    def save(self): ...

def handle_metadata(cfg, hp): ...
def handle_args(): ...
def main(): ...
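Before GGMLModel.load can parse anything, it has to identify which of the three container formats it is reading. A hedged sketch of that header sniffing, assuming the magics are stored as little-endian uint32 values (appearing on disk as b'lmgg', b'fmgg', b'tjgg') and that GGMF/GGJT carry a version field directly after the magic; the function name sniff_format is an illustration, not the script's API:

```python
import struct
from enum import IntEnum

class GGMLFormat(IntEnum):
    GGML = 0  # unversioned legacy container
    GGMF = 1  # versioned
    GGJT = 2  # versioned, mmap-aligned tensor data

# Magic values interpreted as little-endian uint32s.
_MAGICS = {
    0x67676D6C: GGMLFormat.GGML,
    0x67676D66: GGMLFormat.GGMF,
    0x67676A74: GGMLFormat.GGJT,
}

def sniff_format(data: bytes) -> tuple[GGMLFormat, int]:
    """Return (container format, version) for a legacy GGML-family header."""
    (magic,) = struct.unpack("<I", data[:4])
    fmt = _MAGICS.get(magic)
    if fmt is None:
        raise ValueError(f"unrecognized magic {magic:#010x}")
    if fmt is GGMLFormat.GGML:
        return fmt, 1  # GGML has no version field; report 1
    (version,) = struct.unpack("<I", data[4:8])
    return fmt, version
```

A GGJT v3 header, for instance, is the magic followed by the version word, and both are consumed before the hyperparameters begin.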
Import
from __future__ import annotations
import argparse
import logging
import os
import struct
import sys
from enum import IntEnum
from pathlib import Path
import numpy as np
import gguf
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_file | Path | Yes | Path to the legacy GGML model file (.bin) |
| --eps | float | No | RMS norm epsilon value override |
| --context-length | int | No | Context length override |
| --gqa | int | No | Grouped-query attention head count override |
| --name | str | No | Model name to embed in GGUF metadata |
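The flags in the table map onto a small argparse surface. The sketch below reconstructs it from the table alone: the positional argument and option names follow the rows above, while help strings and the absence of defaults are assumptions rather than the script's actual handle_args implementation:

```python
import argparse

def handle_args(argv=None):
    # Hypothetical reconstruction of the CLI from the Inputs table;
    # types and option names follow the table, everything else is assumed.
    p = argparse.ArgumentParser(
        description="Convert a legacy GGML model file to GGUF")
    p.add_argument("input_file",
                   help="path to the legacy GGML model file (.bin)")
    p.add_argument("--eps", type=float,
                   help="RMS norm epsilon value override")
    p.add_argument("--context-length", type=int,
                   help="context length override")
    p.add_argument("--gqa", type=int,
                   help="grouped-query attention head count override")
    p.add_argument("--name",
                   help="model name to embed in GGUF metadata")
    return p.parse_args(argv)
```

For example, `handle_args(["model.bin", "--context-length", "4096"])` yields a namespace with `input_file == "model.bin"` and `context_length == 4096`.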
Outputs
| Name | Type | Description |
|---|---|---|
| output_file | .gguf file | Converted model in GGUF format with metadata, vocabulary, and tensor data |
Usage Examples
# Convert a legacy GGML model to GGUF
# python convert_llama_ggml_to_gguf.py model.bin
# Convert with metadata overrides
# python convert_llama_ggml_to_gguf.py model.bin --name "LLaMA-7B" --context-length 4096 --eps 1e-5