Implementation: ggml-org/llama.cpp convert_llama_ggml_to_gguf
| Knowledge Sources | |
|---|---|
| Domains | Model_Conversion |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Converts legacy GGML format model files to the modern GGUF format used by llama.cpp.
Description
This script parses the binary GGML file structure (supporting the GGML, GGMF, and GGJT container versions), reads hyperparameters, vocabulary, and tensor data, and then writes them out in GGUF format using the gguf library. It defines the classes GGMLFormat, GGMLFType (quantization types), Hyperparameters, Vocab, Tensor, GGMLModel, and the GGMLToGGUF converter. The converter handles quantization types including F32, F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, and the K-quants.
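These quantization formats differ in how many weights one block packs and how many bytes that block occupies, which is exactly what the converter needs to know to compute each tensor's byte length while walking the file. A minimal sketch of that bookkeeping; the block/byte sizes match ggml's published block layouts, but the table and helper names here are hypothetical, not the script's own:

```python
# Hypothetical sketch: (elements_per_block, bytes_per_block) for the dtypes
# the converter must understand when sizing tensor data in the GGML file.
GGML_QUANT_SIZES = {
    "F32":  (1, 4),
    "F16":  (1, 2),
    "Q4_0": (32, 18),  # per block: f16 scale + 16 nibble-packed bytes
    "Q4_1": (32, 20),  # adds an f16 minimum per block
    "Q5_0": (32, 22),  # adds 4 bytes of high bits
    "Q5_1": (32, 24),
    "Q8_0": (32, 34),  # f16 scale + 32 int8 weights
}

def tensor_nbytes(dtype: str, n_elements: int) -> int:
    """Bytes occupied by a tensor of n_elements values of the given dtype."""
    block, size = GGML_QUANT_SIZES[dtype]
    if n_elements % block:
        raise ValueError("tensor size must be a multiple of the block size")
    return n_elements // block * size
```

For example, a Q4_0 tensor of 64 elements spans two 18-byte blocks, i.e. 36 bytes.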
Usage
Use this script as a migration tool: it lets users with models in the older GGML binary format convert them to the current GGUF standard used by llama.cpp.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: convert_llama_ggml_to_gguf.py
- Lines: 1-450
Signature
class GGMLFormat(IntEnum):
    GGML = 0
    GGMF = 1
    GGJT = 2

class GGMLFType(IntEnum):
    ALL_F32 = 0
    MOSTLY_F16 = 1
    MOSTLY_Q4_0 = 2
    # ... additional quantization types ...

class Hyperparameters:
    def __init__(self): ...
    def load(self, data, offset): ...

class Vocab:
    def __init__(self, load_scores=True): ...
    def load(self, data, offset, ftype): ...

class GGMLModel:
    def load(self, data, offset, ftype): ...

class GGMLToGGUF:
    def save(self): ...

def handle_metadata(cfg, hp): ...
def handle_args(): ...
def main(): ...
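Before GGMLModel.load can parse anything, it has to identify which of the three container formats it is reading. A hedged sketch of that header sniffing, assuming the magics are stored as little-endian uint32 values (appearing on disk as b'lmgg', b'fmgg', b'tjgg') and that GGMF/GGJT carry a version field directly after the magic; the function name sniff_format is an illustration, not the script's API:

```python
import struct
from enum import IntEnum

class GGMLFormat(IntEnum):
    GGML = 0  # unversioned legacy container
    GGMF = 1  # versioned
    GGJT = 2  # versioned, mmap-aligned tensor data

# Magic values interpreted as little-endian uint32s.
_MAGICS = {
    0x67676D6C: GGMLFormat.GGML,
    0x67676D66: GGMLFormat.GGMF,
    0x67676A74: GGMLFormat.GGJT,
}

def sniff_format(data: bytes) -> tuple[GGMLFormat, int]:
    """Return (container format, version) for a legacy GGML-family header."""
    (magic,) = struct.unpack("<I", data[:4])
    fmt = _MAGICS.get(magic)
    if fmt is None:
        raise ValueError(f"unrecognized magic {magic:#010x}")
    if fmt is GGMLFormat.GGML:
        return fmt, 1  # GGML has no version field; report 1
    (version,) = struct.unpack("<I", data[4:8])
    return fmt, version
```

A GGJT v3 header, for instance, is the magic followed by the version word, and both are consumed before the hyperparameters begin.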
Import
from __future__ import annotations
import argparse
import logging
import os
import struct
import sys
from enum import IntEnum
from pathlib import Path
import numpy as np
import gguf
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_file | Path | Yes | Path to the legacy GGML model file (.bin) |
| --eps | float | No | RMS norm epsilon value override |
| --context-length | int | No | Context length override |
| --gqa | int | No | Grouped-query attention head count override |
| --name | str | No | Model name to embed in GGUF metadata |
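The flags in the table map onto a small argparse surface. The sketch below reconstructs it from the table alone: the positional argument and option names follow the rows above, while help strings and the absence of defaults are assumptions rather than the script's actual handle_args implementation:

```python
import argparse

def handle_args(argv=None):
    # Hypothetical reconstruction of the CLI from the Inputs table;
    # types and option names follow the table, everything else is assumed.
    p = argparse.ArgumentParser(
        description="Convert a legacy GGML model file to GGUF")
    p.add_argument("input_file",
                   help="path to the legacy GGML model file (.bin)")
    p.add_argument("--eps", type=float,
                   help="RMS norm epsilon value override")
    p.add_argument("--context-length", type=int,
                   help="context length override")
    p.add_argument("--gqa", type=int,
                   help="grouped-query attention head count override")
    p.add_argument("--name",
                   help="model name to embed in GGUF metadata")
    return p.parse_args(argv)
```

For example, `handle_args(["model.bin", "--context-length", "4096"])` yields a namespace with `input_file == "model.bin"` and `context_length == 4096`.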
Outputs
| Name | Type | Description |
|---|---|---|
| output_file | .gguf file | Converted model in GGUF format with metadata, vocabulary, and tensor data |
Usage Examples
# Convert a legacy GGML model to GGUF
# python convert_llama_ggml_to_gguf.py model.bin
# Convert with metadata overrides
# python convert_llama_ggml_to_gguf.py model.bin --name "LLaMA-7B" --context-length 4096 --eps 1e-5