
Implementation: ggml-org/llama.cpp convert_llama_ggml_to_gguf

From Leeroopedia
Knowledge Sources
Domains: Model_Conversion
Last Updated: 2026-02-15 00:00 GMT

Overview

Converts legacy GGML format model files to the modern GGUF format used by llama.cpp.

Description

This script parses the binary GGML file structure (supporting the GGML, GGMF, and GGJT format versions), reads the hyperparameters, vocabulary, and tensor data, then writes them out in GGUF format using the gguf library. It defines GGMLFormat and GGMLFType enums (container format and quantization type), plus Hyperparameters, Vocab, Tensor, GGMLModel, and GGMLToGGUF classes. The converter handles quantization types including F32, F16, Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, and the K-quants.

Usage

Use this script as a migration tool: it converts models stored in the older GGML binary format to the current GGUF standard used by llama.cpp.
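To copy tensor data across, the converter must know how many bytes each tensor occupies, which depends on its quantization type. A sketch using the standard ggml block layouts (elements per block, bytes per block); the dictionary name is illustrative:

```python
# Standard ggml quantization block layouts: (elements per block, bytes per block).
GGML_QUANT_SIZES = {
    'F32':  (1, 4),
    'F16':  (1, 2),
    'Q4_0': (32, 18),  # f16 scale + 32 x 4-bit values
    'Q4_1': (32, 20),  # f16 scale + f16 min + 32 x 4-bit values
    'Q5_0': (32, 22),
    'Q5_1': (32, 24),
    'Q8_0': (32, 34),  # f16 scale + 32 x 8-bit values
}

def tensor_nbytes(n_elements: int, qtype: str) -> int:
    # Element count must be a whole number of blocks for quantized types.
    block_elems, block_bytes = GGML_QUANT_SIZES[qtype]
    assert n_elements % block_elems == 0
    return n_elements // block_elems * block_bytes
```

For example, a row of 4096 weights takes 8192 bytes at F16 but only 2304 bytes at Q4_0, which is where the bulk of the size savings in quantized models comes from.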

Code Reference

Source Location

Signature

class GGMLFormat(IntEnum):
    GGML = 0
    GGMF = 1
    GGJT = 2

class GGMLFType(IntEnum):
    ALL_F32              = 0
    MOSTLY_F16           = 1
    MOSTLY_Q4_0          = 2
    # ... additional quantization types ...

class Hyperparameters:
    def __init__(self): ...
    def load(self, data, offset): ...

class Vocab:
    def __init__(self, load_scores=True): ...
    def load(self, data, offset, ftype): ...

class GGMLModel:
    def load(self, data, offset, ftype): ...

class GGMLToGGUF:
    def save(self): ...

def handle_metadata(cfg, hp): ...
def handle_args(): ...
def main(): ...

Import

from __future__ import annotations
import argparse
import logging
import os
import struct
import sys
from enum import IntEnum
from pathlib import Path
import numpy as np
import gguf

I/O Contract

Inputs

Name Type Required Description
input_file Path Yes Path to the legacy GGML model file (.bin)
--eps float No RMS norm epsilon value override
--context-length int No Context length override
--gqa int No Grouped-query attention head count override
--name str No Model name to embed in GGUF metadata
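The inputs table maps naturally onto an argparse definition. A hypothetical sketch mirroring the table above (the real script's flag spellings, defaults, and the positional-versus-flag choice for the input file may differ):

```python
import argparse
from pathlib import Path

def build_parser() -> argparse.ArgumentParser:
    # Illustrative parser matching the I/O contract table; not the
    # script's actual handle_args() implementation.
    p = argparse.ArgumentParser(
        description='Convert a legacy GGML model file to GGUF')
    p.add_argument('input_file', type=Path,
                   help='path to the legacy GGML model file (.bin)')
    p.add_argument('--eps', type=float, default=None,
                   help='RMS norm epsilon value override')
    p.add_argument('--context-length', type=int, default=None,
                   help='context length override')
    p.add_argument('--gqa', type=int, default=None,
                   help='grouped-query attention head count override')
    p.add_argument('--name', default=None,
                   help='model name to embed in GGUF metadata')
    return p
```

Note that argparse exposes `--context-length` as `args.context_length`; optional overrides default to `None` so the converter can tell "not supplied" apart from an explicit value.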

Outputs

Name Type Description
output_file .gguf file Converted model in GGUF format with metadata, vocabulary, and tensor data
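The output file opens with the fixed GGUF header before any metadata or tensor data. A sketch of that layout, assuming GGUF version 3 (the counts are 64-bit integers in the spec; they are packed unsigned here, which is byte-identical for non-negative values):

```python
import struct

GGUF_MAGIC = b'GGUF'
GGUF_VERSION = 3  # assumed current on-disk version

def gguf_header(n_tensors: int, n_kv: int) -> bytes:
    # A GGUF file begins with the 4-byte magic, a u32 format version,
    # then the tensor count and metadata key/value count as 64-bit
    # little-endian integers.
    return GGUF_MAGIC + struct.pack('<IQQ', GGUF_VERSION, n_tensors, n_kv)
```

In practice the converter delegates all of this to the gguf library's writer rather than packing bytes by hand; the sketch only shows what ends up at the front of the output file.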

Usage Examples

# Convert a legacy GGML model to GGUF
python convert_llama_ggml_to_gguf.py model.bin

# Convert with metadata overrides
python convert_llama_ggml_to_gguf.py model.bin --name "LLaMA-7B" --context-length 4096 --eps 1e-5
