Implementation: Convert LoRA To GGUF (ggml-org/llama.cpp)
| Field | Value |
|---|---|
| Implementation Name | Convert LoRA To GGUF |
| Doc Type | Wrapper Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 2 of 5 |
| Source File | convert_lora_to_gguf.py |
Overview
Description
The convert_lora_to_gguf.py script converts Hugging Face PEFT LoRA adapters into the GGUF format required by llama.cpp. It reads adapter_model.safetensors (or adapter_model.bin) and adapter_config.json from a LoRA directory, remaps tensor names from PyTorch conventions to GGML conventions, preserves the factored A/B matrix representation, and writes a GGUF file with appropriate metadata.
The script uses the LoraTorchTensor class to wrap paired A and B tensors as a single logical object, enabling the model-specific conversion pipeline (inherited from ModelBase via convert_hf_to_gguf.py) to apply architecture-specific tensor transformations while preserving the low-rank factored form.
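The factored form the script preserves can be sketched numerically. The following NumPy stand-in is illustrative only (dimensions and the alpha scaling are assumptions, not values from the script): a LoRA adapter stores a pair (A, B) whose product is a low-rank delta applied to the base weight.

```python
import numpy as np

# Hypothetical dimensions for illustration; real shapes come from the adapter.
d_out, d_in, rank, alpha = 8, 16, 4, 8.0

rng = np.random.default_rng(0)
A = rng.standard_normal((rank, d_in))   # lora_A: (n_rank, row_size)
B = rng.standard_normal((d_out, rank))  # lora_B: (col_size, n_rank)

# The runtime applies W' = W + (alpha / rank) * B @ A, which is why the
# converter keeps A and B separate instead of merging them into the base.
delta_w = (alpha / rank) * (B @ A)
assert delta_w.shape == (d_out, d_in)
assert np.linalg.matrix_rank(delta_w) <= rank
```

Keeping the pair separate lets llama.cpp hot-swap or scale adapters at load time instead of baking the delta into the base weights.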
Usage
```shell
python convert_lora_to_gguf.py <lora_path> [--base <model_dir>] [--base-model-id <hf_id>] [--outfile <path>] [--outtype {f32,f16,bf16,q8_0,auto}]
```
Code Reference
| Field | Value |
|---|---|
| Source Location | convert_lora_to_gguf.py |
| Entry Point | convert_lora_to_gguf.py:291 (if __name__ == '__main__') |
| LoraTorchTensor | convert_lora_to_gguf.py:41-224 |
| parse_args | convert_lora_to_gguf.py:237-277 |
| Import | from convert_hf_to_gguf import LazyTorchTensor, ModelBase |
Entry point:
```python
if __name__ == '__main__':
    args = parse_args()
    # ... initialization, model loading, conversion, and write
```
parse_args signature:
```python
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Convert a Hugging Face PEFT LoRA adapter to a GGUF file")
    parser.add_argument("lora_path", type=Path,
                        help="directory containing Hugging Face PEFT LoRA config and weights")
    parser.add_argument("--outfile", type=Path,
                        help="path to write to; default: based on input")
    parser.add_argument("--outtype", type=str,
                        choices=["f32", "f16", "bf16", "q8_0", "auto"], default="f32")
    parser.add_argument("--base", type=Path,
                        help="directory containing base model config files")
    parser.add_argument("--base-model-id", type=str,
                        help="HuggingFace model ID for base model config")
    parser.add_argument("--bigendian", action="store_true")
    parser.add_argument("--no-lazy", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args()
```
LoraTorchTensor class (core abstraction):
```python
class LoraTorchTensor:
    _lora_A: Tensor  # (n_rank, row_size)
    _lora_B: Tensor  # (col_size, n_rank)
    _rank: int

    def __init__(self, A: Tensor, B: Tensor):
        assert len(A.shape) == len(B.shape)
        assert A.shape[-2] == B.shape[-1]
        # ...

    def get_lora_A_B(self) -> tuple[Tensor, Tensor]:
        return (self._lora_A, self._lora_B)
```
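Since the real class depends on torch, its core pairing invariant can be checked with a tiny NumPy stand-in (class name and shapes here are illustrative assumptions, not the script's code):

```python
import numpy as np

class PairedLoRA:
    """Toy stand-in for LoraTorchTensor: holds A and B as one logical tensor."""
    def __init__(self, A, B):
        assert A.ndim == B.ndim
        # A is (n_rank, row_size) and B is (col_size, n_rank),
        # so the rank dimensions must agree.
        assert A.shape[-2] == B.shape[-1]
        self._lora_A, self._lora_B = A, B
        self._rank = B.shape[-1]

    def get_lora_A_B(self):
        return (self._lora_A, self._lora_B)

pair = PairedLoRA(np.zeros((4, 16)), np.zeros((8, 4)))
a, b = pair.get_lora_A_B()
assert a.shape == (4, 16) and b.shape == (8, 4) and pair._rank == 4
```

The shape assertion is what lets the downstream conversion pipeline treat the pair as one tensor while guaranteeing the factors still compose.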
LoraModel inner class (modifies tensor output):
```python
def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    dest = list(super().modify_tensors(data_torch, name, bid))
    for dest_name, dest_data in dest:
        assert isinstance(dest_data, LoraTorchTensor)
        lora_a, lora_b = dest_data.get_lora_A_B()
        if "token_embd.weight" in dest_name:
            lora_a = lora_a.T
        yield (dest_name + ".lora_a", lora_a)
        yield (dest_name + ".lora_b", lora_b)
```
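The yield pattern above can be sketched in isolation to show the naming it produces. This is a standalone illustration (the helper and the tensor names are hypothetical examples, not the script's real pipeline): each GGML destination name gains `.lora_a` / `.lora_b` suffixes, with A transposed for the token-embedding tensor.

```python
import numpy as np

def split_lora_pairs(named_pairs):
    """Expand (name, (A, B)) pairs into separate .lora_a / .lora_b entries.
    Illustrative stand-in for the yield pattern in modify_tensors."""
    for name, (lora_a, lora_b) in named_pairs:
        if "token_embd.weight" in name:
            lora_a = lora_a.T  # token embedding stores A transposed
        yield (name + ".lora_a", lora_a)
        yield (name + ".lora_b", lora_b)

A, B = np.zeros((4, 16)), np.zeros((8, 4))
out = dict(split_lora_pairs([("blk.0.attn_q.weight", (A, B)),
                             ("token_embd.weight", (A, B))]))
assert out["blk.0.attn_q.weight.lora_a"].shape == (4, 16)
assert out["token_embd.weight.lora_a"].shape == (16, 4)  # transposed
```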
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | lora_path | directory path | Directory containing adapter_config.json and adapter_model.safetensors (or .bin) |
| Input | --base | directory path (optional) | Base model directory for config (config.json); actual weights not required |
| Input | --base-model-id | string (optional) | HuggingFace model ID to fetch base config from hub |
| Input | --outtype | string | Output precision: f32, f16, bf16, q8_0, or auto |
| Output | GGUF file | binary file | GGUF file containing LoRA adapter tensors (A and B separately) and metadata |
GGUF metadata written:
- `general.type` = "adapter"
- `adapter.type` = "lora"
- `adapter.lora.alpha` = float value from adapter_config.json
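How that metadata might be assembled can be sketched as a plain dict (the script itself writes these keys through the gguf writer API; the helper name here is hypothetical, and `lora_alpha` is the key PEFT uses in adapter_config.json):

```python
import json
from pathlib import Path

def adapter_metadata(adapter_config_path):
    """Build the GGUF KV metadata written for a LoRA adapter.
    Illustrative sketch: key names follow the list above."""
    cfg = json.loads(Path(adapter_config_path).read_text())
    return {
        "general.type": "adapter",
        "adapter.type": "lora",
        "adapter.lora.alpha": float(cfg["lora_alpha"]),
    }
```

Recording alpha in the file lets the runtime apply the correct `alpha / rank` scaling without consulting the original PEFT config.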
Usage Examples
Basic conversion with local base model:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base ./Llama-3.2-1B-Instruct
```

Conversion with automatic base model resolution from HuggingFace:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base-model-id meta-llama/Llama-3.2-1B-Instruct
```

Conversion with f16 output:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --outtype f16 --outfile my-lora-f16.gguf
```

Dry run to inspect output without writing:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base ./base-model --dry-run --verbose
```