
Implementation:Ggml org Llama cpp Convert LoRA To GGUF

From Leeroopedia
Implementation Name: Convert LoRA To GGUF
Doc Type: Wrapper Doc
Workflow: LoRA_Adapter_Workflow
Step: 2 of 5
Source File: convert_lora_to_gguf.py

Overview

Description

The convert_lora_to_gguf.py script converts Hugging Face PEFT LoRA adapters into the GGUF format required by llama.cpp. It reads adapter_model.safetensors (or adapter_model.bin) and adapter_config.json from a LoRA directory, remaps tensor names from PyTorch conventions to GGML conventions, preserves the factored A/B matrix representation, and writes a GGUF file with appropriate metadata.
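The adapter's tensor names follow the PEFT convention (`<module>.lora_A.weight` / `<module>.lora_B.weight`). As a simplified sketch, not the script's actual code, grouping such names into A/B pairs per module might look like:

```python
import re

def pair_lora_tensors(names: list[str]) -> dict[str, dict[str, str]]:
    """Group PEFT-style tensor names into {module: {"A": name, "B": name}} pairs."""
    pairs: dict[str, dict[str, str]] = {}
    for name in names:
        m = re.match(r"(.*)\.lora_(A|B)\.weight$", name)
        if m is None:
            continue  # skip tensors that are not part of a LoRA pair
        module, which = m.group(1), m.group(2)
        pairs.setdefault(module, {})[which] = name
    return pairs

keys = [
    "base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight",
    "base_model.model.model.layers.0.self_attn.q_proj.lora_B.weight",
]
pairs = pair_lora_tensors(keys)
```

The real script additionally remaps each module name to its GGML equivalent via the architecture tables in convert_hf_to_gguf.py.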

The script uses the LoraTorchTensor class to wrap paired A and B tensors as a single logical object, enabling the model-specific conversion pipeline (inherited from ModelBase via convert_hf_to_gguf.py) to apply architecture-specific tensor transformations while preserving the low-rank factored form.
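The factored form the script preserves is the standard LoRA formulation: the effective weight update is (alpha / rank) · B @ A, where the scale is recorded in metadata rather than baked into the tensors. A minimal NumPy illustration (the dimensions here are arbitrary):

```python
import numpy as np

rank, d_in, d_out, alpha = 8, 64, 32, 16.0
A = np.random.randn(rank, d_in).astype(np.float32)   # lora_A: (rank, in_features)
B = np.random.randn(d_out, rank).astype(np.float32)  # lora_B: (out_features, rank)

# Full-rank update the A/B pair represents; llama.cpp can apply the
# alpha/rank scale at load time because alpha is stored in GGUF metadata.
delta_W = (alpha / rank) * (B @ A)
```

Storing A and B separately keeps the file small: rank · (d_in + d_out) values instead of d_in · d_out.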

Usage

python convert_lora_to_gguf.py <lora_path> [--base <model_dir>] [--base-model-id <hf_id>] [--outfile <path>] [--outtype {f32,f16,bf16,q8_0,auto}]

Code Reference

Source Location: convert_lora_to_gguf.py
Entry Point: convert_lora_to_gguf.py:291 (if __name__ == '__main__')
LoraTorchTensor: convert_lora_to_gguf.py:41-224
parse_args: convert_lora_to_gguf.py:237-277
Imports: from convert_hf_to_gguf import LazyTorchTensor, ModelBase

Entry point:

if __name__ == '__main__':
    args = parse_args()
    # ... initialization, model loading, conversion, and write

parse_args signature:

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Convert a Hugging Face PEFT LoRA adapter to a GGUF file")
    parser.add_argument("lora_path", type=Path,
        help="directory containing Hugging Face PEFT LoRA config and weights")
    parser.add_argument("--outfile", type=Path,
        help="path to write to; default: based on input")
    parser.add_argument("--outtype", type=str,
        choices=["f32", "f16", "bf16", "q8_0", "auto"], default="f32")
    parser.add_argument("--base", type=Path,
        help="directory containing base model config files")
    parser.add_argument("--base-model-id", type=str,
        help="HuggingFace model ID for base model config")
    parser.add_argument("--bigendian", action="store_true")  # write file for big-endian machines
    parser.add_argument("--no-lazy", action="store_true")    # materialize all tensors eagerly (uses more RAM)
    parser.add_argument("--verbose", action="store_true")    # increase logging verbosity
    parser.add_argument("--dry-run", action="store_true")    # report planned output without writing a file
    return parser.parse_args()

LoraTorchTensor class (core abstraction):

class LoraTorchTensor:
    _lora_A: Tensor  # (n_rank, row_size)
    _lora_B: Tensor  # (col_size, n_rank)
    _rank: int

    def __init__(self, A: Tensor, B: Tensor):
        assert len(A.shape) == len(B.shape)
        assert A.shape[-2] == B.shape[-1]
        # ...

    def get_lora_A_B(self) -> tuple[Tensor, Tensor]:
        return (self._lora_A, self._lora_B)
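The shape invariants in this excerpt can be exercised with a stripped-down stand-in (a hypothetical mini-class using NumPy arrays in place of torch Tensors; the real LoraTorchTensor also forwards reshape, permute, and other tensor operations to both factors):

```python
import numpy as np

class MiniLoraTensor:
    """Stand-in for LoraTorchTensor: holds a factored A/B pair as one object."""
    def __init__(self, A, B):
        assert len(A.shape) == len(B.shape)
        assert A.shape[-2] == B.shape[-1]  # both dimensions equal the LoRA rank
        self._lora_A, self._lora_B = A, B
        self._rank = B.shape[-1]

    def get_lora_A_B(self):
        return (self._lora_A, self._lora_B)

t = MiniLoraTensor(np.zeros((8, 64)), np.zeros((32, 8)))
a, b = t.get_lora_A_B()
```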

LoraModel inner class (modifies tensor output):

def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    dest = list(super().modify_tensors(data_torch, name, bid))
    for dest_name, dest_data in dest:
        assert isinstance(dest_data, LoraTorchTensor)
        lora_a, lora_b = dest_data.get_lora_A_B()
        if "token_embd.weight" in dest_name:
            lora_a = lora_a.T
        yield (dest_name + ".lora_a", lora_a)
        yield (dest_name + ".lora_b", lora_b)

I/O Contract

Inputs:

  • lora_path (directory path): directory containing adapter_config.json and adapter_model.safetensors (or adapter_model.bin)
  • --base (directory path, optional): base model directory providing config files (config.json); the actual base weights are not required
  • --base-model-id (string, optional): Hugging Face model ID used to fetch the base model config from the Hub
  • --outtype (string): output precision, one of f32, f16, bf16, q8_0, or auto

Output:

  • GGUF file (binary): contains the LoRA adapter tensors (A and B stored separately) and the adapter metadata

GGUF metadata written:

  • general.type = "adapter"
  • adapter.type = "lora"
  • adapter.lora.alpha = float value from adapter_config.json
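The alpha value is read from adapter_config.json, whose filename and keys follow the PEFT convention. A minimal standard-library sketch of extracting it (the config contents here are an illustrative example):

```python
import json
import io

# A PEFT adapter_config.json typically carries "lora_alpha" and "r" (the rank).
config_text = '{"r": 8, "lora_alpha": 16, "target_modules": ["q_proj", "v_proj"]}'
config = json.load(io.StringIO(config_text))

alpha = float(config["lora_alpha"])  # written to GGUF as adapter.lora.alpha
rank = int(config["r"])
```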

Usage Examples

Basic conversion with local base model:

python convert_lora_to_gguf.py ./my-lora-adapter --base ./Llama-3.2-1B-Instruct

Conversion with automatic base model resolution from HuggingFace:

python convert_lora_to_gguf.py ./my-lora-adapter --base-model-id meta-llama/Llama-3.2-1B-Instruct

Conversion with f16 output:

python convert_lora_to_gguf.py ./my-lora-adapter --outtype f16 --outfile my-lora-f16.gguf

Dry run to inspect output without writing:

python convert_lora_to_gguf.py ./my-lora-adapter --base ./base-model --dry-run --verbose
