Implementation: Convert LoRA To GGUF (ggml-org/llama.cpp)
| Field | Value |
|---|---|
| Implementation Name | Convert LoRA To GGUF |
| Doc Type | Wrapper Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 2 of 5 |
| Source File | convert_lora_to_gguf.py |
Overview
Description
The convert_lora_to_gguf.py script converts Hugging Face PEFT LoRA adapters into the GGUF format required by llama.cpp. It reads adapter_model.safetensors (or adapter_model.bin) and adapter_config.json from a LoRA directory, remaps tensor names from PyTorch conventions to GGML conventions, preserves the factored A/B matrix representation, and writes a GGUF file with appropriate metadata.
The script uses the LoraTorchTensor class to wrap paired A and B tensors as a single logical object, enabling the model-specific conversion pipeline (inherited from ModelBase via convert_hf_to_gguf.py) to apply architecture-specific tensor transformations while preserving the low-rank factored form.
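The factored form the script preserves can be sketched numerically. The following NumPy stand-in is illustrative only (dimensions and the alpha scaling are assumptions, not values from the script): a LoRA adapter stores a pair (A, B) whose product is a low-rank delta applied to the base weight.

```python
import numpy as np

# Hypothetical dimensions for illustration; real shapes come from the adapter.
d_out, d_in, rank, alpha = 8, 16, 4, 8.0

rng = np.random.default_rng(0)
A = rng.standard_normal((rank, d_in))   # lora_A: (n_rank, row_size)
B = rng.standard_normal((d_out, rank))  # lora_B: (col_size, n_rank)

# The runtime applies W' = W + (alpha / rank) * B @ A, which is why the
# converter keeps A and B separate instead of merging them into the base.
delta_w = (alpha / rank) * (B @ A)
assert delta_w.shape == (d_out, d_in)
assert np.linalg.matrix_rank(delta_w) <= rank
```

Keeping the pair separate lets llama.cpp hot-swap or scale adapters at load time instead of baking the delta into the base weights.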
Usage
```shell
python convert_lora_to_gguf.py <lora_path> [--base <model_dir>] [--base-model-id <hf_id>] [--outfile <path>] [--outtype {f32,f16,bf16,q8_0,auto}]
```
Code Reference
| Field | Value |
|---|---|
| Source Location | convert_lora_to_gguf.py |
| Entry Point | convert_lora_to_gguf.py:291 (if __name__ == '__main__') |
| LoraTorchTensor | convert_lora_to_gguf.py:41-224 |
| parse_args | convert_lora_to_gguf.py:237-277 |
| Import | from convert_hf_to_gguf import LazyTorchTensor, ModelBase |
Entry point:
```python
if __name__ == '__main__':
    args = parse_args()
    # ... initialization, model loading, conversion, and write
```
parse_args signature:
```python
def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Convert a Hugging Face PEFT LoRA adapter to a GGUF file")
    parser.add_argument("lora_path", type=Path,
                        help="directory containing Hugging Face PEFT LoRA config and weights")
    parser.add_argument("--outfile", type=Path,
                        help="path to write to; default: based on input")
    parser.add_argument("--outtype", type=str,
                        choices=["f32", "f16", "bf16", "q8_0", "auto"], default="f32")
    parser.add_argument("--base", type=Path,
                        help="directory containing base model config files")
    parser.add_argument("--base-model-id", type=str,
                        help="HuggingFace model ID for base model config")
    parser.add_argument("--bigendian", action="store_true")
    parser.add_argument("--no-lazy", action="store_true")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--dry-run", action="store_true")
    return parser.parse_args()
```
LoraTorchTensor class (core abstraction):
```python
class LoraTorchTensor:
    _lora_A: Tensor  # (n_rank, row_size)
    _lora_B: Tensor  # (col_size, n_rank)
    _rank: int

    def __init__(self, A: Tensor, B: Tensor):
        assert len(A.shape) == len(B.shape)
        assert A.shape[-2] == B.shape[-1]
        # ...

    def get_lora_A_B(self) -> tuple[Tensor, Tensor]:
        return (self._lora_A, self._lora_B)
```
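Since the real class depends on torch, its core pairing invariant can be checked with a tiny NumPy stand-in (class name and shapes here are illustrative assumptions, not the script's code):

```python
import numpy as np

class PairedLoRA:
    """Toy stand-in for LoraTorchTensor: holds A and B as one logical tensor."""
    def __init__(self, A, B):
        assert A.ndim == B.ndim
        # A is (n_rank, row_size) and B is (col_size, n_rank),
        # so the rank dimensions must agree.
        assert A.shape[-2] == B.shape[-1]
        self._lora_A, self._lora_B = A, B
        self._rank = B.shape[-1]

    def get_lora_A_B(self):
        return (self._lora_A, self._lora_B)

pair = PairedLoRA(np.zeros((4, 16)), np.zeros((8, 4)))
a, b = pair.get_lora_A_B()
assert a.shape == (4, 16) and b.shape == (8, 4) and pair._rank == 4
```

The shape assertion is what lets the downstream conversion pipeline treat the pair as one tensor while guaranteeing the factors still compose.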
LoraModel inner class (modifies tensor output):
```python
def modify_tensors(self, data_torch: Tensor, name: str, bid: int | None) -> Iterable[tuple[str, Tensor]]:
    dest = list(super().modify_tensors(data_torch, name, bid))
    for dest_name, dest_data in dest:
        assert isinstance(dest_data, LoraTorchTensor)
        lora_a, lora_b = dest_data.get_lora_A_B()
        if "token_embd.weight" in dest_name:
            lora_a = lora_a.T
        yield (dest_name + ".lora_a", lora_a)
        yield (dest_name + ".lora_b", lora_b)
```
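The yield pattern above can be sketched in isolation to show the naming it produces. This is a standalone illustration (the helper and the tensor names are hypothetical examples, not the script's real pipeline): each GGML destination name gains `.lora_a` / `.lora_b` suffixes, with A transposed for the token-embedding tensor.

```python
import numpy as np

def split_lora_pairs(named_pairs):
    """Expand (name, (A, B)) pairs into separate .lora_a / .lora_b entries.
    Illustrative stand-in for the yield pattern in modify_tensors."""
    for name, (lora_a, lora_b) in named_pairs:
        if "token_embd.weight" in name:
            lora_a = lora_a.T  # token embedding stores A transposed
        yield (name + ".lora_a", lora_a)
        yield (name + ".lora_b", lora_b)

A, B = np.zeros((4, 16)), np.zeros((8, 4))
out = dict(split_lora_pairs([("blk.0.attn_q.weight", (A, B)),
                             ("token_embd.weight", (A, B))]))
assert out["blk.0.attn_q.weight.lora_a"].shape == (4, 16)
assert out["token_embd.weight.lora_a"].shape == (16, 4)  # transposed
```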
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | lora_path | directory path | Directory containing adapter_config.json and adapter_model.safetensors (or .bin) |
| Input | --base | directory path (optional) | Base model directory for config (config.json); actual weights not required |
| Input | --base-model-id | string (optional) | HuggingFace model ID to fetch base config from hub |
| Input | --outtype | string | Output precision: f32, f16, bf16, q8_0, or auto |
| Output | GGUF file | binary file | GGUF file containing LoRA adapter tensors (A and B separately) and metadata |
GGUF metadata written:
- `general.type` = "adapter"
- `adapter.type` = "lora"
- `adapter.lora.alpha` = float value from adapter_config.json
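How that metadata might be assembled can be sketched as a plain dict (the script itself writes these keys through the gguf writer API; the helper name here is hypothetical, and `lora_alpha` is the key PEFT uses in adapter_config.json):

```python
import json
from pathlib import Path

def adapter_metadata(adapter_config_path):
    """Build the GGUF KV metadata written for a LoRA adapter.
    Illustrative sketch: key names follow the list above."""
    cfg = json.loads(Path(adapter_config_path).read_text())
    return {
        "general.type": "adapter",
        "adapter.type": "lora",
        "adapter.lora.alpha": float(cfg["lora_alpha"]),
    }
```

Recording alpha in the file lets the runtime apply the correct `alpha / rank` scaling without consulting the original PEFT config.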
Usage Examples
Basic conversion with local base model:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base ./Llama-3.2-1B-Instruct
```

Conversion with automatic base model resolution from HuggingFace:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base-model-id meta-llama/Llama-3.2-1B-Instruct
```

Conversion with f16 output:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --outtype f16 --outfile my-lora-f16.gguf
```

Dry run to inspect output without writing:

```shell
python convert_lora_to_gguf.py ./my-lora-adapter --base ./base-model --dry-run --verbose
```