Implementation: ggml-org/llama.cpp LoRA Download
| Field | Value |
|---|---|
| Implementation Name | LoRA Download |
| Doc Type | External Tool Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 1 of 5 |
| Tool | HuggingFace Hub CLI / huggingface_hub Python library |
Overview
Description
This implementation documents the process of downloading LoRA adapter weights from the HuggingFace Hub for use with llama.cpp. LoRA adapters are distributed as small files (typically tens of megabytes) containing the low-rank decomposition matrices (A and B) for each adapted layer. The download process retrieves two essential files: adapter_model.safetensors (or adapter_model.bin) containing the weight tensors, and adapter_config.json containing the adapter metadata.
The llama.cpp conversion script (convert_lora_to_gguf.py) can also automatically resolve base model configurations from HuggingFace when the base_model_name_or_path field is set in the adapter configuration.
Usage
Users download LoRA adapters before converting them to GGUF format. This is typically done via the HuggingFace CLI or by cloning a repository with git-lfs.
Code Reference
| Field | Value |
|---|---|
| Source Location | External tool (HuggingFace Hub) |
| Related Script | convert_lora_to_gguf.py:308-309 (expects adapter_config.json and adapter_model.safetensors) |
| Import | from huggingface_hub import try_to_load_from_cache (used in convert_lora_to_gguf.py:281) |
The conversion script references the downloaded files directly:
```python
lora_config = dir_lora / "adapter_config.json"
input_model = dir_lora / "adapter_model.safetensors"
```
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | HuggingFace repo ID | string | Repository identifier in format user/model-lora |
| Output | adapter_model.safetensors | binary file | Serialized LoRA weight tensors (A and B matrices) in safetensors format |
| Output | adapter_config.json | JSON file | Adapter metadata including rank, alpha, base model path, and target modules |
| Output | adapter_model.bin (alternative) | binary file | PyTorch serialized LoRA weights (legacy format, also supported) |
adapter_config.json structure:
```json
{
  "r": 16,
  "lora_alpha": 32,
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
  "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```
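The rank and alpha fields determine the effective LoRA scaling (lora_alpha / r) applied to the adapter's delta at inference time, which is standard LoRA behavior rather than anything specific to this config. A quick sanity check on the structure above:

```python
import json

# Example adapter_config.json contents from the structure shown above
ADAPTER_CONFIG = """
{
  "r": 16,
  "lora_alpha": 32,
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
  "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
"""

config = json.loads(ADAPTER_CONFIG)
# Effective LoRA scaling factor: alpha / rank
scale = config["lora_alpha"] / config["r"]
print(scale)  # 2.0
```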
Usage Examples
Using HuggingFace CLI:
```shell
# Install the huggingface-hub CLI
pip install huggingface-hub

# Download a LoRA adapter repository
huggingface-cli download user/my-lora-adapter --local-dir ./my-lora-adapter

# Verify the expected files exist
ls ./my-lora-adapter/
# adapter_config.json  adapter_model.safetensors
```
Using git with LFS:
```shell
# Clone the adapter repository with git-lfs
git lfs install
git clone https://huggingface.co/user/my-lora-adapter

# The directory now contains the adapter files
ls ./my-lora-adapter/
# adapter_config.json  adapter_model.safetensors
```
Using Python huggingface_hub:
```python
from huggingface_hub import snapshot_download

# Download the adapter to a local cache directory
local_dir = snapshot_download(repo_id="user/my-lora-adapter")
print(f"Adapter downloaded to: {local_dir}")
```
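The Code Reference notes that convert_lora_to_gguf.py imports try_to_load_from_cache; a sketch of how that call can check for a previously downloaded adapter without touching the network (the repo ID is hypothetical):

```python
from huggingface_hub import try_to_load_from_cache

# Purely local lookup: returns the cached file path (str) on a hit,
# or None / a sentinel when the file is not in the local cache.
cached = try_to_load_from_cache(
    repo_id="user/my-lora-adapter",
    filename="adapter_config.json",
)
if isinstance(cached, str):
    print(f"Using cached copy: {cached}")
else:
    print("Not cached; run snapshot_download first")
```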