Implementation: ggml-org/llama.cpp LoRA Download
| Field | Value |
|---|---|
| Implementation Name | LoRA Download |
| Doc Type | External Tool Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 1 of 5 |
| Tool | HuggingFace Hub CLI / huggingface_hub Python library |
Overview
Description
This implementation documents the process of downloading LoRA adapter weights from the HuggingFace Hub for use with llama.cpp. LoRA adapters are distributed as small files (typically tens of megabytes) containing the low-rank decomposition matrices (A and B) for each adapted layer. The download process retrieves two essential files: adapter_model.safetensors (or adapter_model.bin) containing the weight tensors, and adapter_config.json containing the adapter metadata.
The llama.cpp conversion script (convert_lora_to_gguf.py) can also automatically resolve base model configurations from HuggingFace when the base_model_name_or_path field is set in the adapter configuration.
Usage
Users download LoRA adapters before converting them to GGUF format. This is typically done via the HuggingFace CLI or by cloning a repository with git-lfs.
Code Reference
| Field | Value |
|---|---|
| Source Location | External tool (HuggingFace Hub) |
| Related Script | convert_lora_to_gguf.py:308-309 (expects adapter_config.json and adapter_model.safetensors) |
| Import | from huggingface_hub import try_to_load_from_cache (used in convert_lora_to_gguf.py:281) |
The conversion script references the downloaded files directly:
```python
lora_config = dir_lora / "adapter_config.json"
input_model = dir_lora / "adapter_model.safetensors"
```
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | HuggingFace repo ID | string | Repository identifier in format user/model-lora |
| Output | adapter_model.safetensors | binary file | Serialized LoRA weight tensors (A and B matrices) in safetensors format |
| Output | adapter_config.json | JSON file | Adapter metadata including rank, alpha, base model path, and target modules |
| Output | adapter_model.bin (alternative) | binary file | PyTorch serialized LoRA weights (legacy format, also supported) |
adapter_config.json structure:
```json
{
  "r": 16,
  "lora_alpha": 32,
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
  "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
```
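The rank and alpha fields determine the effective LoRA scaling (lora_alpha / r) applied to the adapter's delta at inference time, which is standard LoRA behavior rather than anything specific to this config. A quick sanity check on the structure above:

```python
import json

# Example adapter_config.json contents from the structure shown above
ADAPTER_CONFIG = """
{
  "r": 16,
  "lora_alpha": 32,
  "base_model_name_or_path": "meta-llama/Llama-3.2-1B-Instruct",
  "target_modules": ["q_proj", "v_proj", "k_proj", "o_proj"],
  "bias": "none",
  "task_type": "CAUSAL_LM"
}
"""

config = json.loads(ADAPTER_CONFIG)
# Effective LoRA scaling factor: alpha / rank
scale = config["lora_alpha"] / config["r"]
print(scale)  # 2.0
```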
Usage Examples
Using HuggingFace CLI:
```shell
# Install the huggingface-hub CLI
pip install huggingface-hub

# Download a LoRA adapter repository
huggingface-cli download user/my-lora-adapter --local-dir ./my-lora-adapter

# Verify the expected files exist
ls ./my-lora-adapter/
# adapter_config.json  adapter_model.safetensors
```
Using git with LFS:
```shell
# Clone the adapter repository with git-lfs
git lfs install
git clone https://huggingface.co/user/my-lora-adapter

# The directory now contains the adapter files
ls ./my-lora-adapter/
# adapter_config.json  adapter_model.safetensors
```
Using Python huggingface_hub:
```python
from huggingface_hub import snapshot_download

# Download the adapter to a local cache directory
local_dir = snapshot_download(repo_id="user/my-lora-adapter")
print(f"Adapter downloaded to: {local_dir}")
```
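The Code Reference notes that convert_lora_to_gguf.py imports try_to_load_from_cache; a sketch of how that call can check for a previously downloaded adapter without touching the network (the repo ID is hypothetical):

```python
from huggingface_hub import try_to_load_from_cache

# Purely local lookup: returns the cached file path (str) on a hit,
# or None / a sentinel when the file is not in the local cache.
cached = try_to_load_from_cache(
    repo_id="user/my-lora-adapter",
    filename="adapter_config.json",
)
if isinstance(cached, str):
    print(f"Using cached copy: {cached}")
else:
    print("Not cached; run snapshot_download first")
```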