Implementation: ggml-org/llama.cpp Llama Adapter LoRA Init
| Field | Value |
|---|---|
| Implementation Name | Llama Adapter LoRA Init |
| Doc Type | API Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 3 of 5 (CORE) |
| Source Files | include/llama.h, src/llama-adapter.cpp |
Overview
Description
This implementation documents the llama.cpp C API for loading and applying LoRA adapters at runtime. The primary functions are llama_adapter_lora_init for loading a LoRA adapter from a GGUF file and associating it with a model, and llama_set_adapters_lora for activating one or more loaded adapters on an inference context with specified scaling factors.
All adapters must be loaded before context creation. Loaded adapters remain valid as long as the associated model is not freed, and they are automatically freed when the model is deleted.
Usage
// Load a LoRA adapter
struct llama_adapter_lora * adapter = llama_adapter_lora_init(model, "path/to/adapter.gguf");
// Apply adapter(s) to context
float scale = 1.0f;
llama_set_adapters_lora(ctx, &adapter, 1, &scale);
Code Reference
| Field | Value |
|---|---|
| Source Location (header) | include/llama.h:626-628 (init), include/llama.h:660-664 (set_adapters) |
| Source Location (impl) | src/llama-adapter.cpp:420-433 |
| Import | #include "llama.h" |
llama_adapter_lora_init signature:
// Load a LoRA adapter from file
// The adapter is valid as long as the associated model is not freed
// All adapters must be loaded before context creation
LLAMA_API struct llama_adapter_lora * llama_adapter_lora_init(
struct llama_model * model,
const char * path_lora);
llama_set_adapters_lora signature:
// Set LoRa adapters on the context. Will only modify if the adapters
// currently in context are different.
LLAMA_API int32_t llama_set_adapters_lora(
struct llama_context * ctx,
struct llama_adapter_lora ** adapters,
size_t n_adapters,
float * scales);
Init implementation (src/llama-adapter.cpp:420-433):
llama_adapter_lora * llama_adapter_lora_init(llama_model * model, const char * path_lora) {
llama_adapter_lora * adapter = new llama_adapter_lora();
try {
llama_adapter_lora_init_impl(*model, path_lora, *adapter);
return adapter;
} catch (const std::exception & err) {
LLAMA_LOG_ERROR("%s: failed to apply lora adapter: %s\n", __func__, err.what());
delete adapter;
}
return nullptr;
}
Related API functions:
// Manually free a LoRA adapter (deprecated: adapters freed with model)
LLAMA_API DEPRECATED(void llama_adapter_lora_free(struct llama_adapter_lora * adapter),
"adapters are now freed together with the associated model");
// Get metadata from adapter
LLAMA_API int32_t llama_adapter_meta_val_str(const struct llama_adapter_lora * adapter,
const char * key, char * buf, size_t buf_size);
LLAMA_API int32_t llama_adapter_meta_count(const struct llama_adapter_lora * adapter);
I/O Contract
llama_adapter_lora_init:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | llama_model * | Loaded base model to associate the adapter with |
| Input | path_lora | const char * | File path to a GGUF-format LoRA adapter |
| Output | (return) | llama_adapter_lora * | Opaque pointer to the loaded adapter, or nullptr on failure |

llama_set_adapters_lora:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | ctx | llama_context * | Inference context to apply adapters to |
| Input | adapters | llama_adapter_lora ** | Array of adapter pointers |
| Input | n_adapters | size_t | Number of adapters in the array |
| Input | scales | float * | Array of scaling factors, one per adapter |
| Output | (return) | int32_t | 0 on success |
Usage Examples
Loading and applying a single adapter:
#include "llama.h"
// After model is loaded but before context creation
struct llama_adapter_lora * lora = llama_adapter_lora_init(model, "my-adapter.gguf");
if (lora == nullptr) {
fprintf(stderr, "Failed to load LoRA adapter\n");
return 1;
}
// Create context with default parameters
struct llama_context_params ctx_params = llama_context_default_params();
struct llama_context * ctx = llama_init_from_model(model, ctx_params);
// Apply adapter at full strength
float scale = 1.0f;
llama_set_adapters_lora(ctx, &lora, 1, &scale);
Applying multiple adapters with different scales:
struct llama_adapter_lora * adapters[2];
adapters[0] = llama_adapter_lora_init(model, "adapter-instruct.gguf");
adapters[1] = llama_adapter_lora_init(model, "adapter-code.gguf");
if (adapters[0] == nullptr || adapters[1] == nullptr) {
    fprintf(stderr, "Failed to load a LoRA adapter\n");
    return 1;
}
float scales[2] = {1.0f, 0.5f}; // full strength, half strength
llama_set_adapters_lora(ctx, adapters, 2, scales);
Querying adapter metadata:
char buf[256];
// Returns a negative value if the key is not present
if (llama_adapter_meta_val_str(lora, "adapter.lora.alpha", buf, sizeof(buf)) >= 0) {
    printf("LoRA alpha: %s\n", buf);
}
printf("Metadata count: %d\n", llama_adapter_meta_count(lora));