Implementation: ggml-org/llama.cpp Llama Adapter LoRA Init
| Field | Value |
|---|---|
| Implementation Name | Llama Adapter LoRA Init |
| Doc Type | API Doc |
| Workflow | LoRA_Adapter_Workflow |
| Step | 3 of 5 (CORE) |
| Source Files | include/llama.h, src/llama-adapter.cpp |
Overview
Description
This implementation documents the llama.cpp C API for loading and applying LoRA adapters at runtime. The primary functions are llama_adapter_lora_init for loading a LoRA adapter from a GGUF file and associating it with a model, and llama_set_adapters_lora for activating one or more loaded adapters on an inference context with specified scaling factors.
All adapters must be loaded before context creation. Loaded adapters remain valid as long as the associated model is not freed, and they are automatically freed when the model is deleted.
Usage
// Load a LoRA adapter
struct llama_adapter_lora * adapter = llama_adapter_lora_init(model, "path/to/adapter.gguf");
// Apply adapter(s) to context
float scale = 1.0f;
llama_set_adapters_lora(ctx, &adapter, 1, &scale);
Code Reference
| Field | Value |
|---|---|
| Source Location (header) | include/llama.h:626-628 (init), include/llama.h:660-664 (set_adapters) |
| Source Location (impl) | src/llama-adapter.cpp:420-433 |
| Import | #include "llama.h" |
llama_adapter_lora_init signature:
// Load a LoRA adapter from file
// The adapter is valid as long as the associated model is not freed
// All adapters must be loaded before context creation
LLAMA_API struct llama_adapter_lora * llama_adapter_lora_init(
struct llama_model * model,
const char * path_lora);
llama_set_adapters_lora signature:
// Set LoRa adapters on the context. Will only modify if the adapters
// currently in context are different.
LLAMA_API int32_t llama_set_adapters_lora(
struct llama_context * ctx,
struct llama_adapter_lora ** adapters,
size_t n_adapters,
float * scales);
Init implementation (src/llama-adapter.cpp:420-433):
llama_adapter_lora * llama_adapter_lora_init(llama_model * model, const char * path_lora) {
llama_adapter_lora * adapter = new llama_adapter_lora();
try {
llama_adapter_lora_init_impl(*model, path_lora, *adapter);
return adapter;
} catch (const std::exception & err) {
LLAMA_LOG_ERROR("%s: failed to apply lora adapter: %s\n", __func__, err.what());
delete adapter;
}
return nullptr;
}
Related API functions:
// Manually free a LoRA adapter (deprecated: adapters freed with model)
LLAMA_API DEPRECATED(void llama_adapter_lora_free(struct llama_adapter_lora * adapter),
"adapters are now freed together with the associated model");
// Get metadata from adapter
LLAMA_API int32_t llama_adapter_meta_val_str(const struct llama_adapter_lora * adapter,
const char * key, char * buf, size_t buf_size);
LLAMA_API int32_t llama_adapter_meta_count(const struct llama_adapter_lora * adapter);
I/O Contract
llama_adapter_lora_init:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | model | llama_model * | Loaded base model to associate the adapter with |
| Input | path_lora | const char * | File path to a GGUF-format LoRA adapter |
| Output | (return) | llama_adapter_lora * | Opaque pointer to the loaded adapter, or nullptr on failure |

llama_set_adapters_lora:
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | ctx | llama_context * | Inference context to apply adapters to |
| Input | adapters | llama_adapter_lora ** | Array of adapter pointers |
| Input | n_adapters | size_t | Number of adapters in the array |
| Input | scales | float * | Array of scaling factors, one per adapter |
| Output | (return) | int32_t | 0 on success |
Usage Examples
Loading and applying a single adapter:
#include "llama.h"
// After model is loaded but before context creation
struct llama_adapter_lora * lora = llama_adapter_lora_init(model, "my-adapter.gguf");
if (lora == nullptr) {
fprintf(stderr, "Failed to load LoRA adapter\n");
return 1;
}
// Create context with default parameters
struct llama_context_params ctx_params = llama_context_default_params();
struct llama_context * ctx = llama_init_from_model(model, ctx_params);
// Apply adapter at full strength
float scale = 1.0f;
llama_set_adapters_lora(ctx, &lora, 1, &scale);
Applying multiple adapters with different scales:
struct llama_adapter_lora * adapters[2];
adapters[0] = llama_adapter_lora_init(model, "adapter-instruct.gguf");
adapters[1] = llama_adapter_lora_init(model, "adapter-code.gguf");
if (adapters[0] == nullptr || adapters[1] == nullptr) {
    fprintf(stderr, "Failed to load a LoRA adapter\n");
    return 1;
}
float scales[2] = {1.0f, 0.5f}; // full strength, half strength
llama_set_adapters_lora(ctx, adapters, 2, scales);
Querying adapter metadata:
char buf[256];
// Returns a negative value if the key is not present
if (llama_adapter_meta_val_str(lora, "adapter.lora.alpha", buf, sizeof(buf)) >= 0) {
    printf("LoRA alpha: %s\n", buf);
}
printf("Metadata count: %d\n", llama_adapter_meta_count(lora));