
Implementation:Ggml org Llama cpp Llama Adapter LoRA Init

From Leeroopedia
Implementation Name: Llama Adapter LoRA Init
Doc Type: API Doc
Workflow: LoRA_Adapter_Workflow
Step: 3 of 5 (CORE)
Source Files: include/llama.h, src/llama-adapter.cpp

Overview

Description

This implementation documents the llama.cpp C API for loading and applying LoRA adapters at runtime. The primary functions are llama_adapter_lora_init for loading a LoRA adapter from a GGUF file and associating it with a model, and llama_set_adapters_lora for activating one or more loaded adapters on an inference context with specified scaling factors.

All adapters must be loaded before the inference context is created. A loaded adapter remains valid as long as its associated model has not been freed; adapters are freed automatically together with the model.

Usage

// Load a LoRA adapter (all adapters must be loaded before the context is created)
struct llama_adapter_lora * adapter = llama_adapter_lora_init(model, "path/to/adapter.gguf");

// ... create the inference context (ctx) here ...

// Apply adapter(s) to the context
float scale = 1.0f;
llama_set_adapters_lora(ctx, &adapter, 1, &scale);

Code Reference

Source Location (header): include/llama.h:626-628 (init), include/llama.h:660-664 (set_adapters)
Source Location (impl): src/llama-adapter.cpp:420-433
Import: #include "llama.h"

llama_adapter_lora_init signature:

// Load a LoRA adapter from file
// The adapter is valid as long as the associated model is not freed
// All adapters must be loaded before context creation
LLAMA_API struct llama_adapter_lora * llama_adapter_lora_init(
        struct llama_model * model,
        const char * path_lora);

llama_set_adapters_lora signature:

// Set LoRa adapters on the context. Will only modify if the adapters
// currently in context are different.
LLAMA_API int32_t llama_set_adapters_lora(
        struct llama_context * ctx,
        struct llama_adapter_lora ** adapters,
        size_t n_adapters,
        float * scales);

Init implementation (src/llama-adapter.cpp:420-433):

llama_adapter_lora * llama_adapter_lora_init(llama_model * model, const char * path_lora) {
    llama_adapter_lora * adapter = new llama_adapter_lora();

    try {
        llama_adapter_lora_init_impl(*model, path_lora, *adapter);
        return adapter;
    } catch (const std::exception & err) {
        LLAMA_LOG_ERROR("%s: failed to apply lora adapter: %s\n", __func__, err.what());
        delete adapter;
    }

    return nullptr;
}

Related API functions:

// Manually free a LoRA adapter (deprecated: adapters freed with model)
LLAMA_API DEPRECATED(void llama_adapter_lora_free(struct llama_adapter_lora * adapter),
        "adapters are now freed together with the associated model");

// Get metadata from adapter
LLAMA_API int32_t llama_adapter_meta_val_str(const struct llama_adapter_lora * adapter,
        const char * key, char * buf, size_t buf_size);
LLAMA_API int32_t llama_adapter_meta_count(const struct llama_adapter_lora * adapter);

I/O Contract

llama_adapter_lora_init
Input: model (llama_model *), the loaded base model to associate the adapter with
Input: path_lora (const char *), file path to a GGUF-format LoRA adapter
Output: return value (llama_adapter_lora *), opaque pointer to the loaded adapter, or nullptr on failure

llama_set_adapters_lora
Input: ctx (llama_context *), inference context to apply the adapters to
Input: adapters (llama_adapter_lora **), array of adapter pointers
Input: n_adapters (size_t), number of adapters in the array
Input: scales (float *), array of scaling factors, one per adapter
Output: return value (int32_t), 0 on success

Usage Examples

Loading and applying a single adapter:

#include <stdio.h>   // for fprintf
#include "llama.h"

// After model is loaded but before context creation
struct llama_adapter_lora * lora = llama_adapter_lora_init(model, "my-adapter.gguf");
if (lora == nullptr) {
    fprintf(stderr, "Failed to load LoRA adapter\n");
    return 1;
}

// Create context
struct llama_context * ctx = llama_init_from_model(model, ctx_params);

// Apply adapter at full strength
float scale = 1.0f;
llama_set_adapters_lora(ctx, &lora, 1, &scale);

Applying multiple adapters with different scales:

// Load both adapters (before the context is created)
struct llama_adapter_lora * adapters[2];
adapters[0] = llama_adapter_lora_init(model, "adapter-instruct.gguf");
adapters[1] = llama_adapter_lora_init(model, "adapter-code.gguf");

// Apply both: first at full strength, second at half strength
float scales[2] = {1.0f, 0.5f};
llama_set_adapters_lora(ctx, adapters, 2, scales);

Querying adapter metadata:

char buf[256];
if (llama_adapter_meta_val_str(lora, "adapter.lora.alpha", buf, sizeof(buf)) >= 0) {
    printf("LoRA alpha: %s\n", buf);
}
printf("Metadata count: %d\n", llama_adapter_meta_count(lora));
