Implementation:Ggml org Llama cpp Adapter Header

From Leeroopedia
Revision as of 12:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Adapter_Header.md)
Knowledge Sources
Domains LoRA, Control_Vector
Last Updated 2026-02-15 00:00 GMT

Overview

Declares data structures for control vector and LoRA adapter support in llama.cpp.

Description

This header defines three key structs: `llama_adapter_cvec` manages per-layer control vector tensors with methods to retrieve and apply them to hidden states; `llama_adapter_lora_weight` holds low-rank A/B tensor pairs and computes scaling based on rank and alpha; `llama_adapter_lora` maps tensor names to LoRA weight pairs and stores metadata including activated LoRA (aLoRA) invocation tokens. A type alias `llama_adapter_loras` maps adapter pointers to their scaling factors.
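The per-layer lookup and application described for `llama_adapter_cvec` can be illustrated with a simplified, standalone sketch. It is not the llama.cpp implementation: `cvec_sketch` is a hypothetical type, plain `std::vector<float>` buffers stand in for `ggml_tensor` objects, the `layer_start` and `layer_end` members are assumed for illustration, and "applying" the vector is assumed to mean element-wise addition to the hidden state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_adapter_cvec; float vectors replace ggml tensors.
struct cvec_sketch {
    std::vector<std::vector<float>> layers; // layers[il] holds the direction for layer il
    int layer_start = -1;                   // assumed: first layer the vector applies to
    int layer_end   = -1;                   // assumed: last layer the vector applies to

    // Returns the direction for layer il, or nullptr when il is outside the active range.
    const std::vector<float> * tensor_for(int il) const {
        if (il < 0 || il < layer_start || il > layer_end || (std::size_t) il >= layers.size()) {
            return nullptr;
        }
        return &layers[il];
    }

    // Adds the layer's direction to the hidden state; returns it unchanged if none applies.
    std::vector<float> apply_to(std::vector<float> cur, int il) const {
        const std::vector<float> * dir = tensor_for(il);
        if (dir != nullptr && dir->size() == cur.size()) {
            for (std::size_t i = 0; i < cur.size(); ++i) {
                cur[i] += (*dir)[i];
            }
        }
        return cur;
    }
};
```

Because `apply_to` returns its input untouched when no direction is available, a caller can invoke it for every layer without checking the range first.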

Usage

Include this header when implementing or modifying adapter functionality. It defines the core adapter abstraction layer that lets both control vector steering and LoRA adapters be applied during inference without modifying the base model weights.

Code Reference

Source Location

Signature

struct llama_adapter_cvec {
    ggml_tensor * tensor_for(int il) const;
    ggml_tensor * apply_to(ggml_context * ctx, ggml_tensor * cur, int il) const;
    bool apply(const llama_model & model, const float * data, size_t len,
               int32_t n_embd, int32_t il_start, int32_t il_end);
};

struct llama_adapter_lora_weight {
    ggml_tensor * a = nullptr;
    ggml_tensor * b = nullptr;
    float get_scale(float alpha, float adapter_scale) const;
};

struct llama_adapter_lora {
    std::unordered_map<std::string, llama_adapter_lora_weight> ab_map;
    float alpha;
    std::vector<llama_token> alora_invocation_tokens;
    llama_adapter_lora_weight * get_weight(ggml_tensor * w);
    uint32_t get_n_nodes() const;
};

using llama_adapter_loras = std::unordered_map<llama_adapter_lora *, float>;
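To show how a name-keyed `ab_map` and the `llama_adapter_loras` alias might fit together, here is a minimal standalone sketch. All `*_sketch` types and `count_adapters_for` are hypothetical and exist only for illustration; the lookup takes a tensor name directly instead of a `ggml_tensor *`, and returning `nullptr` for an untargeted tensor is an assumption about the contract rather than something this page states.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for llama_adapter_lora_weight (no real tensors here).
struct lora_weight_sketch {
    int rank = 0; // placeholder for the low-rank A/B tensor pair
};

// Hypothetical stand-in for llama_adapter_lora.
struct lora_adapter_sketch {
    std::unordered_map<std::string, lora_weight_sketch> ab_map;
    float alpha = 0.0f;

    // Looks up the weight pair by base tensor name.
    // Assumption: nullptr signals that this adapter does not target the tensor.
    lora_weight_sketch * get_weight(const std::string & name) {
        auto it = ab_map.find(name);
        return it != ab_map.end() ? &it->second : nullptr;
    }
};

// Mirrors the shape of llama_adapter_loras: adapter pointer -> scaling factor.
using lora_adapters_sketch = std::unordered_map<lora_adapter_sketch *, float>;

// Counts how many active adapters carry a weight pair for the named tensor.
inline int count_adapters_for(const lora_adapters_sketch & active, const std::string & name) {
    int n = 0;
    for (const auto & entry : active) {
        if (entry.first->get_weight(name) != nullptr) {
            ++n;
        }
    }
    return n;
}
```

Keying the outer map by adapter pointer lets several adapters be active at once, each with its own scale, while each adapter only contributes to the tensors it actually has entries for.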

Import

#include "llama-adapter.h"
// Dependencies:
#include "llama.h"
#include "ggml-cpp.h"
#include <string>
#include <unordered_map>
#include <vector>

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| il | int | Yes | Layer index for the tensor_for and apply_to methods |
| data | const float * | Yes | Control vector data for llama_adapter_cvec::apply |
| n_embd | int32_t | Yes | Embedding dimension for control vector application |
| il_start / il_end | int32_t | Yes | Layer range for control vector application |
| alpha | float | Yes | LoRA scaling alpha parameter |
| adapter_scale | float | Yes | Additional adapter scaling factor |

Outputs

| Name | Type | Description |
|------|------|-------------|
| tensor_for return | ggml_tensor * | Per-layer control vector tensor, or nullptr if out of range |
| apply_to return | ggml_tensor * | Modified hidden state tensor with control vector applied |
| get_scale return | float | Computed LoRA scale: adapter_scale * alpha / rank when alpha is nonzero, otherwise adapter_scale |
| get_weight return | llama_adapter_lora_weight * | Pointer to the LoRA weight pair for a given base tensor |
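The scale formula for get_scale can be checked with a small standalone helper. This is a sketch, not the llama.cpp API: `lora_scale_sketch` is a hypothetical function that takes the rank as an explicit argument (the real method takes only `alpha` and `adapter_scale`), and the fallback to `adapter_scale` when alpha is zero is treated here as an assumption.

```cpp
#include <cassert>

// Hypothetical helper mirroring the documented formula; not part of llama.cpp.
// Assumption: a zero alpha means "no alpha scaling", so adapter_scale is returned as-is.
inline float lora_scale_sketch(float alpha, float adapter_scale, float rank) {
    assert(rank > 0.0f); // a LoRA pair with rank 0 would make the division meaningless
    return alpha != 0.0f ? adapter_scale * alpha / rank : adapter_scale;
}
```

For example, with alpha = 16, adapter_scale = 1, and rank = 8, the helper returns 2; halving adapter_scale halves the result, which is why the per-adapter factor in `llama_adapter_loras` acts as a simple strength knob.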

Usage Examples

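#include "llama-adapter.h"

// Apply a control vector to modify model behavior.
// model, data, len, n_embd, il_start, and il_end are assumed to be provided by the caller.
llama_adapter_cvec cvec;
if (!cvec.apply(model, data, len, n_embd, il_start, il_end)) {
    // apply returns bool; handle the failure here
}
// ctx, hidden_state, and layer_index are assumed to come from the graph being built.
ggml_tensor * modified = cvec.apply_to(ctx, hidden_state, layer_index);

// Access LoRA weights for a tensor
llama_adapter_lora lora;
llama_adapter_lora_weight * weight = lora.get_weight(base_tensor);
if (weight != nullptr) { // guard in case the adapter has no entry for base_tensor
    float scale = weight->get_scale(lora.alpha, adapter_scale);
}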
