Implementation:Ggml org Llama cpp Adapter Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	LoRA, Control_Vector
Last Updated	2026-02-15 00:00 GMT

Overview

Declares data structures for control vector and LoRA adapter support in llama.cpp.

Description

This header defines three key structs: `llama_adapter_cvec` manages per-layer control vector tensors with methods to retrieve and apply them to hidden states; `llama_adapter_lora_weight` holds low-rank A/B tensor pairs and computes scaling based on rank and alpha; `llama_adapter_lora` maps tensor names to LoRA weight pairs and stores metadata including activated LoRA (aLoRA) invocation tokens. A type alias `llama_adapter_loras` maps adapter pointers to their scaling factors.
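
For orientation, the two operations these structs support can be written compactly. This is the standard formulation of control vectors and LoRA, given here as an assumption about what the methods compute rather than something stated in the header; the symbols $h_\ell$, $v_\ell$, $W$, $x$, $s$, and $r$ are notation introduced for this sketch.

```latex
\begin{aligned}
h_\ell' &= h_\ell + v_\ell
  && \text{control vector } v_\ell \text{ added to the hidden state of layer } \ell \\
y &= W x + s \, B (A x), \qquad s = \frac{\alpha \cdot \text{adapter\_scale}}{r}
  && \text{LoRA update with rank } r \text{ and low-rank pair } (A, B)
\end{aligned}
```

In both cases the base weights $W$ stay untouched; the adapter only contributes an additive term at inference time.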

Usage

Include this header when implementing or modifying adapter functionality. It defines the core adapter abstraction layer that lets both control vector steering and LoRA adapters be applied during inference without modifying the base model weights.
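
To make the control vector path concrete, the following is a minimal sketch of the `tensor_for` / `apply_to` pair using plain `std::vector<float>` in place of `ggml_tensor`. The type `cvec_sketch`, its fields `layer_start`, `layer_end`, and `layers`, and the element-wise addition are assumptions made for illustration; the real struct operates on ggml tensors inside a compute graph.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_adapter_cvec: one direction vector per layer.
struct cvec_sketch {
    int layer_start = -1;
    int layer_end   = -1;
    std::vector<std::vector<float>> layers; // indexed by layer number

    // Returns the layer's vector, or nullptr when the layer is out of range.
    const std::vector<float> * tensor_for(int il) const {
        if (il < 0 || il < layer_start || il > layer_end ||
            static_cast<std::size_t>(il) >= layers.size()) {
            return nullptr;
        }
        return &layers[il];
    }

    // Adds the layer's vector to the hidden state; returns the state
    // unchanged when no vector is configured for that layer.
    std::vector<float> apply_to(std::vector<float> cur, int il) const {
        const std::vector<float> * dir = tensor_for(il);
        if (dir != nullptr) {
            // Both vectors are expected to have the embedding dimension.
            assert(dir->size() == cur.size());
            for (std::size_t i = 0; i < cur.size(); ++i) {
                cur[i] += (*dir)[i];
            }
        }
        return cur;
    }
};
```

Because `apply_to` passes the hidden state through when `tensor_for` returns `nullptr`, a caller can invoke it on every layer without checking the configured range first.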

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: src/llama-adapter.h
Lines: 1-86

Signature

struct llama_adapter_cvec {
    ggml_tensor * tensor_for(int il) const;
    ggml_tensor * apply_to(ggml_context * ctx, ggml_tensor * cur, int il) const;
    bool apply(const llama_model & model, const float * data, size_t len,
               int32_t n_embd, int32_t il_start, int32_t il_end);
};

struct llama_adapter_lora_weight {
    ggml_tensor * a = nullptr;
    ggml_tensor * b = nullptr;
    float get_scale(float alpha, float adapter_scale) const;
};

struct llama_adapter_lora {
    std::unordered_map<std::string, llama_adapter_lora_weight> ab_map;
    float alpha;
    std::vector<llama_token> alora_invocation_tokens;
    llama_adapter_lora_weight * get_weight(ggml_tensor * w);
    uint32_t get_n_nodes() const;
};

using llama_adapter_loras = std::unordered_map<llama_adapter_lora *, float>;
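
The relationship between `ab_map`, `get_weight`, and the `llama_adapter_loras` alias can be sketched with stand-in types. Everything suffixed `_sketch`, the `rank` field, and the helper `count_adapters_for` are hypothetical names for illustration; the sketch assumes that `get_weight` looks the weight pair up by the base tensor's name, which is how the name-keyed map suggests it works.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for ggml_tensor; only the name matters for the lookup.
struct tensor_sketch { std::string name; };

// Stand-in for llama_adapter_lora_weight.
struct lora_weight_sketch {
    int rank = 0; // hypothetical field; the real struct stores tensors a and b
};

// Stand-in for llama_adapter_lora.
struct lora_sketch {
    std::unordered_map<std::string, lora_weight_sketch> ab_map;
    float alpha = 0.0f;

    // Assumption: weights are found by the base tensor's name;
    // nullptr means this adapter does not touch the tensor.
    lora_weight_sketch * get_weight(const tensor_sketch & w) {
        auto it = ab_map.find(w.name);
        return it != ab_map.end() ? &it->second : nullptr;
    }
};

// Mirrors the llama_adapter_loras alias: adapter pointer -> user-chosen scale.
using loras_sketch = std::unordered_map<lora_sketch *, float>;

// Counts the active adapters that carry a weight pair for tensor w.
int count_adapters_for(loras_sketch & loras, const tensor_sketch & w) {
    int n = 0;
    for (auto & entry : loras) {
        assert(entry.first != nullptr);
        if (entry.first->get_weight(w) != nullptr) {
            ++n;
        }
    }
    return n;
}
```

Keying the outer map by adapter pointer lets several adapters be active at once, each with its own scale, while every adapter decides per tensor whether it contributes anything.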

Import

#include "llama-adapter.h"
// Dependencies (included by the header itself):
#include "llama.h"
#include "ggml-cpp.h"
#include <string>
#include <unordered_map>
#include <vector>

I/O Contract

Inputs

Name	Type	Required	Description
il	int	Yes	Layer index for tensor_for and apply_to methods
data	const float *	Yes	Control vector data for llama_adapter_cvec::apply
len	size_t	Yes	Number of float values in data
n_embd	int32_t	Yes	Embedding dimension for control vector application
il_start / il_end	int32_t	Yes	Layer range for control vector application
alpha	float	Yes	LoRA scaling alpha parameter
adapter_scale	float	Yes	Additional adapter scaling factor

Outputs

Name	Type	Description
tensor_for return	ggml_tensor *	Per-layer control vector tensor, or nullptr if out of range
apply_to return	ggml_tensor *	Modified hidden state tensor with control vector applied
get_scale return	float	Computed LoRA scale: alpha * adapter_scale / rank (adapter_scale alone when alpha is 0)
get_weight return	llama_adapter_lora_weight *	Pointer to the LoRA weight pair for a given base tensor, or nullptr if the adapter has no entry for it
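
The scale computation listed for `get_scale` can be sketched as a free function. This is an illustration, not the header's code: `lora_scale` and its explicit `rank` parameter are hypothetical, since the real member function takes only `alpha` and `adapter_scale` and obtains the rank from the stored tensors. Treating `alpha == 0` as "use `adapter_scale` unchanged" is an assumption of this sketch.

```cpp
#include <cassert>

// Hypothetical free-function version of the LoRA scale computation.
float lora_scale(float alpha, float adapter_scale, int rank) {
    assert(rank > 0);
    // Assumption: a zero alpha means no alpha was provided, so the
    // user-supplied adapter_scale is used as-is.
    if (alpha == 0.0f) {
        return adapter_scale;
    }
    return alpha * adapter_scale / static_cast<float>(rank);
}
```

For example, with `alpha = 16`, `adapter_scale = 1`, and `rank = 8`, the low-rank update is multiplied by 2 before being added to the base projection.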

Usage Examples

#include "llama-adapter.h"

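// Load control vector data for layers [il_start, il_end], then add the
// per-layer vector to a hidden state while building the graph.
llama_adapter_cvec cvec;
if (!cvec.apply(model, data, len, n_embd, il_start, il_end)) {
    // apply reports failure through its bool return; handle it here
}
ggml_tensor * modified = cvec.apply_to(ctx, hidden_state, layer_index);

// Look up the LoRA weight pair for a base tensor. get_weight can return
// nullptr when the adapter has no entry for that tensor, so check first.
llama_adapter_lora lora;
llama_adapter_lora_weight * weight = lora.get_weight(base_tensor);
if (weight != nullptr) {
    float scale = weight->get_scale(lora.alpha, adapter_scale);
}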

Related Pages

Principle:Ggml_org_Llama_cpp_AdapterSupport
