Implementation:Ggml org Llama cpp Common Init For Perplexity
| Aspect | Detail |
|---|---|
| Implementation Name | Common Init For Perplexity |
| Doc Type | Pattern Doc |
| Domain | Model Perplexity Evaluation |
| Purpose | Model loading via common_init_from_params(params) for perplexity evaluation |
| Related Workflow | Model_Perplexity_Evaluation |
Overview
Description
This pattern documents how the llama-perplexity tool loads a model for evaluation. The tool uses common_init_from_params(params), a common library utility that handles GGUF model loading, LoRA adapter application, and context creation in a single call. The perplexity tool configures specific parameters before calling this function, including context size adjustments for parallel sequence evaluation and stride-based perplexity computation.
Usage
The model loading pattern follows a specific sequence within the main() function of perplexity.cpp:
- Parse CLI arguments to populate common_params
- Adjust context and batch parameters based on evaluation mode
- Initialize the GGML backend and NUMA
- Call common_init_from_params(params)
- Extract model and context pointers from the returned object
Code Reference
| Aspect | Detail |
|---|---|
| Source Location | tools/perplexity/perplexity.cpp:2027-2030 |
| Signature | auto llama_init = common_init_from_params(params) |
| Import | #include "common.h", #include "llama.h" |
Model loading pattern (perplexity.cpp:2023-2035):
llama_backend_init();
llama_numa_init(params.numa);
// load the model and apply lora adapter, if any
auto llama_init = common_init_from_params(params);
auto * model = llama_init->model();
auto * ctx = llama_init->context();
if (model == NULL) {
    LOG_ERR("%s: unable to load model\n", __func__);
    return 1;
}
const int n_ctx_train = llama_model_n_ctx_train(model);
Context size configuration before loading (perplexity.cpp:1997-2021):
// n_ctx below is the requested per-sequence context size (params.n_ctx), captured earlier in main()
const bool ppl = !params.hellaswag && !params.winogrande
              && !params.multiple_choice && !params.kl_divergence;
if (ppl) {
    // pack as many full-context sequences as fit into one batch
    const int32_t n_seq = std::max(1, params.n_batch / n_ctx);
    const int32_t n_kv = n_seq * n_ctx;
    params.n_parallel = n_seq;
    params.n_ctx = n_kv;
    params.n_batch = std::min(params.n_batch, n_kv);
} else {
    params.n_batch = std::min(params.n_batch, params.n_ctx);
    if (params.kl_divergence) {
        params.n_parallel = 1;
    } else {
        // ensure there's at least enough seq_ids for HellaSwag
        params.n_parallel = std::max(4, params.n_parallel);
    }
}
if (params.ppl_stride > 0) {
    // strided perplexity needs half a stride of extra context
    params.n_ctx += params.ppl_stride/2;
}
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | params | common_params | Fully configured parameters including model path, context size, batch size, GPU layers, etc. |
| Output | llama_init | common_init_result | Wrapper object providing model() and context() accessors |
| Output | model | llama_model * | Loaded model handle (extracted via llama_init->model()) |
| Output | ctx | llama_context * | Inference context (extracted via llama_init->context()) |
Key parameter values set before loading:
| Parameter | PPL Mode | HellaSwag/Winogrande Mode | KL Divergence Mode |
|---|---|---|---|
| n_ctx | n_seq * n_ctx (default: 512) | params.n_ctx (user-specified) | params.n_ctx (user-specified) |
| n_parallel | max(1, n_batch / n_ctx) | max(4, n_parallel) | 1 |
| n_batch | min(n_batch, n_kv) | min(n_batch, n_ctx) | min(n_batch, n_ctx) |
Usage Examples
Example 1: Standard perplexity evaluation loading
# This command triggers the PPL loading path:
# n_ctx=512, n_batch=2048 => n_seq=4, n_kv=2048, n_parallel=4
./llama-perplexity -m model.gguf \
-f wikitext-2-raw/wiki.test.raw \
--ctx-size 512 \
--batch-size 2048 \
-ngl 35
Example 2: HellaSwag evaluation loading
# This command triggers the HellaSwag loading path:
# n_parallel = max(4, n_parallel), ensuring at least 4 sequences
./llama-perplexity -m model.gguf \
-f hellaswag_val_full.txt \
--hellaswag \
--hellaswag-tasks 400 \
--ctx-size 2048 \
-ngl 35
Example 3: KL divergence evaluation loading
# KL divergence mode uses n_parallel=1
./llama-perplexity -m model-q4.gguf \
-f wikitext-2-raw/wiki.test.raw \
--kl-divergence \
--kl-divergence-base logits-f16.bin \
--ctx-size 512