Implementation:Ggml org Llama cpp Common Init For Perplexity

Aspect	Detail
Implementation Name	Common Init For Perplexity
Doc Type	Pattern Doc
Domain	Model Perplexity Evaluation
Purpose	Model loading via `common_init_from_params(params)` for perplexity evaluation
Related Workflow	Model_Perplexity_Evaluation

Overview

Description

This pattern documents how the llama-perplexity tool loads a model for evaluation. The tool calls common_init_from_params(params), a utility from the common library that loads the GGUF model, applies any LoRA adapters, and creates the inference context in a single call. Before that call, the tool rewrites several parameters. In perplexity mode it packs multiple sequences into one context by adjusting the context size, batch size, and number of parallel sequences. When a perplexity stride is set, it also enlarges the context by half the stride.

Usage

The model loading pattern follows a specific sequence within the main() function of perplexity.cpp:

Parse CLI arguments to populate common_params
Adjust context and batch parameters based on evaluation mode
Initialize the GGML backend and NUMA
Call common_init_from_params(params)
Extract model and context pointers from the returned object

Code Reference

Aspect	Detail
Source Location	`tools/perplexity/perplexity.cpp:2027-2030`
Signature	`auto llama_init = common_init_from_params(params)`
Import	`#include "common.h"`, `#include "llama.h"`

Model loading pattern (perplexity.cpp:2023-2035):

llama_backend_init();
llama_numa_init(params.numa);

// load the model and apply lora adapter, if any
auto llama_init = common_init_from_params(params);

auto * model = llama_init->model();
auto * ctx   = llama_init->context();

if (model == NULL) {
    LOG_ERR("%s: unable to load model\n", __func__);
    return 1;
}

const int n_ctx_train = llama_model_n_ctx_train(model);

Context size configuration before loading (perplexity.cpp:1997-2021):

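// per-sequence context size from --ctx-size, captured before params.n_ctx is rewritten
const int32_t n_ctx = params.n_ctx;

const bool ppl = !params.hellaswag && !params.winogrande
              && !params.multiple_choice && !params.kl_divergence;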

if (ppl) {
    const int32_t n_seq = std::max(1, params.n_batch / n_ctx);
    const int32_t n_kv = n_seq * n_ctx;

    params.n_parallel = n_seq;
    params.n_ctx      = n_kv;

    params.n_batch = std::min(params.n_batch, n_kv);
} else {
    params.n_batch = std::min(params.n_batch, params.n_ctx);
    if (params.kl_divergence) {
        params.n_parallel = 1;
    } else {
        // ensure there's at least enough seq_ids for HellaSwag
        params.n_parallel = std::max(4, params.n_parallel);
    }
}

if (params.ppl_stride > 0) {
    params.n_ctx += params.ppl_stride/2;
}

I/O Contract

Direction	Name	Type	Description
Input	params	`common_params`	Fully configured parameters including model path, context size, batch size, GPU layers, etc.
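Output	llama_init	`common_init_result`, held through a pointer (members are accessed with `->`)	Wrapper object that owns the model and context and provides `model()` and `context()` accessors
Output	model	`llama_model *`	Non-owning model handle from `llama_init->model()`; `NULL` if loading failed
Output	ctx	`llama_context *`	Non-owning inference context from `llama_init->context()`; valid only while `llama_init` is alive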

Key parameter values set before loading:

Parameter	PPL Mode	HellaSwag/Winogrande/Multiple-Choice Mode	KL Divergence Mode
`n_ctx`	`n_kv = n_seq * n_ctx`, where `n_ctx` is the per-sequence `--ctx-size` (e.g. 512)	`params.n_ctx` (user-specified)	`params.n_ctx` (user-specified)
`n_parallel`	`n_seq = max(1, n_batch / n_ctx)`	`max(4, n_parallel)`	1
`n_batch`	`min(n_batch, n_kv)`	`min(n_batch, n_ctx)`	`min(n_batch, n_ctx)`

In every mode, a positive `ppl_stride` then adds `ppl_stride/2` to `n_ctx`.

Usage Examples

Example 1: Standard perplexity evaluation loading

# This command triggers the PPL loading path:
# n_ctx=512, n_batch=2048 => n_seq=4, n_kv=2048, n_parallel=4
./llama-perplexity -m model.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 2048 \
    -ngl 35

Example 2: HellaSwag evaluation loading

# This command triggers the HellaSwag loading path:
# n_parallel = max(4, n_parallel), ensuring at least 4 sequences
./llama-perplexity -m model.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400 \
    --ctx-size 2048 \
    -ngl 35

Example 3: KL divergence evaluation loading

# KL divergence mode uses n_parallel=1
./llama-perplexity -m model-q4.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --kl-divergence \
    --kl-divergence-base logits-f16.bin \
    --ctx-size 512
