Implementation:Ggml org Llama cpp Common Init For Perplexity

From Leeroopedia
Revision as of 12:38, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Common_Init_For_Perplexity.md)
Aspect | Detail
Implementation Name | Common Init For Perplexity
Doc Type | Pattern Doc
Domain | Model Perplexity Evaluation
Purpose | Model loading via common_init_from_params(params) for perplexity evaluation
Related Workflow | Model_Perplexity_Evaluation

Overview

Description

This pattern documents how the llama-perplexity tool loads a model for evaluation. The tool calls common_init_from_params(params), a utility from the common library that loads the GGUF model, applies any LoRA adapters, and creates the context in a single call. Before the call, the tool adjusts several parameters: it scales the context size to hold multiple parallel sequences, and it extends the context when stride-based perplexity computation is enabled.

Usage

The model loading pattern follows a specific sequence within the main() function of perplexity.cpp:

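  1. Parse CLI arguments to populate common_params
  2. Adjust the context, batch, and parallel-sequence parameters based on the evaluation mode
  3. Initialize the llama backend and NUMA support (llama_backend_init(), llama_numa_init(params.numa))
  4. Call common_init_from_params(params)
  5. Extract the model and context pointers from the returned object, and stop if the model pointer is NULL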

Code Reference

Aspect | Detail
Source Location | tools/perplexity/perplexity.cpp:2027-2030
Signature | auto llama_init = common_init_from_params(params)
Import | #include "common.h", #include "llama.h"

Model loading pattern (perplexity.cpp:2023-2035):

llama_backend_init();
llama_numa_init(params.numa);

// load the model and apply lora adapter, if any
auto llama_init = common_init_from_params(params);

auto * model = llama_init->model();
auto * ctx   = llama_init->context();

if (model == NULL) {
    LOG_ERR("%s: unable to load model\n", __func__);
    return 1;
}

const int n_ctx_train = llama_model_n_ctx_train(model);

Context size configuration before loading (perplexity.cpp:1997-2021):

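// n_ctx is the context size requested by the user, captured here
// before the adjustments below overwrite params.n_ctx
const int32_t n_ctx = params.n_ctx;

// plain perplexity mode: none of the other evaluation modes is selected
const bool ppl = !params.hellaswag && !params.winogrande
              && !params.multiple_choice && !params.kl_divergence;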

if (ppl) {
    const int32_t n_seq = std::max(1, params.n_batch / n_ctx);
    const int32_t n_kv = n_seq * n_ctx;

    params.n_parallel = n_seq;
    params.n_ctx      = n_kv;

    params.n_batch = std::min(params.n_batch, n_kv);
} else {
    params.n_batch = std::min(params.n_batch, params.n_ctx);
    if (params.kl_divergence) {
        params.n_parallel = 1;
    } else {
        // ensure there's at least enough seq_ids for HellaSwag
        params.n_parallel = std::max(4, params.n_parallel);
    }
}

if (params.ppl_stride > 0) {
    params.n_ctx += params.ppl_stride/2;
}
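
The arithmetic of the PPL branch can be checked in isolation. The following is a minimal standalone sketch, not code from perplexity.cpp: eval_params and adjust_for_ppl are hypothetical names, and the struct holds only the four fields the excerpt above touches. It assumes a positive requested context size.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the few common_params fields used in the excerpt.
struct eval_params {
    int32_t n_ctx      = 512;   // requested per-sequence context size
    int32_t n_batch    = 2048;  // logical batch size
    int32_t n_parallel = 1;     // number of parallel sequences
    int32_t ppl_stride = 0;     // 0 disables strided perplexity
};

// Mirrors the PPL branch, followed by the stride adjustment.
// Assumption: p.n_ctx > 0, otherwise the division below is invalid.
inline void adjust_for_ppl(eval_params & p) {
    assert(p.n_ctx > 0);

    const int32_t n_ctx = p.n_ctx;                                 // per-sequence context
    const int32_t n_seq = std::max<int32_t>(1, p.n_batch / n_ctx); // sequences per batch
    const int32_t n_kv  = n_seq * n_ctx;                           // total context for all sequences

    p.n_parallel = n_seq;
    p.n_ctx      = n_kv;
    p.n_batch    = std::min(p.n_batch, n_kv);

    // In the excerpt this step runs after the if/else, in every mode.
    if (p.ppl_stride > 0) {
        p.n_ctx += p.ppl_stride / 2;
    }
}
```

With the defaults above (n_ctx = 512, n_batch = 2048) the sketch yields n_parallel = 4 and a total context of 2048. When the batch is smaller than the context, the integer division gives 0 and the max(1, ...) clamp keeps a single sequence.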

I/O Contract

Direction | Name | Type | Description
Input | params | common_params | Fully configured parameters including model path, context size, batch size, GPU layers, etc.
Output | llama_init | common_init_result | Wrapper object providing model() and context() accessors
Output | model | llama_model * | Loaded model handle (extracted via llama_init->model())
Output | ctx | llama_context * | Inference context (extracted via llama_init->context())
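
The caller-side contract (a wrapper is always returned, and a NULL model signals failure) can be illustrated with mock types. This is a sketch under assumptions, not the library's implementation: every name below is hypothetical, and the choice to have the wrapper own both objects through std::unique_ptr is an assumption that this page does not confirm.

```cpp
#include <cassert>
#include <memory>

struct mock_model   {};  // hypothetical stand-in for llama_model
struct mock_context {};  // hypothetical stand-in for llama_context

// Hypothetical wrapper. Assumption: it owns both objects and hands out
// non-owning raw pointers through its accessors.
class mock_init_result {
public:
    explicit mock_init_result(bool load_ok) {
        if (load_ok) {
            model_   = std::make_unique<mock_model>();
            context_ = std::make_unique<mock_context>();
        }
    }
    mock_model   * model()   const { return model_.get(); }
    mock_context * context() const { return context_.get(); }

private:
    std::unique_ptr<mock_model>   model_;
    std::unique_ptr<mock_context> context_;
};

// The wrapper itself is always returned; failure shows up as null accessors.
inline std::unique_ptr<mock_init_result> mock_init_from_params(bool load_ok) {
    return std::make_unique<mock_init_result>(load_ok);
}

// Mirrors the check in the loading pattern: 1 on load failure, 0 otherwise.
inline int load_and_check(bool load_ok) {
    auto init = mock_init_from_params(load_ok);
    assert(init != nullptr);

    auto * model = init->model();
    auto * ctx   = init->context();

    if (model == nullptr) {
        return 1;  // corresponds to the "unable to load model" path
    }
    (void) ctx;    // a real caller would run the evaluation here
    return 0;
}
```

Under this ownership assumption, the raw pointers stay valid only while the wrapper is alive, so a caller would keep the wrapper in scope for the whole evaluation.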

Key parameter values set before loading:

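Parameter | PPL Mode | HellaSwag/Winogrande/Multiple-Choice Mode | KL Divergence Mode
n_ctx | n_seq * n_ctx, where n_ctx is the requested --ctx-size (default: 512) | unchanged (user-specified) | unchanged (user-specified)
n_parallel | n_seq = max(1, n_batch / n_ctx) | max(4, n_parallel) | 1
n_batch | min(n_batch, n_kv), where n_kv = n_seq * n_ctx | min(n_batch, n_ctx) | min(n_batch, n_ctx)

In every mode, params.n_ctx is additionally increased by ppl_stride/2 when ppl_stride > 0.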
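
The two non-PPL columns of the table reduce to a small function. The sketch below is illustrative only: eval_mode, batch_plan, and plan_non_ppl are hypothetical names, and the enum is a simplification, since the tool itself selects the mode through separate boolean fields on common_params.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical enum standing in for the separate boolean mode flags.
enum class eval_mode { hellaswag, winogrande, multiple_choice, kl_divergence };

// Hypothetical result type: the two values the non-PPL branch assigns.
struct batch_plan {
    int32_t n_batch;
    int32_t n_parallel;
};

// Mirrors the else-branch of the context size configuration.
inline batch_plan plan_non_ppl(eval_mode mode, int32_t n_ctx,
                               int32_t n_batch, int32_t n_parallel) {
    assert(n_ctx > 0);

    batch_plan plan;
    // The batch never exceeds the context in these modes.
    plan.n_batch = std::min(n_batch, n_ctx);
    // KL divergence runs one sequence; the other modes keep at least four.
    plan.n_parallel = (mode == eval_mode::kl_divergence)
        ? 1
        : std::max<int32_t>(4, n_parallel);
    return plan;
}
```

For example, plan_non_ppl(eval_mode::hellaswag, 2048, 2048, 1) yields a batch of 2048 with 4 parallel sequences, while the KL divergence mode always yields a single sequence regardless of the requested n_parallel.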

Usage Examples

Example 1: Standard perplexity evaluation loading

# This command triggers the PPL loading path:
# n_ctx=512, n_batch=2048 => n_seq=4, n_kv=2048, n_parallel=4
./llama-perplexity -m model.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 2048 \
    -ngl 35

Example 2: HellaSwag evaluation loading

# This command triggers the HellaSwag loading path:
# n_parallel = max(4, n_parallel), ensuring at least 4 sequences
./llama-perplexity -m model.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400 \
    --ctx-size 2048 \
    -ngl 35

Example 3: KL divergence evaluation loading

# KL divergence mode uses n_parallel=1
./llama-perplexity -m model-q4.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --kl-divergence \
    --kl-divergence-base logits-f16.bin \
    --ctx-size 512
