Implementation:Ggml org Llama cpp Common Init For Perplexity

From Leeroopedia
Aspect | Detail
Implementation Name | Common Init For Perplexity
Doc Type | Pattern Doc
Domain | Model Perplexity Evaluation
Purpose | Model loading via common_init_from_params(params) for perplexity evaluation
Related Workflow | Model_Perplexity_Evaluation

Overview

Description

This pattern documents how the llama-perplexity tool loads a model for evaluation. The tool uses common_init_from_params(params), a common library utility that handles GGUF model loading, LoRA adapter application, and context creation in a single call. The perplexity tool configures specific parameters before calling this function, including context size adjustments for parallel sequence evaluation and stride-based perplexity computation.

Usage

The model loading pattern follows a specific sequence within the main() function of perplexity.cpp:

  1. Parse CLI arguments to populate common_params
  2. Adjust context and batch parameters based on evaluation mode
  3. Initialize the GGML backend and NUMA settings (llama_backend_init(), llama_numa_init(params.numa))
  4. Call common_init_from_params(params)
  5. Extract model and context pointers from the returned object

Code Reference

Aspect | Detail
Source Location | tools/perplexity/perplexity.cpp:2027-2030
Signature | auto llama_init = common_init_from_params(params)
Import | #include "common.h", #include "llama.h"

Model loading pattern (perplexity.cpp:2023-2035):

llama_backend_init();
llama_numa_init(params.numa);

// load the model and apply lora adapter, if any
auto llama_init = common_init_from_params(params);

auto * model = llama_init->model();
auto * ctx   = llama_init->context();

if (model == NULL) {
    LOG_ERR("%s: unable to load model\n", __func__);
    return 1;
}

const int n_ctx_train = llama_model_n_ctx_train(model);

Context size configuration before loading (perplexity.cpp:1997-2021):

// n_ctx below is the requested context size (params.n_ctx), captured before this block
const bool ppl = !params.hellaswag && !params.winogrande
              && !params.multiple_choice && !params.kl_divergence;

if (ppl) {
    const int32_t n_seq = std::max(1, params.n_batch / n_ctx);
    const int32_t n_kv = n_seq * n_ctx;

    params.n_parallel = n_seq;
    params.n_ctx      = n_kv;

    params.n_batch = std::min(params.n_batch, n_kv);
} else {
    params.n_batch = std::min(params.n_batch, params.n_ctx);
    if (params.kl_divergence) {
        params.n_parallel = 1;
    } else {
        // ensure there's at least enough seq_ids for HellaSwag
        params.n_parallel = std::max(4, params.n_parallel);
    }
}

if (params.ppl_stride > 0) {
    params.n_ctx += params.ppl_stride/2;
}
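
The adjustment logic above can be exercised in isolation. The following is a minimal sketch, not the tool's own code: EvalParams is a hypothetical stand-in for the few common_params fields involved, adjust_for_eval is a name invented for illustration, and the default values in the struct are assumptions chosen to match Example 1 on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the common_params fields used by the adjustment.
// Field names follow the snippet above; the defaults are illustrative only.
struct EvalParams {
    int32_t n_ctx           = 512;
    int32_t n_batch         = 2048;
    int32_t n_parallel      = 1;
    int32_t ppl_stride      = 0;
    bool    hellaswag       = false;
    bool    winogrande      = false;
    bool    multiple_choice = false;
    bool    kl_divergence   = false;
};

// Hypothetical helper that mirrors the adjustment shown above.
inline void adjust_for_eval(EvalParams & p) {
    const int32_t n_ctx = p.n_ctx;  // requested per-sequence context size
    assert(n_ctx > 0);              // guards the division below

    const bool ppl = !p.hellaswag && !p.winogrande
                  && !p.multiple_choice && !p.kl_divergence;

    if (ppl) {
        // Fit as many full sequences as possible into one batch (at least one).
        const int32_t n_seq = std::max<int32_t>(1, p.n_batch / n_ctx);
        const int32_t n_kv  = n_seq * n_ctx;

        p.n_parallel = n_seq;
        p.n_ctx      = n_kv;
        p.n_batch    = std::min(p.n_batch, n_kv);
    } else {
        p.n_batch    = std::min(p.n_batch, p.n_ctx);
        p.n_parallel = p.kl_divergence ? 1 : std::max<int32_t>(4, p.n_parallel);
    }

    // Strided perplexity extends the context by half the stride (integer division).
    if (p.ppl_stride > 0) {
        p.n_ctx += p.ppl_stride / 2;
    }
}
```

With n_ctx = 512 and n_batch = 2048 the sketch yields n_parallel = 4 and n_ctx = 2048, which agrees with the comment in Example 1. When n_batch is smaller than n_ctx, the integer division gives 0 and the max(1, ...) clamp keeps a single sequence.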

I/O Contract

Direction | Name | Type | Description
Input | params | common_params | Fully configured parameters including model path, context size, batch size, GPU layers, etc.
Output | llama_init | common_init_result | Wrapper object providing model() and context() accessors
Output | model | llama_model * | Loaded model handle (extracted via llama_init->model())
Output | ctx | llama_context * | Inference context (extracted via llama_init->context())
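
The accessor-and-null-check pattern in this contract can be illustrated without llama.cpp. The sketch below uses mock types only: FakeModel, FakeContext, FakeInitResult, and run_eval are all hypothetical names, and the assumption that the wrapper owns both handles (so the raw pointers are valid only while the wrapper is alive) is not stated on this page.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; the real llama_model and llama_context are opaque types.
struct FakeModel   { int n_ctx_train = 4096; };
struct FakeContext { const FakeModel * model = nullptr; };

// Hypothetical wrapper: owns both objects and hands out non-owning raw pointers.
class FakeInitResult {
public:
    explicit FakeInitResult(bool load_ok) {
        if (load_ok) {
            model_ = std::make_unique<FakeModel>();
            ctx_   = std::make_unique<FakeContext>();
            ctx_->model = model_.get();
        }
    }

    FakeModel   * model()   const { return model_.get(); }
    FakeContext * context() const { return ctx_.get(); }

private:
    std::unique_ptr<FakeModel>   model_;
    std::unique_ptr<FakeContext> ctx_;
};

// Returns 0 on success and 1 when loading failed, like the tool's main().
inline int run_eval(bool load_ok) {
    auto init = std::make_unique<FakeInitResult>(load_ok);

    auto * model = init->model();
    auto * ctx   = init->context();

    if (model == nullptr) {
        return 1;  // load failure: both accessors returned null in this mock
    }

    // In this mock a loaded model always comes with a context.
    assert(ctx != nullptr);

    // Use model and ctx only while `init` is in scope.
    return ctx->model == model ? 0 : 1;
}
```

Keeping the wrapper in a local variable for the whole evaluation, as the pattern above does with llama_init, is what keeps the extracted pointers usable under this ownership assumption.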

Key parameter values set before loading:

Parameter | PPL Mode | HellaSwag/Winogrande/Multiple-Choice Mode | KL Divergence Mode
n_ctx | n_seq * n_ctx (default: 512) | params.n_ctx (user-specified) | params.n_ctx (user-specified)
n_parallel | max(1, n_batch / n_ctx) | max(4, n_parallel) | 1
n_batch | min(n_batch, n_kv) | min(n_batch, n_ctx) | min(n_batch, n_ctx)

Usage Examples

Example 1: Standard perplexity evaluation loading

# This command triggers the PPL loading path:
# n_ctx=512, n_batch=2048 => n_seq=4, n_kv=2048, n_parallel=4
./llama-perplexity -m model.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --ctx-size 512 \
    --batch-size 2048 \
    -ngl 35

Example 2: HellaSwag evaluation loading

# This command triggers the HellaSwag loading path:
# n_parallel = max(4, n_parallel), ensuring at least 4 sequences
./llama-perplexity -m model.gguf \
    -f hellaswag_val_full.txt \
    --hellaswag \
    --hellaswag-tasks 400 \
    --ctx-size 2048 \
    -ngl 35

Example 3: KL divergence evaluation loading

# KL divergence mode uses n_parallel=1
./llama-perplexity -m model-q4.gguf \
    -f wikitext-2-raw/wiki.test.raw \
    --kl-divergence \
    --kl-divergence-base logits-f16.bin \
    --ctx-size 512
