
Implementation:Ggml org Llama cpp Common Init From Params Target

From Leeroopedia
Field | Value
Implementation Name | Common Init From Params Target
Doc Type | Pattern Doc
Workflow | Speculative_Decoding
Step | 2 of 5
Source File | examples/speculative-simple/speculative-simple.cpp

Overview

Description

This page documents the pattern for loading the target (large) model for speculative decoding with common_init_from_params(params). In the speculative-simple example, the target model is loaded first through the standard common initialization path, which loads the model file, creates the context, and applies the parsed parameters. The resulting model and context then act as the verifier in the speculative decoding pipeline: they check the tokens proposed by the draft model.

The target model's vocabulary (obtained via llama_model_get_vocab) serves as the canonical vocabulary for the entire pipeline, including tokenization of the prompt and decoding of the output.

Usage

auto llama_init_tgt = common_init_from_params(params);
llama_model * model_tgt = llama_init_tgt->model();
llama_context * ctx_tgt = llama_init_tgt->context();

Code Reference

Field | Value
Source Location | examples/speculative-simple/speculative-simple.cpp:41-44
Signature | common_init_from_params(params) returns an init object with model() and context() accessors
Import | #include "common.h", #include "llama.h"

Target model loading pattern:

// load the target model
auto llama_init_tgt = common_init_from_params(params);

model_tgt = llama_init_tgt->model();
ctx_tgt   = llama_init_tgt->context();
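The `->` calls in the pattern above suggest that `llama_init_tgt` is a pointer-like handle and that `model_tgt` and `ctx_tgt` are raw pointers borrowed from it. The following is a minimal sketch of that ownership relationship, not the llama.cpp implementation: `fake_model`, `fake_context`, `fake_init_result`, `fake_init_from_params`, and `borrowed_n_ctx` are hypothetical stand-ins, and the assumption that the init object owns both resources should be checked against `common.h` in your checkout.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; these are NOT llama.cpp types.
struct fake_model   { int n_vocab = 32000; };
struct fake_context { int n_ctx   = 4096;  };

// Assumed shape of the init object: it owns both resources and
// hands out non-owning raw pointers through model() and context().
class fake_init_result {
public:
    fake_init_result()
        : model_(std::make_unique<fake_model>()),
          ctx_(std::make_unique<fake_context>()) {}

    fake_model   * model()   const { return model_.get(); }
    fake_context * context() const { return ctx_.get(); }

private:
    std::unique_ptr<fake_model>   model_;
    std::unique_ptr<fake_context> ctx_;
};

// Hypothetical loader that mirrors the call shape of common_init_from_params(params).
inline std::unique_ptr<fake_init_result> fake_init_from_params() {
    return std::make_unique<fake_init_result>();
}

// Reads through a borrowed pointer; the pointer is valid only while `owner` is alive.
inline int borrowed_n_ctx(const fake_init_result & owner) {
    const fake_context * ctx = owner.context();
    assert(ctx != nullptr);
    return ctx->n_ctx;
}
```

If the real init object behaves this way, the practical consequence is to keep `llama_init_tgt` in scope for as long as `model_tgt` and `ctx_tgt` are in use.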

Surrounding setup in main(), including the calls that precede the load:

int main(int argc, char ** argv) {
    common_params params;

    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    // ...

    common_init();

    // init llama.cpp
    llama_backend_init();
    llama_numa_init(params.numa);

    llama_model * model_tgt = NULL;
    llama_context * ctx_tgt = NULL;

    // load the target model
    auto llama_init_tgt = common_init_from_params(params);

    model_tgt = llama_init_tgt->model();
    ctx_tgt   = llama_init_tgt->context();

    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);

I/O Contract

Direction | Name | Type | Description
Input | params | common_params | Parsed command-line parameters, including model path, context size, and GPU layers
Output | model_tgt | llama_model * | Loaded target model
Output | ctx_tgt | llama_context * | Inference context for the target model, used later to verify batches of drafted tokens
Output | vocab | const llama_vocab * | Vocabulary from the target model, used for tokenization and detokenization
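The contract above does not state what the outputs look like when loading fails. A defensive sketch follows, assuming a failed load surfaces as a null pointer from `model()` or `context()` (verify this against your llama.cpp version). `load_status`, `check_target`, and `describe` are hypothetical names introduced here; the check is a template so it accepts `llama_model *` and `llama_context *` as well as test doubles.

```cpp
#include <cassert>
#include <string>

// Hypothetical outcome of inspecting the two pointer outputs of the contract.
enum class load_status { ok, missing_model, missing_context };

// Works with any pointer types, including pointers to opaque (incomplete) structs,
// because it only compares the pointers against nullptr.
template <typename Model, typename Context>
load_status check_target(const Model * model, const Context * ctx) {
    if (model == nullptr) {
        return load_status::missing_model;
    }
    if (ctx == nullptr) {
        return load_status::missing_context;
    }
    return load_status::ok;
}

// Human-readable message for logging.
inline std::string describe(load_status s) {
    switch (s) {
        case load_status::ok:              return "target model and context are ready";
        case load_status::missing_model:   return "failed to load target model";
        case load_status::missing_context: return "failed to create target context";
    }
    assert(false && "unhandled load_status");
    return "unknown status";
}
```

In the example program this would be called as `check_target(model_tgt, ctx_tgt)` right after the accessors, before `llama_model_get_vocab(model_tgt)` dereferences the model.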

Prerequisites:

  • llama_backend_init() must be called before model loading
  • llama_numa_init(params.numa) should be called for NUMA-aware memory allocation
  • The target model path is specified via params.model.path (--model CLI flag)

Usage Examples

Complete target model setup for speculative decoding:

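#include "common.h"
#include "llama.h"

#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    common_params params;

    // Stop early if the command line could not be parsed
    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    common_init();
    llama_backend_init();
    llama_numa_init(params.numa);

    // Load target model
    auto llama_init_tgt = common_init_from_params(params);
    llama_model * model_tgt = llama_init_tgt->model();
    llama_context * ctx_tgt = llama_init_tgt->context();

    // Defensive check before the pointers are dereferenced
    if (model_tgt == NULL || ctx_tgt == NULL) {
        fprintf(stderr, "failed to load target model\n");
        return 1;
    }

    // Get vocabulary for tokenization
    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);

    // Tokenize prompt using target model's vocabulary
    std::vector<llama_token> inp = common_tokenize(ctx_tgt, params.prompt, true, true);

    // Verify context is large enough
    if (llama_n_ctx(ctx_tgt) < (uint32_t) inp.size()) {
        fprintf(stderr, "prompt exceeds context size\n");
        return 1;
    }

    // ... draft model loading and the speculative decoding loop follow here ...

    llama_backend_free();
    return 0;
}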
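The context-size check above narrows `inp.size()` to `uint32_t` before comparing. As a small sketch of the same check done without the narrowing cast, the helper below widens the context size instead. `prompt_fits_context` and `require_prompt_fits` are hypothetical names; in the example program `n_ctx` would come from `llama_n_ctx(ctx_tgt)` and `n_prompt` from `inp.size()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when a prompt of n_prompt tokens fits in a context of n_ctx tokens.
// Widening n_ctx to size_t avoids truncating a very large n_prompt.
constexpr bool prompt_fits_context(std::uint32_t n_ctx, std::size_t n_prompt) {
    return n_prompt <= static_cast<std::size_t>(n_ctx);
}

// Debug-build guard for call sites that treat an oversized prompt as a programming error.
inline void require_prompt_fits(std::uint32_t n_ctx, std::size_t n_prompt) {
    assert(prompt_fits_context(n_ctx, n_prompt) && "prompt exceeds context size");
    (void) n_ctx;
    (void) n_prompt;
}
```

A prompt that exactly fills the context passes this check, matching the original condition, which only rejects prompts strictly larger than the context.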
