Implementation:Ggml org Llama cpp Common Init From Params Target

Field	Value
Implementation Name	Common Init From Params Target
Doc Type	Pattern Doc
Workflow	Speculative_Decoding
Step	2 of 5
Source File	`examples/speculative-simple/speculative-simple.cpp`

Overview

Description

This page documents how the speculative-simple example loads the target (large) model for speculative decoding with common_init_from_params(params). The target model is loaded first, through the standard common initialization path, which loads the model file and creates the inference context from the parsed parameters. The resulting model and context then act as the verifier in the speculative decoding pipeline: they check the tokens proposed by the draft model.

The target model's vocabulary (obtained via llama_model_get_vocab) serves as the canonical vocabulary for the entire pipeline, including tokenization of the prompt and decoding of the output.

Usage

auto llama_init_tgt = common_init_from_params(params);
llama_model * model_tgt = llama_init_tgt->model();
llama_context * ctx_tgt = llama_init_tgt->context();
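
The raw pointers returned by model() and context() are borrowed from the init object, so they are presumably only valid while llama_init_tgt is alive. The sketch below illustrates that borrowing pattern with stand-in types. FakeModel, FakeContext, FakeInitResult and fake_init_from_params are hypothetical names invented for this illustration and are not llama.cpp types; the sketch assumes the init object owns both resources, which the accessor-style interface suggests but this page does not confirm.

```cpp
#include <memory>
#include <string>

// Stand-in types for illustration only; these are NOT llama.cpp types.
struct FakeModel   { std::string path; };
struct FakeContext { const FakeModel * model; int n_ctx; };

// Hypothetical init object: owns the model and the context and hands out
// non-owning raw pointers through accessors.
class FakeInitResult {
public:
    FakeInitResult(const std::string & path, int n_ctx)
        : model_(std::make_unique<FakeModel>(FakeModel{path})),
          ctx_(std::make_unique<FakeContext>(FakeContext{model_.get(), n_ctx})) {}

    FakeModel   * model()   { return model_.get(); }
    FakeContext * context() { return ctx_.get(); }

private:
    // Members are destroyed in reverse declaration order,
    // so the context is released before the model it refers to.
    std::unique_ptr<FakeModel>   model_;
    std::unique_ptr<FakeContext> ctx_;
};

// Same shape as the call in the example: returns an owning handle.
std::unique_ptr<FakeInitResult> fake_init_from_params(const std::string & path, int n_ctx) {
    return std::make_unique<FakeInitResult>(path, n_ctx);
}
```

Under that assumption, the practical rule is to keep the init object in a scope that outlives every use of the two raw pointers, which is what the example does by holding llama_init_tgt as a local variable in main().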

Code Reference

Field	Value
Source Location	`examples/speculative-simple/speculative-simple.cpp:41-44`
Signature	`common_init_from_params(params)` returns an init object with `model()` and `context()` accessors
Import	`#include "common.h"`, `#include "llama.h"`

Target model loading pattern:

// load the target model
auto llama_init_tgt = common_init_from_params(params);

model_tgt = llama_init_tgt->model();
ctx_tgt   = llama_init_tgt->context();

Surrounding code in main(), from argument parsing through the vocabulary lookup (excerpt):

int main(int argc, char ** argv) {
    common_params params;

    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    // ...

    common_init();

    // init llama.cpp
    llama_backend_init();
    llama_numa_init(params.numa);

    llama_model * model_tgt = NULL;
    llama_context * ctx_tgt = NULL;

    // load the target model
    auto llama_init_tgt = common_init_from_params(params);

    model_tgt = llama_init_tgt->model();
    ctx_tgt   = llama_init_tgt->context();

    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);
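
The excerpt above passes model_tgt straight to llama_model_get_vocab. If the load can fail and leave either pointer null (an assumption here, not something this page states), a check before first use avoids dereferencing a null pointer. A minimal sketch of such a check follows; LoadStatus and check_target_load are hypothetical helpers written for this page, not part of llama.cpp.

```cpp
// Hypothetical helper, not part of llama.cpp: classifies the outcome of a load
// from the two raw pointers handed back by the init object.
enum class LoadStatus { Ok, NoModel, NoContext };

// Takes const void * so that any pointer type (for example llama_model * and
// llama_context *) can be passed without this sketch depending on llama.h.
LoadStatus check_target_load(const void * model, const void * ctx) {
    if (model == nullptr) {
        return LoadStatus::NoModel;   // nothing was loaded at all
    }
    if (ctx == nullptr) {
        return LoadStatus::NoContext; // model present, but no context
    }
    return LoadStatus::Ok;
}

// Human-readable message for logging.
const char * load_status_message(LoadStatus s) {
    switch (s) {
        case LoadStatus::Ok:        return "target model loaded";
        case LoadStatus::NoModel:   return "failed to load target model";
        case LoadStatus::NoContext: return "failed to create target context";
    }
    return "unknown status";
}
```

In the example this could be called as check_target_load(model_tgt, ctx_tgt) right after the two accessor calls, returning 1 from main() on any status other than LoadStatus::Ok.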

I/O Contract

Direction	Name	Type	Description
Input	params	`common_params`	Parsed command-line parameters including model path, context size, GPU layers, etc.
Output	model_tgt	`llama_model *`	Loaded target model
Output	ctx_tgt	`llama_context *`	Target model inference context, used to verify the tokens proposed by the draft model
Output	vocab	`const llama_vocab *`	Vocabulary from the target model, used for tokenization/detokenization

Prerequisites:

llama_backend_init() must be called before model loading
llama_numa_init(params.numa) should be called for NUMA-aware memory allocation
The target model path is specified via params.model.path (--model CLI flag)

Usage Examples

Complete target model setup for speculative decoding:

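#include <cstdio>
#include <vector>

#include "common.h"
#include "llama.h"

int main(int argc, char ** argv) {
    common_params params;

    // Stop early if the command line cannot be parsed
    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    common_init();
    llama_backend_init();
    llama_numa_init(params.numa);

    // Load target model
    auto llama_init_tgt = common_init_from_params(params);
    llama_model * model_tgt = llama_init_tgt->model();
    llama_context * ctx_tgt = llama_init_tgt->context();

    // Defensive check (not part of the original excerpt)
    if (model_tgt == nullptr || ctx_tgt == nullptr) {
        fprintf(stderr, "failed to load the target model\n");
        return 1;
    }

    // Get vocabulary (used later in the pipeline when decoding the output)
    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);

    // Tokenize prompt using target model's vocabulary
    std::vector<llama_token> inp = common_tokenize(ctx_tgt, params.prompt, true, true);

    // Verify context is large enough
    if (llama_n_ctx(ctx_tgt) < (uint32_t) inp.size()) {
        fprintf(stderr, "prompt exceeds context size\n");
        return 1;
    }

    // ... draft model setup and the decoding loop follow here

    llama_backend_free();
    return 0;
}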
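
The context-size check above only compares the prompt length against the context size. As a sketch of how that comparison could be factored out and extended with headroom for tokens generated after the prompt, the helper below takes the sizes as plain integers. prompt_fits and its reserve parameter are hypothetical and not part of llama.cpp or the example; with reserve set to 0 it accepts and rejects the same prompts as the original check.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of llama.cpp.
// Returns true when n_prompt tokens plus `reserve` extra token slots fit
// into a context of n_ctx tokens.
bool prompt_fits(uint32_t n_ctx, std::size_t n_prompt, std::size_t reserve = 0) {
    // Compare without narrowing n_prompt to 32 bits, so an oversized
    // prompt cannot wrap around and appear to fit.
    if (n_prompt > n_ctx) {
        return false;
    }
    // n_prompt <= n_ctx here, so the subtraction cannot underflow.
    return reserve <= n_ctx - n_prompt;
}
```

With the real API the call would look like prompt_fits(llama_n_ctx(ctx_tgt), inp.size()), optionally passing a reserve chosen by the caller.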
