Implementation:Ggml org Llama cpp Common Init From Params Target
| Field | Value |
|---|---|
| Implementation Name | Common Init From Params Target |
| Doc Type | Pattern Doc |
| Workflow | Speculative_Decoding |
| Step | 2 of 5 |
| Source File | examples/speculative-simple/speculative-simple.cpp
|
Overview
Description
This implementation documents the pattern of loading the target (large) model for speculative decoding using common_init_from_params(params). In the speculative-simple example, the target model is loaded first using the standard common initialization path, which handles model file loading, context creation, and parameter configuration. The resulting model and context objects become the verification engine for the speculative decoding pipeline.
The target model's vocabulary (obtained via llama_model_get_vocab) serves as the canonical vocabulary for the entire pipeline, including tokenization of the prompt and decoding of the output.
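The target vocabulary has to be canonical because speculative decoding compares token ids. The draft model proposes ids, and the target checks them by id, so an id is only meaningful if both models map it to the same text. The toy sketch below illustrates this. The names toy_vocab, toy_detokenize and toy_vocabs_compatible are hypothetical, and the compatibility rule (every draft id must carry the same text in the target table) is a deliberate simplification. It is not llama.cpp's own draft/target compatibility logic.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy vocabulary: index = token id, value = token text.
using toy_vocab = std::vector<std::string>;

// Toy detokenization: concatenate the text of each id using the given vocabulary.
std::string toy_detokenize(const toy_vocab & vocab, const std::vector<int> & ids) {
    std::string out;
    for (int id : ids) {
        assert(id >= 0 && static_cast<std::size_t>(id) < vocab.size());
        out += vocab[id];
    }
    return out;
}

// Simplified compatibility rule (assumption): every id the draft can emit
// must denote exactly the same text in the target vocabulary.
bool toy_vocabs_compatible(const toy_vocab & tgt, const toy_vocab & dft) {
    if (dft.size() > tgt.size()) {
        return false;
    }
    for (std::size_t i = 0; i < dft.size(); ++i) {
        if (dft[i] != tgt[i]) {
            return false;
        }
    }
    return true;
}
```

If the tables disagree, the same ids decode to different text. With a draft table that swaps "Hello" and ",", ids {1, 2} read as ",Hello" instead of "Hello,". That is why the prompt is tokenized and the output decoded with the target vocabulary only.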
Usage
```cpp
auto llama_init_tgt = common_init_from_params(params);
llama_model   * model_tgt = llama_init_tgt->model();
llama_context * ctx_tgt   = llama_init_tgt->context();
```
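As we understand the current common API, the object returned by common_init_from_params owns the model and the context. The model() and context() accessors return non-owning raw pointers. In practice, llama_init_tgt must stay alive for as long as model_tgt and ctx_tgt are used. The sketch below models that ownership shape with hypothetical stand-in types (fake_model, fake_context, fake_init_result, fake_init_from_params) that count live instances. It does not call llama.cpp.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for llama_model / llama_context.
// They only track how many instances are alive, so ownership can be observed.
struct fake_model {
    static inline int alive = 0;
    fake_model()  { ++alive; }
    ~fake_model() { --alive; }
};

struct fake_context {
    static inline int alive = 0;
    fake_context()  { ++alive; }
    ~fake_context() { --alive; }
};

// Hypothetical init result: owns both objects and hands out non-owning pointers.
class fake_init_result {
public:
    fake_init_result()
        : model_(std::make_unique<fake_model>()),
          context_(std::make_unique<fake_context>()) {}

    fake_model   * model()   { return model_.get(); }
    fake_context * context() { return context_.get(); }

private:
    // Members are destroyed in reverse declaration order, so in this sketch
    // the context is released before the model it was created from.
    std::unique_ptr<fake_model>   model_;
    std::unique_ptr<fake_context> context_;
};

// Hypothetical factory with the same shape as the call in the snippet above.
std::unique_ptr<fake_init_result> fake_init_from_params() {
    return std::make_unique<fake_init_result>();
}
```

When the init object goes out of scope, both objects are freed. Any raw pointer copied out of it (the equivalent of model_tgt or ctx_tgt) then dangles.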
Code Reference
| Field | Value |
|---|---|
| Source Location | examples/speculative-simple/speculative-simple.cpp:41-44 |
| Signature | common_init_from_params(params) returns an init object with model() and context() accessors |
| Import | #include "common.h", #include "llama.h" |
Target model loading pattern:
```cpp
// load the target model
auto llama_init_tgt = common_init_from_params(params);

model_tgt = llama_init_tgt->model();
ctx_tgt   = llama_init_tgt->context();
```
Context setup (preceding the load):
```cpp
int main(int argc, char ** argv) {
    common_params params;

    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    // ...

    common_init();

    // init llama.cpp
    llama_backend_init();
    llama_numa_init(params.numa);

    llama_model   * model_tgt = NULL;
    llama_context * ctx_tgt   = NULL;

    // load the target model
    auto llama_init_tgt = common_init_from_params(params);

    model_tgt = llama_init_tgt->model();
    ctx_tgt   = llama_init_tgt->context();

    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);

    // ... (remainder of main omitted)
}
```
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | params | common_params | Parsed command-line parameters, including model path, context size, GPU layers, etc. |
| Output | model_tgt | llama_model * | Loaded target model |
| Output | ctx_tgt | llama_context * | Target model inference context configured for batch verification |
| Output | vocab | const llama_vocab * | Vocabulary from the target model, used for tokenization and detokenization |
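The I/O contract describes ctx_tgt as a context configured for batch verification. To make that concrete: the target scores the whole draft in one batch and keeps the longest prefix it agrees with. The sketch below shows the greedy form of that acceptance rule on plain int token ids. greedy_accept is a hypothetical helper. As far as we know, the speculative-simple example performs acceptance through llama.cpp's sampling helpers rather than a bare argmax comparison like this one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Greedy acceptance sketch.
// draft:       tokens proposed by the draft model.
// target_pred: the target's greedy token at each position of the verification
//              batch; target_pred[i] is its choice after the prompt plus draft[0..i).
//              It has one more entry than draft (the prediction after the last draft token).
// Returns the tokens committed this step: the agreed draft prefix, then the
// target's own token at the first mismatch (or its bonus token if all matched).
std::vector<int> greedy_accept(const std::vector<int> & draft,
                               const std::vector<int> & target_pred) {
    assert(target_pred.size() == draft.size() + 1);
    std::vector<int> accepted;
    for (std::size_t i = 0; i < draft.size(); ++i) {
        accepted.push_back(target_pred[i]);
        if (target_pred[i] != draft[i]) {
            return accepted; // first disagreement: keep the target's token and stop
        }
    }
    accepted.push_back(target_pred[draft.size()]); // every draft token matched
    return accepted;
}
```

For example, a draft {5, 7, 9} checked against target predictions {5, 7, 2, 4} commits {5, 7, 2}. The target therefore always advances by at least one token per verification batch.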
Prerequisites:
- llama_backend_init() must be called before model loading
- llama_numa_init(params.numa) should be called for NUMA-aware memory allocation
- The target model path is specified via params.model.path (--model CLI flag)
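The ordering rule in the prerequisites can be stated as a small state check. The class below is a hypothetical illustration. llama.cpp does not expose such a guard, and the method names only mirror the calls listed above. It refuses to "load a model" until the backend step has run.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical guard encoding the prerequisite order; it does not call llama.cpp.
class init_order_guard {
public:
    void backend_init() { backend_ready_ = true; }  // stands in for llama_backend_init()
    void numa_init()    { numa_done_ = true; }      // stands in for llama_numa_init(...)

    // Stands in for common_init_from_params(...): only valid after backend_init().
    void load_model() {
        if (!backend_ready_) {
            throw std::logic_error("backend_init() must precede model loading");
        }
        model_loaded_ = true;
    }

    bool numa_done()    const { return numa_done_; }
    bool model_loaded() const { return model_loaded_; }

private:
    bool backend_ready_ = false;
    bool numa_done_     = false;
    bool model_loaded_  = false;
};
```

In this sketch, NUMA initialization is recorded but not enforced, matching the "should" in the prerequisites list; only the backend step is a hard requirement.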
Usage Examples
Complete target model setup for speculative decoding:
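```cpp
#include "common.h"
#include "llama.h"

#include <cstdio>
#include <vector>

int main(int argc, char ** argv) {
    common_params params;
    if (!common_params_parse(argc, argv, params, LLAMA_EXAMPLE_SPECULATIVE)) {
        return 1;
    }

    common_init();
    llama_backend_init();
    llama_numa_init(params.numa);

    // Load target model; llama_init_tgt owns the model and context
    auto llama_init_tgt = common_init_from_params(params);
    llama_model   * model_tgt = llama_init_tgt->model();
    llama_context * ctx_tgt   = llama_init_tgt->context();
    if (model_tgt == nullptr || ctx_tgt == nullptr) {
        fprintf(stderr, "failed to load target model\n");
        return 1;
    }

    // Get vocabulary for tokenization/detokenization later in the pipeline
    const llama_vocab * vocab = llama_model_get_vocab(model_tgt);

    // Tokenize prompt using the target model's vocabulary
    std::vector<llama_token> inp = common_tokenize(ctx_tgt, params.prompt, true, true);

    // Verify the context is large enough to hold the prompt
    if (llama_n_ctx(ctx_tgt) < (uint32_t) inp.size()) {
        fprintf(stderr, "prompt exceeds context size\n");
        return 1;
    }

    // ... draft model loading and the speculative loop would follow here
    llama_backend_free();
    return 0;
}
```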
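The context-size check above can also be written as pure helpers, which makes the boundary easy to test. prompt_fits reproduces the original condition (the prompt fits when its token count is at most n_ctx) but compares in size_t, so an oversized count cannot wrap around when narrowed to uint32_t on 64-bit platforms. prompt_fits_with_draft is a hypothetical, stricter variant. It assumes you also want headroom for one verification batch of n_draft speculative tokens plus one extra token. That reserve is an assumption of this sketch, not a rule taken from the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Same condition as the check above: fail when n_ctx < n_prompt.
bool prompt_fits(std::uint32_t n_ctx, std::size_t n_prompt) {
    return n_prompt <= static_cast<std::size_t>(n_ctx);
}

// Hypothetical stricter check: reserve room for n_draft draft tokens
// plus one extra token in the first verification batch (assumed headroom).
bool prompt_fits_with_draft(std::uint32_t n_ctx, std::size_t n_prompt, std::size_t n_draft) {
    const std::size_t needed = n_prompt + n_draft + 1;
    return needed <= static_cast<std::size_t>(n_ctx);
}
```

With n_ctx = 4096, a 4096-token prompt passes prompt_fits. The same 4096-token context rejects a 4090-token prompt under prompt_fits_with_draft with n_draft = 16, because 4090 + 16 + 1 exceeds 4096.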