Implementation:Ggml org Llama cpp Common Speculative Init
| Field | Value |
|---|---|
| Implementation Name | Common Speculative Init |
| Doc Type | API Doc |
| Workflow | Speculative_Decoding |
| Step | 4 of 5 |
| Source Files | common/speculative.h, common/speculative.cpp |
Overview
Description
The common_speculative_init function creates and initializes the speculative decoding runtime. Given the speculative parameters and the target context, it creates a draft context (for model-based strategies), selects the speculation implementations that the configuration enables, and instantiates a state object for each one. It returns a common_speculative pointer that orchestrates the draft-then-verify generation loop, or nullptr on failure.
The function implements a strategy chain where multiple implementations are tried in order during generation, using the first one that produces a non-empty draft.
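The "first non-empty draft wins" rule can be illustrated with a minimal, self-contained sketch. The types and names below (token, draft, strategy, first_non_empty_draft, make_example_chain) are hypothetical stand-ins invented for this example; they are not the llama.cpp types, and the real selection logic may differ in detail.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the llama.cpp types.
using token = std::int32_t;
using draft = std::vector<token>;

// One drafting strategy: given the tokens generated so far, propose a
// continuation. An empty result means "nothing to offer this step".
struct strategy {
    std::string name;
    std::function<draft(const std::vector<token> &)> propose;
};

// Try each strategy in order and return the first non-empty draft.
// An empty return value means no strategy produced a proposal.
draft first_non_empty_draft(const std::vector<strategy> & chain,
                            const std::vector<token> & history) {
    for (const strategy & s : chain) {
        draft d = s.propose(history);
        if (!d.empty()) {
            return d;
        }
    }
    return {};
}

// Example chain: a lookup-style strategy that only fires when the last
// token is 7, followed by a fallback that always proposes one token.
std::vector<strategy> make_example_chain() {
    return {
        { "lookup-like", [](const std::vector<token> & h) -> draft {
              if (!h.empty() && h.back() == 7) {
                  return {8, 9};
              }
              return {};
          } },
        { "fallback-like", [](const std::vector<token> &) -> draft {
              return {1};
          } },
    };
}
```

Because the chain is ordered, cheaper strategies can be placed first so that a more expensive one runs only when they come up empty.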
Usage
common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (spec == nullptr) {
    fprintf(stderr, "Failed to initialize speculative decoding\n");
    return 1;
}
Code Reference
| Field | Value |
|---|---|
| Source Location (header) | common/speculative.h:21-23 |
| Source Location (impl) | common/speculative.cpp:839-973 |
| Import | #include "speculative.h" |
Function signature:
common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context * ctx_tgt);
Implementation structure (common/speculative.cpp:839-973):
common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context * ctx_tgt) {
    // Create draft context if draft model is available
    llama_context * ctx_dft = nullptr;
    if (params.model_dft) {
        ctx_dft = llama_init_from_model(params.model_dft, params.cparams_dft);
        if (ctx_dft == nullptr) {
            LOG_ERR("%s", "failed to create draft context\n");
            return nullptr;
        }
    }

    // Compute the implementations to use based on configuration
    std::vector<common_speculative_config> configs = {};
    {
        bool has_draft        = !params.mparams_dft.path.empty();
        bool has_ngram_cache  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_CACHE);
        bool has_ngram_simple = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE);
        bool has_ngram_map_k  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K);
        // ... other strategy detection

        // Add strategies in order of preference
        if (has_ngram_simple) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE, params));
        }
        // ... other n-gram strategies
        if (has_draft) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_DRAFT, params));
        }
    }

    // Create implementation state objects
    std::vector<std::unique_ptr<common_speculative_state>> impls = {};
    for (const common_speculative_config & config : configs) {
        switch (config.type) {
            case COMMON_SPECULATIVE_TYPE_DRAFT:
                impls.push_back(std::make_unique<common_speculative_state_draft>(
                    config.type, ctx_tgt, ctx_dft, params.replacements));
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE:
                // ... create ngram simple state
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_MOD:
                impls.push_back(std::make_unique<common_speculative_state_ngram_mod>(
                    config.type, *config.params.ngram_mod));
                break;
            // ... other strategy types
        }
    }

    if (impls.empty()) {
        LOG_WRN("%s", "no implementations specified\n");
        return nullptr;
    }

    auto * result = new common_speculative {
        /* .impls = */ std::move(impls)
    };
    return result;
}
Related lifecycle functions:
void common_speculative_free(common_speculative * spec);
bool common_speculative_is_compat(llama_context * ctx_tgt);
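Since the engine is returned as a raw pointer and released with a matching free function, one way to avoid leaks on early returns is to wrap the pair in a std::unique_ptr with a custom deleter. The sketch below uses hypothetical stand-ins (demo_spec, demo_spec_init, demo_spec_free, demo_live_count) so that it compiles on its own; in real code the deleter would presumably call common_speculative_free on a common_speculative pointer instead.

```cpp
#include <memory>

// Hypothetical stand-ins so the sketch is self-contained. They only mimic
// the init/free shape of the API; they are NOT the llama.cpp definitions.
struct demo_spec {
    int placeholder = 0;
};

static int g_demo_live = 0; // number of engines currently allocated

demo_spec * demo_spec_init(bool should_fail) {
    if (should_fail) {
        return nullptr; // mirrors "returns nullptr on failure"
    }
    ++g_demo_live;
    return new demo_spec{};
}

void demo_spec_free(demo_spec * spec) {
    if (spec != nullptr) {
        --g_demo_live;
        delete spec;
    }
}

int demo_live_count() {
    return g_demo_live;
}

// Deleter that forwards to the free function, so ownership follows scope.
struct demo_spec_deleter {
    void operator()(demo_spec * spec) const {
        demo_spec_free(spec);
    }
};

using demo_spec_ptr = std::unique_ptr<demo_spec, demo_spec_deleter>;

// Wrap the raw init call; a null result yields an empty unique_ptr,
// and std::unique_ptr never invokes the deleter on a null pointer.
demo_spec_ptr make_demo_spec(bool should_fail) {
    return demo_spec_ptr(demo_spec_init(should_fail));
}
```

With this pattern the caller still checks the handle for null after construction, but no explicit free call is needed on any exit path.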
I/O Contract
| Direction | Name | Type | Description |
|---|---|---|---|
| Input | params | common_params_speculative & | Speculative configuration (strategy type, draft model, n-gram params). May be modified (e.g., ngram_mod shared instance). |
| Input | ctx_tgt | llama_context * | Target model context for verification |
| Output | (return) | common_speculative * | Opaque pointer to initialized speculative engine, or nullptr on failure |
Internal state created:
- Draft llama_context (if a draft model is configured)
- Ordered list of common_speculative_state implementations
- Per-strategy state objects (n-gram maps, caches, etc.)
Failure conditions:
- Draft context creation fails (returns nullptr)
- No implementations match the configuration (returns nullptr)
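The two failure conditions, together with the ordering of strategies, can be modeled in a small toy function. Everything here (demo_type, demo_params, demo_plan) is hypothetical and greatly simplified: it returns std::nullopt where the real function would return nullptr, and it covers only one n-gram strategy plus the draft strategy. It assumes C++17 for std::optional.

```cpp
#include <optional>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the llama.cpp types.
enum class demo_type { none, ngram_simple, draft };

struct demo_params {
    demo_type type        = demo_type::none; // requested n-gram strategy, if any
    bool has_draft_model  = false;           // a draft model was configured
    bool draft_context_ok = true;            // whether creating its context succeeds
};

// Returns the ordered list of strategies to instantiate, or std::nullopt
// in the cases where initialization is described as failing.
std::optional<std::vector<demo_type>> demo_plan(const demo_params & p) {
    // Failure 1: a draft model is configured but its context cannot be created.
    if (p.has_draft_model && !p.draft_context_ok) {
        return std::nullopt;
    }

    std::vector<demo_type> plan;
    // N-gram strategies are added first, the draft model last.
    if (p.type == demo_type::ngram_simple) {
        plan.push_back(demo_type::ngram_simple);
    }
    if (p.has_draft_model) {
        plan.push_back(demo_type::draft);
    }

    // Failure 2: nothing in the configuration selected an implementation.
    if (plan.empty()) {
        return std::nullopt;
    }
    return plan;
}
```

A caller can treat either failure the same way, for example by reporting the error or falling back to ordinary decoding without a draft.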
Usage Examples
Initialize with draft model:
#include "speculative.h"
// After target and draft models are loaded
params.speculative.model_dft = model_dft.get();
params.speculative.cparams_dft = common_context_params_to_llama(params_dft);
common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (!spec) {
    fprintf(stderr, "Speculative init failed\n");
    return 1;
}
// Use in generation loop...
// Cleanup
common_speculative_free(spec);
Initialize with n-gram strategy:
params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE;
params.speculative.ngram_size_n = 12;
params.speculative.ngram_size_m = 48;
common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
// No draft model needed for n-gram strategies
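For intuition about why no draft model is needed, here is a minimal sketch of one way an n-gram lookup could produce a draft from the token history alone. It assumes that n is the length of the trailing pattern to match and m is the maximum number of tokens to propose; that reading of ngram_size_n and ngram_size_m is an assumption made for this example, and ngram_lookup_draft is a hypothetical helper, not a llama.cpp function.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using token = std::int32_t;

// Hypothetical helper: find the most recent earlier occurrence of the last
// `n` tokens and, if found, propose up to `m` tokens that followed it.
std::vector<token> ngram_lookup_draft(const std::vector<token> & history,
                                      std::size_t n, std::size_t m) {
    const std::size_t len = history.size();
    if (n == 0 || m == 0 || len <= n) {
        return {};
    }
    const std::size_t suffix = len - n; // start index of the trailing pattern

    // Scan candidate start positions from newest to oldest, excluding the
    // trailing pattern itself (start == suffix).
    for (std::size_t start = suffix; start-- > 0; ) {
        bool match = true;
        for (std::size_t i = 0; i < n; ++i) {
            if (history[start + i] != history[suffix + i]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        // Copy the tokens that followed the earlier occurrence, capped at m.
        const std::size_t first = start + n;
        const std::size_t last  = std::min(first + m, len);
        return std::vector<token>(
            history.begin() + static_cast<std::ptrdiff_t>(first),
            history.begin() + static_cast<std::ptrdiff_t>(last));
    }
    return {}; // pattern not seen before: no draft
}
```

An empty result here corresponds to the case where a strategy produces no draft, which lets the next strategy in the chain take over.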
Initialize with n-gram mod (shared instance):
params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_MOD;
params.speculative.ngram_size_n = 16;
// ngram_mod shared instance is automatically created during init if not set
common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);