Implementation: ggml-org/llama.cpp Common Speculative Init

Field	Value
Implementation Name	Common Speculative Init
Doc Type	API Doc
Workflow	Speculative_Decoding
Step	4 of 5
Source Files	`common/speculative.h`, `common/speculative.cpp`

Overview

Description

The common_speculative_init function creates and initializes the speculative decoding runtime. Given the speculative parameters and the target context, it creates a draft context (only when a draft model is configured), determines from the configuration which speculation implementations to use, instantiates a state object for each selected strategy, and returns a common_speculative pointer that the caller uses to drive the draft-then-verify generation loop.

The returned object holds an ordered chain of implementations. During generation they are tried in order, and the first one that produces a non-empty draft is used.
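The chain behaviour can be illustrated with a small, self-contained C++ sketch. The types `draft_strategy`, `empty_strategy`, `fixed_strategy` and the function `draft_from_chain` are hypothetical stand-ins, not llama.cpp's `common_speculative_state` API, and `token` stands in for `llama_token`. Only the first-non-empty-draft rule is taken from the description above.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

using token = int;  // stand-in for llama_token (assumption)

// Hypothetical strategy interface; not llama.cpp's common_speculative_state.
struct draft_strategy {
    virtual ~draft_strategy() = default;
    // Returns proposed tokens, or an empty vector when the strategy has nothing to offer.
    virtual std::vector<token> draft(const std::vector<token> & history) = 0;
};

// Strategy that never proposes anything (e.g. an n-gram lookup with no match).
struct empty_strategy : draft_strategy {
    std::vector<token> draft(const std::vector<token> &) override { return {}; }
};

// Strategy that always proposes the same fixed continuation.
struct fixed_strategy : draft_strategy {
    std::vector<token> tokens;
    explicit fixed_strategy(std::vector<token> t) : tokens(std::move(t)) {}
    std::vector<token> draft(const std::vector<token> &) override { return tokens; }
};

// Try each strategy in order; the first non-empty draft wins.
// `used` receives the index of the winning strategy, or -1 if none produced a draft.
std::vector<token> draft_from_chain(
        std::vector<std::unique_ptr<draft_strategy>> & chain,
        const std::vector<token> & history,
        int & used) {
    for (size_t i = 0; i < chain.size(); ++i) {
        std::vector<token> d = chain[i]->draft(history);
        if (!d.empty()) {
            used = static_cast<int>(i);
            return d;
        }
    }
    used = -1;
    return {};
}
```

In this sketch an empty draft from an earlier strategy simply falls through to the next one, so whatever strategy is placed last acts as a fallback.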

Usage

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (spec == nullptr) {
    fprintf(stderr, "Failed to initialize speculative decoding\n");
    return 1;
}

Code Reference

Field	Value
Source Location (header)	`common/speculative.h:21-23`
Source Location (impl)	`common/speculative.cpp:839-973`
Import	`#include "speculative.h"`

Function signature:

common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context             * ctx_tgt);

Implementation structure (common/speculative.cpp:839-973, abridged):

common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context             * ctx_tgt) {
    // Create draft context if draft model is available
    llama_context * ctx_dft = nullptr;
    if (params.model_dft) {
        ctx_dft = llama_init_from_model(params.model_dft, params.cparams_dft);
        if (ctx_dft == nullptr) {
            LOG_ERR("%s", "failed to create draft context\n");
            return nullptr;
        }
    }

    // Compute the implementations to use based on configuration
    std::vector<common_speculative_config> configs = {};
    {
        bool has_draft        = !params.mparams_dft.path.empty();
        bool has_ngram_cache  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_CACHE);
        bool has_ngram_simple = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE);
        bool has_ngram_map_k  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K);
        // ... other strategy detection

        // Add strategies in order of preference
        if (has_ngram_simple) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE, params));
        }
        // ... other n-gram strategies
        if (has_draft) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_DRAFT, params));
        }
    }

    // Create implementation state objects
    std::vector<std::unique_ptr<common_speculative_state>> impls = {};
    for (const common_speculative_config & config : configs) {
        switch (config.type) {
            case COMMON_SPECULATIVE_TYPE_DRAFT:
                impls.push_back(std::make_unique<common_speculative_state_draft>(
                    config.type, ctx_tgt, ctx_dft, params.replacements));
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE:
                // ... create ngram simple state
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_MOD:
                impls.push_back(std::make_unique<common_speculative_state_ngram_mod>(
                    config.type, *config.params.ngram_mod));
                break;
            // ... other strategy types
        }
    }

    if (impls.empty()) {
        LOG_WRN("%s", "no implementations specified\n");
        return nullptr;
    }

    auto * result = new common_speculative {
        /* .impls = */ std::move(impls)
    };
    return result;
}
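The two phases of the function (choosing strategy types, then instantiating one state object per type) can be modelled in isolation. The following is a simplified sketch under assumed names: `spec_type`, `spec_state`, `spec_engine`, `select_types` and `init_engine` are hypothetical, and contexts, parameters and the elided strategies are omitted. It keeps the ordering shown above (n-gram strategies first, draft model last) and the nullptr return when no implementation is selected.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical mirror of the selection step; names are illustrative, not llama.cpp's.
enum class spec_type { none, draft, ngram_simple, ngram_mod };

struct spec_state {
    explicit spec_state(spec_type t) : type(t) {}
    virtual ~spec_state() = default;
    spec_type type;
};
struct draft_state        : spec_state { draft_state()        : spec_state(spec_type::draft)        {} };
struct ngram_simple_state : spec_state { ngram_simple_state() : spec_state(spec_type::ngram_simple) {} };
struct ngram_mod_state    : spec_state { ngram_mod_state()    : spec_state(spec_type::ngram_mod)    {} };

struct spec_engine {
    std::vector<std::unique_ptr<spec_state>> impls;  // ordered chain of implementations
};

// Phase 1: decide which strategies to use, n-gram strategies first, draft model last.
std::vector<spec_type> select_types(spec_type requested, bool has_draft) {
    std::vector<spec_type> types;
    if (requested == spec_type::ngram_simple || requested == spec_type::ngram_mod) {
        types.push_back(requested);
    }
    if (has_draft) {
        types.push_back(spec_type::draft);
    }
    return types;
}

// Phase 2: instantiate one state object per selected type; nullptr if nothing was selected.
std::unique_ptr<spec_engine> init_engine(spec_type requested, bool has_draft) {
    auto engine = std::make_unique<spec_engine>();
    for (spec_type t : select_types(requested, has_draft)) {
        switch (t) {
            case spec_type::draft:        engine->impls.push_back(std::make_unique<draft_state>());        break;
            case spec_type::ngram_simple: engine->impls.push_back(std::make_unique<ngram_simple_state>()); break;
            case spec_type::ngram_mod:    engine->impls.push_back(std::make_unique<ngram_mod_state>());    break;
            case spec_type::none:         break;
        }
    }
    if (engine->impls.empty()) {
        return nullptr;  // mirrors the "no implementations specified" path
    }
    return engine;
}
```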

Related lifecycle functions:

void common_speculative_free(common_speculative * spec);
bool common_speculative_is_compat(llama_context * ctx_tgt);
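Because every successful common_speculative_init call must be paired with common_speculative_free, a std::unique_ptr with a custom deleter is a convenient way to tie the two together. To keep the sketch compilable without llama.cpp, `fake_spec`, `fake_spec_init` and `fake_spec_free` are stand-ins for the real type and functions; with llama.cpp one would presumably substitute `common_speculative` and a deleter that calls `common_speculative_free`.

```cpp
#include <cassert>
#include <memory>

// Stand-ins for common_speculative / common_speculative_init / common_speculative_free,
// so the ownership pattern can be compiled without llama.cpp.
struct fake_spec { int id; };

static int g_freed = 0;  // counts releases of non-null handles, for demonstration only

fake_spec * fake_spec_init(bool succeed) {
    return succeed ? new fake_spec{42} : nullptr;  // nullptr mirrors an init failure
}

void fake_spec_free(fake_spec * spec) {
    if (spec != nullptr) {
        ++g_freed;
    }
    delete spec;
}

// Custom deleter so the handle is released exactly once, on every exit path.
struct fake_spec_deleter {
    void operator()(fake_spec * spec) const { fake_spec_free(spec); }
};
using fake_spec_ptr = std::unique_ptr<fake_spec, fake_spec_deleter>;
```

A null pointer from a failed init never reaches the deleter, since std::unique_ptr only invokes it for non-null pointers.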

I/O Contract

Direction	Name	Type	Description
Input	params	`common_params_speculative &`	Speculative configuration (strategy type, draft model, n-gram parameters). May be modified in place; for example, the shared ngram_mod instance is created if not already set.
Input	ctx_tgt	`llama_context *`	Target model context for verification
Output	(return)	`common_speculative *`	Opaque pointer to the initialized speculative engine, or nullptr on failure; release with common_speculative_free

Internal state created:

Draft llama_context (if draft model is configured)
Ordered list of common_speculative_state implementations
Per-strategy state objects (n-gram maps, caches, etc.)

Failure conditions:

Draft context creation fails (returns nullptr)
No implementations match the configuration (returns nullptr)
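When a draft context has already been created and a later check fails, that context must be released before returning nullptr. The sketch below shows one way to keep both failure paths leak-free by holding the draft context in a std::unique_ptr until it is handed to the returned object. `fake_ctx`, `fake_engine` and `fake_init` are hypothetical stand-ins (in llama.cpp the draft context is a `llama_context *` released with `llama_free`); this illustrates the pattern and does not describe how speculative.cpp manages ownership.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Stand-in "draft context" that counts live instances so leaks are observable.
struct fake_ctx {
    static int live;
    fake_ctx()  { ++live; }
    ~fake_ctx() { --live; }
};
int fake_ctx::live = 0;

// Stand-in engine that takes ownership of the draft context, if any.
struct fake_engine {
    std::unique_ptr<fake_ctx> ctx_dft;  // owned draft context (may be null)
    int n_impls = 0;
};

// Hypothetical init mirroring the two failure conditions:
//  1) draft context creation fails -> nullptr
//  2) no implementations selected  -> nullptr, and any draft context already created is released
std::unique_ptr<fake_engine> fake_init(bool want_draft, bool draft_ok, int n_impls) {
    std::unique_ptr<fake_ctx> ctx_dft;
    if (want_draft) {
        if (!draft_ok) {
            return nullptr;                    // failure 1: nothing allocated yet
        }
        ctx_dft = std::make_unique<fake_ctx>();
    }
    if (n_impls == 0) {
        return nullptr;                        // failure 2: ctx_dft is destroyed on return
    }
    auto engine = std::make_unique<fake_engine>();
    engine->ctx_dft = std::move(ctx_dft);      // ownership handed to the engine
    engine->n_impls = n_impls;
    return engine;
}
```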

Usage Examples

Initialize with draft model:

#include "speculative.h"

// After target and draft models are loaded
params.speculative.model_dft = model_dft.get();
params.speculative.cparams_dft = common_context_params_to_llama(params_dft);

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (!spec) {
    fprintf(stderr, "Speculative init failed\n");
    return 1;
}

// Use in generation loop...

// Cleanup
common_speculative_free(spec);

Initialize with n-gram strategy:

params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE;
params.speculative.ngram_size_n = 12;
params.speculative.ngram_size_m = 48;

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
// No draft model needed for n-gram strategies
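N-gram strategies need no draft model because they propose tokens from text already present in the context. As a rough illustration, the following is a generic prompt-lookup drafter, not llama.cpp's ngram_simple implementation: it matches the last `n` tokens against earlier history and proposes up to `m` of the tokens that followed. Treating `n` and `m` as loosely analogous to ngram_size_n and ngram_size_m is an assumption, `ngram_lookup_draft` is a hypothetical name, and `token` stands in for `llama_token`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using token = int;  // stand-in for llama_token (assumption)

// Generic prompt-lookup drafting (illustrative, not llama.cpp's implementation):
// take the last `n` tokens of `history`, find the most recent earlier occurrence of
// that n-gram, and propose up to `m` tokens that followed it.
std::vector<token> ngram_lookup_draft(const std::vector<token> & history, size_t n, size_t m) {
    if (n == 0 || history.size() <= n) {
        return {};  // not enough context to form a key plus an earlier match
    }
    const size_t key_start = history.size() - n;
    // Scan candidate start positions from most recent to oldest, excluding the key itself.
    for (size_t start = key_start; start-- > 0; ) {
        bool match = true;
        for (size_t j = 0; j < n; ++j) {
            if (history[start + j] != history[key_start + j]) {
                match = false;
                break;
            }
        }
        if (match) {
            const size_t from = start + n;  // first token after the matched n-gram
            const size_t to   = std::min(history.size(), from + m);
            return std::vector<token>(history.begin() + from, history.begin() + to);
        }
    }
    return {};  // no earlier occurrence: this strategy yields an empty draft
}
```

For the history `1 2 3 4 9 1 2 3` with `n = 3` and `m = 2`, the trailing `1 2 3` matches at the start, so the sketch proposes `4 9`.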

Initialize with n-gram mod (shared instance):

params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_MOD;
params.speculative.ngram_size_n = 16;
// ngram_mod shared instance is automatically created during init if not set

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
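The note about the shared instance is also why params is taken by non-const reference: init may write the instance it creates back into the caller's parameters. A minimal sketch of that lazy-creation pattern follows, assuming a pointer-like shared handle; `ngram_table`, `spec_params` and `ensure_ngram_mod` are hypothetical names, and the real type of the ngram_mod field is not shown on this page.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; field and type names are illustrative only.
struct ngram_table {
    int size_n;
    explicit ngram_table(int n) : size_n(n) {}
};

struct spec_params {
    int ngram_size_n = 16;
    std::shared_ptr<ngram_table> ngram_mod;  // shared instance; may be left unset by the caller
};

// Mirrors the documented behaviour: the shared instance is created only if the
// caller did not provide one, and is stored back into the caller's params.
std::shared_ptr<ngram_table> ensure_ngram_mod(spec_params & params) {
    if (!params.ngram_mod) {
        params.ngram_mod = std::make_shared<ngram_table>(params.ngram_size_n);
    }
    return params.ngram_mod;
}
```

In this sketch, a second call with the same params reuses the stored instance instead of creating a new one.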
