Implementation:Ggml org Llama cpp Common Speculative Init

Revision as of 12:39, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Common_Speculative_Init.md)
Field | Value
Implementation Name | Common Speculative Init
Doc Type | API Doc
Workflow | Speculative_Decoding
Step | 4 of 5
Source Files | common/speculative.h, common/speculative.cpp

Overview

Description

The common_speculative_init function creates and initializes the speculative decoding runtime. Given the speculative parameters and the target context, it creates a draft context (for model-based strategies), selects the speculation implementations to use from the configuration, and instantiates a state object for each selected strategy. It returns a common_speculative pointer that orchestrates the draft-then-verify generation loop, or nullptr on failure.

The function builds a strategy chain: during generation, the implementations are tried in order, and the first one that produces a non-empty draft is used.
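The "first non-empty draft wins" behaviour can be sketched in isolation. The block below is a minimal illustration only: draft_strategy, never_matches, repeat_last and draft_first_non_empty are hypothetical names invented for this sketch, not types or functions from common/speculative.cpp, and the real draft interface may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT the llama.cpp types.
using token = std::int32_t;

struct draft_strategy {
    virtual ~draft_strategy() = default;
    // Returns proposed continuation tokens; an empty vector means "no proposal".
    virtual std::vector<token> draft(const std::vector<token> & prompt) = 0;
};

// Never proposes anything, like an n-gram lookup that finds no match.
struct never_matches : draft_strategy {
    std::vector<token> draft(const std::vector<token> &) override { return {}; }
};

// Repeats the last prompt token n times; stands in for a draft model.
struct repeat_last : draft_strategy {
    explicit repeat_last(std::size_t n) : n(n) {}
    std::vector<token> draft(const std::vector<token> & prompt) override {
        if (prompt.empty()) {
            return {};
        }
        return std::vector<token>(n, prompt.back());
    }
    std::size_t n;
};

// Try each strategy in order and return the first non-empty draft.
std::vector<token> draft_first_non_empty(
        const std::vector<std::unique_ptr<draft_strategy>> & chain,
        const std::vector<token> & prompt) {
    for (const auto & impl : chain) {
        assert(impl != nullptr);
        std::vector<token> out = impl->draft(prompt);
        if (!out.empty()) {
            return out;
        }
    }
    return {}; // no strategy produced a draft
}
```

Because the chain is ordered, placing cheap lookups ahead of a model-based strategy means the more expensive strategy runs only when the earlier ones come back empty.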

Usage

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (spec == nullptr) {
    fprintf(stderr, "Failed to initialize speculative decoding\n");
    return 1;
}

Code Reference

Field | Value
Source Location (header) | common/speculative.h:21-23
Source Location (impl) | common/speculative.cpp:839-973
Import | #include "speculative.h"

Function signature:

common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context             * ctx_tgt);

Implementation structure (common/speculative.cpp:839-973):

common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context             * ctx_tgt) {
    // Create draft context if draft model is available
    llama_context * ctx_dft = nullptr;
    if (params.model_dft) {
        ctx_dft = llama_init_from_model(params.model_dft, params.cparams_dft);
        if (ctx_dft == nullptr) {
            LOG_ERR("%s", "failed to create draft context\n");
            return nullptr;
        }
    }

    // Compute the implementations to use based on configuration
    std::vector<common_speculative_config> configs = {};
    {
        bool has_draft        = !params.mparams_dft.path.empty();
        bool has_ngram_cache  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_CACHE);
        bool has_ngram_simple = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE);
        bool has_ngram_map_k  = (params.type == COMMON_SPECULATIVE_TYPE_NGRAM_MAP_K);
        // ... other strategy detection

        // Add strategies in order of preference
        if (has_ngram_simple) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE, params));
        }
        // ... other n-gram strategies
        if (has_draft) {
            configs.push_back(common_speculative_config(
                COMMON_SPECULATIVE_TYPE_DRAFT, params));
        }
    }

    // Create implementation state objects
    std::vector<std::unique_ptr<common_speculative_state>> impls = {};
    for (const common_speculative_config & config : configs) {
        switch (config.type) {
            case COMMON_SPECULATIVE_TYPE_DRAFT:
                impls.push_back(std::make_unique<common_speculative_state_draft>(
                    config.type, ctx_tgt, ctx_dft, params.replacements));
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE:
                // ... create ngram simple state
                break;
            case COMMON_SPECULATIVE_TYPE_NGRAM_MOD:
                impls.push_back(std::make_unique<common_speculative_state_ngram_mod>(
                    config.type, *config.params.ngram_mod));
                break;
            // ... other strategy types
        }
    }

    if (impls.empty()) {
        LOG_WRN("%s", "no implementations specified\n");
        return nullptr;
    }

    auto * result = new common_speculative {
        /* .impls = */ std::move(impls)
    };
    return result;
}

Related lifecycle functions:

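// Releases an engine previously returned by common_speculative_init
void common_speculative_free(common_speculative * spec);
// Reports whether the target context is compatible with speculative decoding
bool common_speculative_is_compat(llama_context * ctx_tgt);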

I/O Contract

Direction | Name | Type | Description
Input | params | common_params_speculative & | Speculative configuration (strategy type, draft model, n-gram params). May be modified (e.g., ngram_mod shared instance).
Input | ctx_tgt | llama_context * | Target model context for verification
Output | (return) | common_speculative * | Opaque pointer to initialized speculative engine, or nullptr on failure

Internal state created:

  • Draft llama_context (if draft model is configured)
  • Ordered list of common_speculative_state implementations
  • Per-strategy state objects (n-gram maps, caches, etc.)

Failure conditions:

  • Draft context creation fails (returns nullptr)
  • No implementations match the configuration (returns nullptr)

Usage Examples

Initialize with draft model:

#include "speculative.h"

// After target and draft models are loaded
params.speculative.model_dft = model_dft.get();
params.speculative.cparams_dft = common_context_params_to_llama(params_dft);

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (!spec) {
    fprintf(stderr, "Speculative init failed\n");
    return 1;
}

// Use in generation loop...

// Cleanup
common_speculative_free(spec);
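When the generation loop has several exit paths, the manual cleanup call above is easy to miss. One option is to hand the pointer to a std::unique_ptr with a custom deleter. The following is a sketch under stated assumptions: spec_deleter, spec_ptr, demo_scope and g_free_calls are hypothetical names introduced here, and common_speculative and common_speculative_free are replaced by placeholder stand-ins so the block compiles on its own. In a real build they would come from "speculative.h".

```cpp
#include <cassert>
#include <memory>

// Placeholder stand-ins so the sketch compiles alone; in a real build the
// type and the free function are provided by "speculative.h".
struct common_speculative {};
static int g_free_calls = 0; // counts cleanup calls for the demo below
void common_speculative_free(common_speculative * spec) {
    ++g_free_calls;
    delete spec;
}

// Hypothetical deleter that forwards to common_speculative_free.
struct spec_deleter {
    void operator()(common_speculative * spec) const {
        common_speculative_free(spec);
    }
};

// Hypothetical owning handle: frees the engine when it goes out of scope.
using spec_ptr = std::unique_ptr<common_speculative, spec_deleter>;

// The engine is released automatically at the end of the scope.
void demo_scope() {
    // Real code would wrap the result of common_speculative_init(...) here.
    spec_ptr spec(new common_speculative());
    assert(spec != nullptr);
}
```

A std::unique_ptr does not invoke its deleter for a null pointer, so a failed initialization can be wrapped and checked with the same handle.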

Initialize with n-gram strategy:

params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_SIMPLE;
params.speculative.ngram_size_n = 12;
params.speculative.ngram_size_m = 48;

// No draft model is needed for n-gram strategies
common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
if (!spec) {
    fprintf(stderr, "Speculative init failed\n");
    return 1;
}

Initialize with n-gram mod (shared instance):

params.speculative.type = COMMON_SPECULATIVE_TYPE_NGRAM_MOD;
params.speculative.ngram_size_n = 16;
// ngram_mod shared instance is automatically created during init if not set

common_speculative * spec = common_speculative_init(params.speculative, ctx_tgt);
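The params argument is a non-const reference because initialization may fill in the shared ngram_mod instance. The pattern of "create on first use, reuse afterwards" can be sketched as below. This is an illustration only: ngram_table, spec_params and init_shared are hypothetical names, and the use of std::shared_ptr is an assumption of this sketch, not a statement about how common_params_speculative stores its ngram_mod member.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins; type and field names are illustrative only.
struct ngram_table {
    int size_n = 0;
};

struct spec_params {
    int size_n = 16;
    std::shared_ptr<ngram_table> shared; // empty until the first init
};

// Takes params by non-const reference so it can fill in the shared instance.
std::shared_ptr<ngram_table> init_shared(spec_params & params) {
    if (!params.shared) {
        // First call: create the instance and record it in params.
        params.shared = std::make_shared<ngram_table>();
        params.shared->size_n = params.size_n;
    }
    assert(params.shared != nullptr);
    return params.shared; // later calls reuse the same instance
}
```

Calling init_shared twice with the same params object returns handles to one table, which is the sense in which the instance is shared across initializations in this sketch.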
