Implementation:Ggml org Llama cpp Speculative Header

From Leeroopedia
Knowledge Sources
Domains Speculative_Decoding, API
Last Updated 2026-02-15 00:00 GMT

Overview

Declares the public API for the speculative decoding subsystem.

Description

Provides functions for lifecycle management (`common_speculative_init`, `common_speculative_free`), compatibility checking (`common_speculative_is_compat`), generation control (`common_speculative_begin` for new sequences, `common_speculative_draft` for generating draft tokens), acceptance feedback (`common_speculative_accept`), and statistics reporting (`common_speculative_print_stats`). It also declares helpers for converting between speculative type names and enum values (`common_speculative_type_from_name`, `common_speculative_type_to_str`). The API is designed around a draft-then-verify workflow: the decoder proposes draft tokens, the caller verifies them against the target model, and the caller reports the number of accepted tokens back through `common_speculative_accept`.
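The verification step is left to the caller, so the header does not say how the accepted count is derived. A minimal sketch, assuming greedy verification where a draft token counts as accepted only if it equals the token the target model produced at the same position: `count_accepted` is a hypothetical helper, not part of `speculative.h`, and `token_t` / `tokens_t` are local stand-ins for `llama_token` / `llama_tokens` so the snippet compiles without llama.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Local stand-ins for llama_token / llama_tokens (assumption: a 32-bit id and a vector of ids).
using token_t  = std::int32_t;
using tokens_t = std::vector<token_t>;

// Hypothetical helper: length of the longest prefix of `draft` that matches
// the tokens the target model produced at the same positions.
std::size_t count_accepted(const tokens_t & draft, const tokens_t & target) {
    // Only compare positions that exist in both sequences.
    const std::size_t n = std::min(draft.size(), target.size());
    const auto last = draft.begin() + static_cast<std::ptrdiff_t>(n);
    // std::mismatch returns iterators to the first differing pair.
    const auto diff = std::mismatch(draft.begin(), last, target.begin());
    return static_cast<std::size_t>(diff.first - draft.begin());
}
```

The first mismatch ends acceptance: later draft tokens were conditioned on a token the target rejected, so they are discarded even if they happen to match.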

Usage

Include this header when integrating speculative decoding into an inference pipeline. It hides the complexity of multiple draft strategies (draft model, EAGLE3, n-gram variants) behind a unified API, allowing callers to use a simple init/begin/draft/accept/free lifecycle.
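Because the lifecycle pairs an init call with a free call on a raw pointer, callers may want to tie the free to scope exit. The sketch below shows one way to do that with `std::unique_ptr` and a custom deleter. `free_fn_deleter` is a hypothetical name, and `toy_spec` / `toy_spec_free` are stand-ins that let the snippet run without llama.cpp; the header itself declares no such wrapper.

```cpp
#include <memory>

// Generic deleter that forwards to a C-style free function.
template <typename T, void (*FreeFn)(T *)>
struct free_fn_deleter {
    void operator()(T * p) const noexcept { FreeFn(p); }
};

// With speculative.h included, the same pattern would read (assumption, not declared by the header):
//   using common_speculative_ptr =
//       std::unique_ptr<common_speculative,
//                       free_fn_deleter<common_speculative, common_speculative_free>>;

// Toy stand-ins so the sketch is self-contained.
struct toy_spec { int id; };

int g_toy_frees = 0; // counts how many times the free function ran

void toy_spec_free(toy_spec * s) {
    ++g_toy_frees;
    delete s;
}

using toy_spec_ptr = std::unique_ptr<toy_spec, free_fn_deleter<toy_spec, toy_spec_free>>;
```

`std::unique_ptr` only invokes its deleter for non-null pointers, so an empty wrapper never reaches the free function.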

Code Reference

Source Location

Signature

struct common_speculative;

std::string common_speculative_type_name_str();
enum common_speculative_type common_speculative_type_from_name(const std::string & name);
std::string common_speculative_type_to_str(enum common_speculative_type type);

bool common_speculative_is_compat(llama_context * ctx_tgt);

common_speculative * common_speculative_init(
    common_params_speculative & params,
    llama_context             * ctx_tgt);

void common_speculative_free(common_speculative * spec);

void common_speculative_begin(common_speculative * spec, const llama_tokens & prompt);

llama_tokens common_speculative_draft(
    common_speculative * spec,
    const common_params_speculative & params,
    const llama_tokens & prompt,
    llama_token id_last);

void common_speculative_accept(common_speculative * spec, uint16_t n_accepted);

void common_speculative_print_stats(const common_speculative * spec);

Import

#include "speculative.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| params | common_params_speculative & | Yes | Speculative decoding parameters (strategy, n_draft, etc.) |
| ctx_tgt | llama_context * | Yes | Target model context for compatibility checking and initialization |
| spec | common_speculative * | Yes | Speculative decoder instance for operations |
| prompt | const llama_tokens & | Yes | Token sequence for the current generation context |
| id_last | llama_token | Yes | The last sampled token used as the starting point for drafting |
| n_accepted | uint16_t | Yes (accept) | Number of draft tokens accepted by the target model |
| name | const std::string & | Yes (type conv) | Speculative type name string for conversion |
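Since `n_accepted` is a `uint16_t`, a count held in a wider type such as `std::size_t` has to be narrowed before it is passed to `common_speculative_accept`. A minimal sketch of a saturating conversion follows; `to_n_accepted` is a hypothetical helper, not part of the header, and clamping (rather than asserting or throwing) is just one possible policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper: narrow a count to uint16_t, saturating at 65535
// instead of silently wrapping around.
std::uint16_t to_n_accepted(std::size_t n) {
    constexpr std::size_t max_u16 = std::numeric_limits<std::uint16_t>::max();
    return static_cast<std::uint16_t>(n < max_u16 ? n : max_u16);
}
```

In practice draft lengths are far below this limit, so the clamp mainly guards against a miscomputed count.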

Outputs

| Name | Type | Description |
|------|------|-------------|
| spec | common_speculative * | Initialized speculative decoder instance |
| draft | llama_tokens | Vector of draft tokens generated by the speculative decoder |
| is_compat | bool | Whether the target context supports speculative decoding |
| type_name | std::string | Human-readable name of the speculative type |
| type | enum common_speculative_type | Speculative type enum value converted from string |

Usage Examples

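#include "speculative.h"

// Assumes ctx_tgt, params_spec, prompt_tokens, and last_token are set up by the caller.

// Check compatibility
if (!common_speculative_is_compat(ctx_tgt)) {
    return;
}

// Initialize speculative decoder
common_speculative * spec = common_speculative_init(params_spec, ctx_tgt);
if (spec == nullptr) {
    return; // defensive check in case initialization did not produce an instance
}

// Begin a new generation sequence
common_speculative_begin(spec, prompt_tokens);

// Generate draft tokens (draft/verify/accept repeats for each generation step)
llama_tokens draft = common_speculative_draft(spec, params_spec, prompt_tokens, last_token);

// After the caller verifies the draft against the target model,
// report how many draft tokens were accepted (n_accepted is computed by the caller)
common_speculative_accept(spec, n_accepted);

// Print statistics
common_speculative_print_stats(spec);

// Cleanup
common_speculative_free(spec);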
