Implementation:Ggml org Llama cpp Speculative Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Speculative_Decoding, API
Last Updated	2026-02-15 00:00 GMT

Overview

Declares the public API for the speculative decoding subsystem.

Description

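Provides functions for lifecycle management (`common_speculative_init`, `common_speculative_free`), compatibility checking (`common_speculative_is_compat`), generation control (`common_speculative_begin` for new sequences, `common_speculative_draft` for generating draft tokens), acceptance feedback (`common_speculative_accept`), and statistics reporting (`common_speculative_print_stats`). It also declares helpers for converting between speculative type names and `common_speculative_type` enum values (`common_speculative_type_name_str`, `common_speculative_type_from_name`, `common_speculative_type_to_str`). The API follows a predict-then-verify workflow: the speculative decoder generates draft tokens, the caller verifies them against the target model, and the caller then reports the number of accepted tokens back through `common_speculative_accept`.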
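The verification step is left to the caller, and the header does not prescribe how it is done. As a minimal sketch, assuming a greedy scheme in which a draft token is accepted only while it matches the token the target model produced at the same position, the value passed to `common_speculative_accept` could be computed as below. The helper `count_accepted` and the `token` alias are hypothetical and not part of the header; `token` stands in for `llama_token` so the sketch compiles without llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for llama_token so the sketch compiles without llama.cpp headers.
using token = std::int32_t;

// Hypothetical helper: length of the longest prefix of `draft` that matches
// the tokens the target model produced at the same positions.
std::uint16_t count_accepted(const std::vector<token> & draft,
                             const std::vector<token> & target) {
    std::size_t n = 0;
    while (n < draft.size() && n < target.size() && draft[n] == target[n]) {
        ++n;
    }
    // common_speculative_accept takes a uint16_t, so the count must fit.
    assert(n <= std::numeric_limits<std::uint16_t>::max());
    return static_cast<std::uint16_t>(n);
}
```

Other acceptance rules (for example, probabilistic acceptance under sampling) would replace the equality test; the reporting step stays the same.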

Usage

Include this header when integrating speculative decoding into an inference pipeline. It hides the complexity of multiple draft strategies (draft model, EAGLE3, n-gram variants) behind a unified API, allowing callers to use a simple init/begin/draft/accept/free lifecycle.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: common/speculative.h
Lines: 1-41

Signature

struct common_speculative;

std::string common_speculative_type_name_str();
enum common_speculative_type common_speculative_type_from_name(const std::string & name);
std::string common_speculative_type_to_str(enum common_speculative_type type);

bool common_speculative_is_compat(llama_context * ctx_tgt);

common_speculative * common_speculative_init(
    common_params_speculative & params,
    llama_context             * ctx_tgt);

void common_speculative_free(common_speculative * spec);

void common_speculative_begin(common_speculative * spec, const llama_tokens & prompt);

llama_tokens common_speculative_draft(
    common_speculative * spec,
    const common_params_speculative & params,
    const llama_tokens & prompt,
    llama_token id_last);

void common_speculative_accept(common_speculative * spec, uint16_t n_accepted);

void common_speculative_print_stats(const common_speculative * spec);

Import

#include "speculative.h"

I/O Contract

Inputs

Name	Type	Required	Description
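params	common_params_speculative &	Yes	Speculative decoding parameters (strategy, n_draft, etc.); taken by `common_speculative_init` and, as a const reference, by `common_speculative_draft`
ctx_tgt	llama_context *	Yes	Target model context; taken by `common_speculative_is_compat` and `common_speculative_init`
spec	common_speculative *	Yes	Speculative decoder instance; taken by every function that operates on an initialized decoder (free, begin, draft, accept, print_stats)
prompt	const llama_tokens &	Yes	Token sequence for the current generation context; taken by `common_speculative_begin` and `common_speculative_draft`
id_last	llama_token	Yes	The last sampled token, used as the starting point for drafting; taken by `common_speculative_draft`
n_accepted	uint16_t	Yes	Number of draft tokens accepted by the target model; taken by `common_speculative_accept`
name	const std::string &	Yes	Speculative type name to convert; taken by `common_speculative_type_from_name`
type	enum common_speculative_type	Yes	Speculative type enum value to convert; taken by `common_speculative_type_to_str`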

Outputs

Name	Type	Description
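spec	common_speculative *	Initialized speculative decoder instance (returned by `common_speculative_init`)
draft	llama_tokens	Vector of draft tokens generated by the speculative decoder (returned by `common_speculative_draft`)
is_compat	bool	Whether the target context supports speculative decoding (returned by `common_speculative_is_compat`)
type_name	std::string	Human-readable name of the speculative type (returned by `common_speculative_type_to_str`)
type	enum common_speculative_type	Speculative type enum value converted from a string (returned by `common_speculative_type_from_name`)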

Usage Examples

#include "speculative.h"

// Check compatibility
if (!common_speculative_is_compat(ctx_tgt)) {
    return;
}

// Initialize speculative decoder
common_speculative * spec = common_speculative_init(params_spec, ctx_tgt);
if (spec == nullptr) {
    // Assumption: a null return signals that initialization failed
    return;
}

// Begin a new generation sequence
common_speculative_begin(spec, prompt_tokens);

// Generate draft tokens
llama_tokens draft = common_speculative_draft(spec, params_spec, prompt_tokens, last_token);

// After the caller verifies the draft with the target model, report how many
// draft tokens were accepted (n_accepted is computed by the caller)
common_speculative_accept(spec, n_accepted);

// Print statistics
common_speculative_print_stats(spec);

// Cleanup
common_speculative_free(spec);
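Because the lifecycle pairs an init call with a manual free call, a caller may prefer to tie cleanup to scope. The following is a minimal sketch of that idea using `std::unique_ptr` with a custom deleter. Everything named `fake_spec*`, `spec_deleter`, `spec_ptr`, `g_live`, and `live_while_owned` is a hypothetical stand-in introduced so the example compiles without llama.cpp; in real code the deleter would forward to `common_speculative_free` and the pointer type would be `common_speculative`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for common_speculative and its init/free functions,
// so the sketch compiles without llama.cpp. They only count live instances.
struct fake_spec { int id; };
static int g_live = 0;
fake_spec * fake_spec_init() { ++g_live; return new fake_spec{0}; }
void fake_spec_free(fake_spec * s) { if (s) { --g_live; delete s; } }

// Deleter that forwards to the free function; in real code this would call
// common_speculative_free.
struct spec_deleter {
    void operator()(fake_spec * s) const { fake_spec_free(s); }
};
using spec_ptr = std::unique_ptr<fake_spec, spec_deleter>;

// Returns the number of live instances while the owning pointer is in scope;
// the instance is freed automatically when the function returns.
int live_while_owned() {
    spec_ptr spec(fake_spec_init());
    assert(spec != nullptr);
    return g_live;
}
```

With this pattern an early return between init and free cannot leak the decoder instance, at the cost of one small wrapper type.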

Related Pages

Principle:Ggml_org_Llama_cpp_Speculative_Decoding
