Implementation:Ggml org Llama cpp Speculative Header
| Knowledge Sources | |
|---|---|
| Domains | Speculative_Decoding, API |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the public API for the speculative decoding subsystem.
Description
Provides functions for lifecycle management (`common_speculative_init`, `common_speculative_free`), type-name conversion (`common_speculative_type_name_str`, `common_speculative_type_from_name`, `common_speculative_type_to_str`), compatibility checking (`common_speculative_is_compat`), generation control (`common_speculative_begin` for new sequences, `common_speculative_draft` for generating draft tokens), acceptance feedback (`common_speculative_accept`), and statistics reporting (`common_speculative_print_stats`). The API is built around a predict-then-verify workflow: the speculative decoder proposes draft tokens, the caller verifies them against the target model, and the caller then reports how many were accepted.
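The verification half of the workflow is left to the caller, so it may help to see one way the `n_accepted` value could be computed. The sketch below assumes greedy verification, where a draft token is accepted only if it equals the token the target model produced at the same position. `count_accepted` is a hypothetical helper, not part of the header, and the two type aliases are stand-ins so the block compiles without llama.cpp.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for the llama.cpp typedefs so the sketch compiles on its own.
using llama_token  = std::int32_t;
using llama_tokens = std::vector<llama_token>;

// Hypothetical helper (not part of speculative.h): returns the length of the
// longest common prefix of the draft and the tokens the target model produced
// at the same positions. Capped at UINT16_MAX to fit the accept() argument.
static std::uint16_t count_accepted(const llama_tokens & draft,
                                    const llama_tokens & target) {
    std::size_t n = 0;
    while (n < draft.size() && n < target.size() && n < UINT16_MAX &&
           draft[n] == target[n]) {
        ++n;
    }
    return static_cast<std::uint16_t>(n);
}
```

The returned count is what a caller would pass to `common_speculative_accept`. Samplers that accept drafts probabilistically would need a different rule.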
Usage
Include this header when integrating speculative decoding into an inference pipeline. It hides the complexity of multiple draft strategies (draft model, EAGLE3, n-gram variants) behind a unified API, allowing callers to use a simple init/begin/draft/accept/free lifecycle.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/speculative.h
- Lines: 1-41
Signature
```cpp
struct common_speculative;

std::string common_speculative_type_name_str();
enum common_speculative_type common_speculative_type_from_name(const std::string & name);
std::string common_speculative_type_to_str(enum common_speculative_type type);

bool common_speculative_is_compat(llama_context * ctx_tgt);

common_speculative * common_speculative_init(
        common_params_speculative & params,
        llama_context             * ctx_tgt);

void common_speculative_free(common_speculative * spec);

void common_speculative_begin(common_speculative * spec, const llama_tokens & prompt);

llama_tokens common_speculative_draft(
        common_speculative              * spec,
        const common_params_speculative & params,
        const llama_tokens              & prompt,
        llama_token                       id_last);

void common_speculative_accept(common_speculative * spec, uint16_t n_accepted);

void common_speculative_print_stats(const common_speculative * spec);
```
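Because the handle is created by an init function and released by a matching free function, a caller may want to tie its lifetime to a scope. The following is a minimal sketch of that idea using `std::unique_ptr` with a custom deleter. The `*_mock` functions, the struct body, and the instance counter are assumptions that stand in for the real declarations so the block compiles without llama.cpp. In real code the deleter would call `common_speculative_free`.

```cpp
#include <memory>

// Stand-in for the opaque type declared in speculative.h (body is a mock).
struct common_speculative { int unused = 0; };

// Hypothetical counter, present only to make the lifetime observable.
static int g_live_instances = 0;

// Mock replacement for common_speculative_init (real one takes params + ctx).
static common_speculative * common_speculative_init_mock() {
    ++g_live_instances;
    return new common_speculative();
}

// Mock replacement for common_speculative_free.
static void common_speculative_free_mock(common_speculative * spec) {
    if (spec) {
        --g_live_instances;
        delete spec;
    }
}

// Deleter that forwards to the free function when the handle goes out of scope.
struct spec_deleter {
    void operator()(common_speculative * spec) const {
        common_speculative_free_mock(spec);
    }
};

using spec_ptr = std::unique_ptr<common_speculative, spec_deleter>;

// Returns the live-instance count seen while the handle is in scope,
// or -1 if initialization produced a null handle.
static int live_inside_scope() {
    spec_ptr spec(common_speculative_init_mock());
    return spec ? g_live_instances : -1;
}
```

With this wrapper, early returns and exceptions no longer need a matching manual free call.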
Import
```cpp
#include "speculative.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| params | common_params_speculative & | Yes | Speculative decoding parameters (strategy, n_draft, etc.) |
| ctx_tgt | llama_context * | Yes | Target model context for compatibility checking and initialization |
| spec | common_speculative * | Yes | Speculative decoder instance returned by `common_speculative_init`; passed to begin, draft, accept, print_stats, and free |
| prompt | const llama_tokens & | Yes | Token sequence for the current generation context |
| id_last | llama_token | Yes | The last sampled token used as the starting point for drafting |
| n_accepted | uint16_t | Yes (accept) | Number of draft tokens accepted by the target model |
| name | const std::string & | Yes (type conv) | Speculative type name string for conversion |
Outputs
| Name | Type | Description |
|---|---|---|
| spec | common_speculative * | Initialized speculative decoder instance |
| draft | llama_tokens | Vector of draft tokens generated by the speculative decoder |
| is_compat | bool | Whether the target context supports speculative decoding |
| type_name | std::string | Human-readable name of the speculative type (returned by `common_speculative_type_to_str`) |
| type | enum common_speculative_type | Speculative type enum value converted from string |
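The type-conversion functions map between enum values and their string names. The sketch below shows one way such a pair can be kept consistent with a single lookup table. Everything in it is hypothetical: the `demo_spec_type` enumerators, the name strings, the fallback values, and the comma-joined listing are assumptions for illustration and do not reflect the actual `common_speculative_type` values or the library's behaviour.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical enum; the real enumerators live elsewhere in llama.cpp.
enum demo_spec_type { DEMO_SPEC_NONE, DEMO_SPEC_DRAFT, DEMO_SPEC_NGRAM };

// Single source of truth for both conversion directions.
static const std::vector<std::pair<demo_spec_type, std::string>> & demo_table() {
    static const std::vector<std::pair<demo_spec_type, std::string>> table = {
        {DEMO_SPEC_NONE,  "none"},
        {DEMO_SPEC_DRAFT, "draft"},
        {DEMO_SPEC_NGRAM, "ngram"},
    };
    return table;
}

// Enum -> name; returns "unknown" for values missing from the table.
static std::string demo_type_to_str(demo_spec_type type) {
    for (const auto & entry : demo_table()) {
        if (entry.first == type) return entry.second;
    }
    return "unknown";
}

// Name -> enum; falls back to DEMO_SPEC_NONE for unrecognized names.
static demo_spec_type demo_type_from_name(const std::string & name) {
    for (const auto & entry : demo_table()) {
        if (entry.second == name) return entry.first;
    }
    return DEMO_SPEC_NONE;
}

// Comma-joined list of every known name, e.g. for a CLI help string.
static std::string demo_type_name_str() {
    std::string out;
    const auto & table = demo_table();
    for (std::size_t i = 0; i < table.size(); ++i) {
        if (i > 0) out += ",";
        out += table[i].second;
    }
    return out;
}
```

Keeping both directions on one table means a newly added type cannot be convertible one way but not the other.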
Usage Examples
```cpp
#include "speculative.h"

// ctx_tgt, params_spec, prompt_tokens, last_token and n_accepted are
// supplied by the caller's inference loop.

// Check compatibility
if (!common_speculative_is_compat(ctx_tgt)) {
    return;
}

// Initialize speculative decoder; guard against a null handle
common_speculative * spec = common_speculative_init(params_spec, ctx_tgt);
if (spec == nullptr) {
    return;
}

// Begin a new generation sequence
common_speculative_begin(spec, prompt_tokens);

// Generate draft tokens (repeated once per decoding step)
llama_tokens draft = common_speculative_draft(spec, params_spec, prompt_tokens, last_token);

// After target model verification, report acceptance
common_speculative_accept(spec, n_accepted);

// Print statistics
common_speculative_print_stats(spec);

// Cleanup
common_speculative_free(spec);
```
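In practice the draft and accept calls run inside a loop, once per decoding step. The self-contained simulation below illustrates that loop shape with mock models in place of llama.cpp. `mock_draft`, `mock_target_next`, `demo_result`, and `demo_generate` are all hypothetical names, and the toy "models" are simple integer rules chosen so the outcome is easy to follow.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token  = std::int32_t;          // stand-in typedefs
using llama_tokens = std::vector<llama_token>;

// Mock drafter: proposes the n_draft integers that follow id_last.
static llama_tokens mock_draft(llama_token id_last, int n_draft) {
    llama_tokens out;
    for (int i = 1; i <= n_draft; ++i) out.push_back(id_last + i);
    return out;
}

// Mock target model: the next token is t + 1, but multiples of 4 are skipped.
static llama_token mock_target_next(llama_token t) {
    const llama_token n = t + 1;
    return (n % 4 == 0) ? n + 1 : n;
}

struct demo_result {
    llama_tokens tokens;               // generated sequence, including the seed
    std::size_t  n_target_passes  = 0; // simulated target evaluations
    std::size_t  n_accepted_total = 0; // draft tokens accepted overall
};

// Generates n_total tokens starting from `first`, drafting n_draft per step.
static demo_result demo_generate(llama_token first, std::size_t n_total, int n_draft) {
    demo_result res;
    res.tokens.push_back(first);
    llama_token id_last = first;

    while (res.tokens.size() < n_total) {
        // 1. Draft (real code: common_speculative_draft).
        const llama_tokens draft = mock_draft(id_last, n_draft);

        // 2. Verify: one simulated target pass checks the whole draft.
        ++res.n_target_passes;
        llama_token   cur        = id_last;
        std::uint16_t n_accepted = 0;
        for (llama_token tok : draft) {
            if (tok != mock_target_next(cur)) break; // first mismatch ends acceptance
            res.tokens.push_back(tok);
            cur = tok;
            ++n_accepted;
        }

        // The target pass also yields one token of its own: the correction
        // after a mismatch, or a bonus token if the whole draft matched.
        cur = mock_target_next(cur);
        res.tokens.push_back(cur);

        // 3. Feedback (real code: common_speculative_accept(spec, n_accepted)).
        res.n_accepted_total += n_accepted;
        id_last = cur;
    }

    res.tokens.resize(n_total); // trim any overshoot from the last step
    return res;
}
```

In this toy setup, generating eight tokens from seed 0 with three drafts per step takes two simulated target passes, where token-by-token decoding would take seven.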