Implementation:Ggml org Llama cpp Ngram Map Header
| Knowledge Sources | |
|---|---|
| Domains | Speculative_Decoding, Caching |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the data structures and functions for n-gram-based self-speculative decoding, covering both a simple linear-scan variant and a map-based variant.
Description
Defines the data structures for two drafting strategies:
- `common_ngram_simple_config`: the n-gram and m-gram sizes for the simple linear-scan algorithm, whose entry point `common_ngram_simple_draft` is also declared here.
- `common_ngram_map_value`: a candidate continuation, tracking the m-gram's position, its count, and its acceptance history.
- `common_ngram_map_key`: an n-gram key, tracking its position, a statistics index, its count, and up to 4 value slots.
- `common_ngram_map`: the main structure, holding the key and value sizes, a hash map for fast lookup, and state tracking for incremental updates.
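The simple linear-scan idea can be illustrated with a standalone sketch. It assumes that the lookup key is the last `size_ngram` tokens of the history including the sampled token, and that the draft is up to `size_mgram` tokens that followed the most recent earlier occurrence of that key. `simple_config` and `simple_draft_sketch` are hypothetical names, and the real `common_ngram_simple_draft` may differ in details such as match order or minimum draft length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for llama.cpp's token types.
using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;

// Hypothetical mirror of common_ngram_simple_config.
struct simple_config {
    uint16_t size_ngram; // length of the lookup key (n)
    uint16_t size_mgram; // maximum number of drafted tokens (m)
};

// Linear-scan sketch: the key is the last size_ngram tokens of
// (tokens + sampled). Scan backwards for the most recent earlier occurrence
// of the key and return up to size_mgram tokens that followed it.
llama_tokens simple_draft_sketch(const simple_config & cfg,
                                 const llama_tokens & tokens,
                                 llama_token sampled) {
    llama_tokens hist = tokens;
    hist.push_back(sampled);
    const size_t n = cfg.size_ngram;
    if (n == 0 || hist.size() <= n) {
        return {}; // no earlier position where the key could occur
    }
    const size_t key_start = hist.size() - n;
    for (size_t i = key_start; i-- > 0;) { // i = key_start - 1, ..., 0
        bool match = true;
        for (size_t j = 0; j < n; ++j) {
            if (hist[i + j] != hist[key_start + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        llama_tokens draft;
        for (size_t k = i + n; k < hist.size() && draft.size() < cfg.size_mgram; ++k) {
            draft.push_back(hist[k]);
        }
        return draft;
    }
    return {}; // key not found earlier in the history
}
```

For example, with history `{1, 2, 3, 4, 1, 2}`, sampled token `3`, and sizes `{3, 2}`, the key `{1, 2, 3}` matches at position 0 and the sketch drafts `{4, 1}`.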
Usage
Use this header when implementing map-based self-speculative decoding. Index the existing token history with `common_ngram_map_begin`, generate draft tokens with `common_ngram_map_draft`, and, once the target model has verified the draft, update the acceptance statistics with `common_ngram_map_accept`.
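The begin/draft/accept lifecycle can be sketched with a toy map. This is not the llama.cpp implementation: `toy_ngram_map` and all of its members are hypothetical, values are stored as token copies rather than positions into the history, `begin` rebuilds the index instead of updating it incrementally, and `draft` simply picks the value slot with the highest count. The sketch only records `n_accepted` per slot; how the real map uses its acceptance history is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Stand-ins for llama.cpp's token types.
using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;

// Hypothetical cap mirroring the "up to 4 value slots" per key.
constexpr size_t TOY_MAX_VALUES = 4;

// FNV-style hash over whole 32-bit tokens, so n-grams can key an unordered_map.
struct toy_tokens_hash {
    size_t operator()(const llama_tokens & v) const {
        uint64_t h = 1469598103934665603ull;
        for (llama_token t : v) {
            h ^= static_cast<uint32_t>(t);
            h *= 1099511628211ull;
        }
        return static_cast<size_t>(h);
    }
};

struct toy_value {
    llama_tokens tokens;          // the m-gram continuation (copied, not a position)
    uint16_t     count = 0;       // how often this continuation followed the key
    int16_t      n_accepted = -1; // -1 = no acceptance recorded yet
};

struct toy_key {
    uint16_t               key_num = 0; // occurrences of the n-gram
    std::vector<toy_value> values;      // at most TOY_MAX_VALUES entries
};

struct toy_ngram_map {
    size_t   size_key;
    size_t   size_value;
    uint16_t min_hits;
    std::unordered_map<llama_tokens, toy_key, toy_tokens_hash> index;
    toy_value * last = nullptr; // slot used by the most recent draft

    toy_ngram_map(size_t sk, size_t sv, uint16_t mh)
        : size_key(sk), size_value(sv), min_hits(mh) {}

    // Rebuild the index from every (n-gram -> following m-gram) pair in the history.
    void begin(const llama_tokens & tokens) {
        index.clear();
        last = nullptr;
        for (size_t i = 0; i + size_key + size_value <= tokens.size(); ++i) {
            auto first = tokens.begin() + static_cast<std::ptrdiff_t>(i);
            auto mid   = first + static_cast<std::ptrdiff_t>(size_key);
            llama_tokens key(first, mid);
            llama_tokens val(mid, mid + static_cast<std::ptrdiff_t>(size_value));
            toy_key & k = index[key];
            k.key_num++;
            bool found = false;
            for (toy_value & v : k.values) {
                if (v.tokens == val) { v.count++; found = true; break; }
            }
            if (!found && k.values.size() < TOY_MAX_VALUES) {
                toy_value v;
                v.tokens = val;
                v.count  = 1;
                k.values.push_back(v);
            }
        }
    }

    // Look up the trailing n-gram (history + sampled token); if it was seen at
    // least min_hits times, return its most frequent continuation.
    llama_tokens draft(const llama_tokens & inp, llama_token sampled) {
        last = nullptr;
        llama_tokens hist = inp;
        hist.push_back(sampled);
        if (hist.size() < size_key) return {};
        llama_tokens key(hist.end() - static_cast<std::ptrdiff_t>(size_key), hist.end());
        auto it = index.find(key);
        if (it == index.end() || it->second.key_num < min_hits) return {};
        for (toy_value & v : it->second.values) {
            if (last == nullptr || v.count > last->count) last = &v;
        }
        return last ? last->tokens : llama_tokens{};
    }

    // Record how many tokens of the last draft the target model accepted.
    void accept(uint16_t n_accepted) {
        if (last == nullptr) return;
        assert(n_accepted <= last->tokens.size());
        last->n_accepted = static_cast<int16_t>(n_accepted);
        last = nullptr;
    }
};
```

With the history `{1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 5}`, 2-token keys, and 2-token values, the key `{1, 2}` ends up with two value slots, `{3, 4}` (count 2) and `{3, 5}` (count 1), so a draft ending in `1, 2` proposes `{3, 4}`. In the sketch, the slot cap bounds memory per key and `min_hits` suppresses drafts from rarely seen n-grams.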
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/ngram-map.h
- Lines: 1-115
Signature
```cpp
struct common_ngram_simple_config {
    uint16_t size_ngram;
    uint16_t size_mgram;
};

llama_tokens common_ngram_simple_draft(
        const common_ngram_simple_config & config,
        const llama_tokens & tokens, llama_token sampled);

struct common_ngram_map_value {
    size_t   value_idx  = 0;
    uint16_t value_num  = 0;
    int16_t  n_accepted = -1;
};

struct common_ngram_map_key {
    size_t   key_idx;
    size_t   stat_idx;
    uint16_t key_num;
    common_ngram_map_value values[COMMON_NGRAM_MAX_VALUES];
};

struct common_ngram_map {
    common_ngram_map(uint16_t sz_key, uint16_t sz_value, bool only_keys, uint16_t min_hits);
    // ... state tracking and hash map fields
};

void common_ngram_map_begin(common_ngram_map & map, const llama_tokens & tokens);
void common_ngram_map_draft(common_ngram_map & map, const llama_tokens & inp,
        llama_token sampled, llama_tokens & draft);
void common_ngram_map_accept(common_ngram_map & map, uint16_t n_accepted);
```
Import
```cpp
#include "ngram-map.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| map | common_ngram_map & | Yes | The n-gram map structure to operate on |
| tokens (`inp` in `common_ngram_map_draft`) | const llama_tokens & | Yes | Token history used to initialize the map or to draft from |
| sampled | llama_token | Yes (draft functions) | The most recently sampled token |
| config | const common_ngram_simple_config & | Yes (simple mode) | Configuration for simple n-gram lookup (ngram/mgram sizes) |
| n_accepted | uint16_t | Yes (accept) | Number of draft tokens accepted by the target model |
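The `n_accepted` input is the length of the draft prefix that survived verification. Under greedy verification (an assumption; sampling-based acceptance rules differ), it can be computed by comparing the draft with the tokens the target model chose at the same positions. `count_accepted` is a hypothetical helper, not part of `ngram-map.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for llama.cpp's token types.
using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;

// Hypothetical helper: a draft token counts as accepted only while it matches
// the token the target model chose at that position; the first mismatch stops
// acceptance. Assumes drafts shorter than 65536 tokens (uint16_t result).
uint16_t count_accepted(const llama_tokens & draft, const llama_tokens & target) {
    uint16_t n = 0;
    while (n < draft.size() && n < target.size() && draft[n] == target[n]) {
        ++n;
    }
    return n;
}
```

The result is what would be passed as `n_accepted` to `common_ngram_map_accept`.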
Outputs
| Name | Type | Description |
|---|---|---|
| draft | llama_tokens & | Draft token sequence produced by `common_ngram_map_draft` from the map lookup |
| return | llama_tokens | Draft token sequence returned by `common_ngram_simple_draft` |
| (side effect) | n/a | The map state is updated in place by `common_ngram_map_begin`, `common_ngram_map_draft`, and `common_ngram_map_accept` |
Usage Examples
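```cpp
#include "ngram-map.h"
// Assumed to come from the surrounding decoding loop (placeholders):
//   llama_tokens token_history;        // tokens processed so far
//   llama_token  last_token;           // token just sampled by the target model
//   uint16_t     n_accepted_by_target; // draft tokens the target accepted
// Simple n-gram draft: size_ngram = 4, size_mgram = 3
common_ngram_simple_config config = {4, 3};
llama_tokens draft = common_ngram_simple_draft(config, token_history, last_token);
// Map-based n-gram draft: sz_key = 4, sz_value = 3, only_keys = false, min_hits = 2
common_ngram_map map(4, 3, false, 2);
common_ngram_map_begin(map, token_history);
llama_tokens map_draft;
common_ngram_map_draft(map, token_history, last_token, map_draft);
// ... the target model verifies map_draft ...
common_ngram_map_accept(map, n_accepted_by_target);
```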