Implementation:Ggml org Llama cpp Ngram Map Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Speculative_Decoding, Caching
Last Updated	2026-02-15 00:00 GMT

Overview

Declares the data structures and functions for n-gram-based self-speculative decoding, covering both a simple linear-scan variant and a map-based variant.

Description

The header covers two approaches. For the simple linear-scan algorithm it defines `common_ngram_simple_config` (the n-gram and m-gram sizes) and declares `common_ngram_simple_draft`. For the map-based approach it defines three structures: `common_ngram_map_value`, which tracks an m-gram's position, count, and acceptance history; `common_ngram_map_key`, which tracks an n-gram's position, statistics index, and count, plus up to 4 value slots (`COMMON_NGRAM_MAX_VALUES`); and `common_ngram_map`, the main structure holding the key and value sizes, a hash map for fast lookup, and the state needed for incremental updates. The map is driven through `common_ngram_map_begin`, `common_ngram_map_draft`, and `common_ngram_map_accept`.
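To make the linear-scan idea concrete, here is a minimal, self-contained sketch. It is not the library's implementation: `token_t`, `tokens_t`, and `sketch_simple_draft` are hypothetical stand-ins, and the matching policy (take the most recent earlier occurrence of the trailing n-gram and copy up to m following tokens) is an assumption based only on the names `size_ngram` and `size_mgram`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in types so the sketch compiles on its own (assumption: not the llama.cpp typedefs).
using token_t  = int32_t;
using tokens_t = std::vector<token_t>;

// Hypothetical helper: scan the context backwards for an earlier occurrence of its
// last `size_ngram` tokens and return up to `size_mgram` tokens that followed it.
tokens_t sketch_simple_draft(const tokens_t & history, token_t sampled,
                             uint16_t size_ngram, uint16_t size_mgram) {
    assert(size_ngram > 0 && size_mgram > 0);

    // The pattern is the tail of (history + sampled).
    tokens_t ctx = history;
    ctx.push_back(sampled);

    const size_t n = size_ngram;
    const size_t m = size_mgram;

    tokens_t draft;
    if (ctx.size() < n + 1) {
        return draft; // too short to contain the pattern plus an earlier match
    }

    const size_t pat = ctx.size() - n; // index where the trailing n-gram starts

    // Try candidate start positions from newest to oldest, excluding the pattern itself.
    for (size_t start = pat; start-- > 0; ) {
        bool match = true;
        for (size_t j = 0; j < n; ++j) {
            if (ctx[start + j] != ctx[pat + j]) {
                match = false;
                break;
            }
        }
        if (!match) {
            continue;
        }
        // Copy the tokens that followed the earlier occurrence, at most m of them.
        for (size_t k = start + n; k < ctx.size() && draft.size() < m; ++k) {
            draft.push_back(ctx[k]);
        }
        return draft;
    }
    return draft; // no earlier occurrence: empty draft
}
```

The cost of each call grows with the length of the history, which is the trade-off the map-based variant is meant to avoid.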

Usage

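Use this header when implementing n-gram-based self-speculative decoding. For the map-based variant, construct a `common_ngram_map`, initialize it from the token history with `common_ngram_map_begin`, generate draft tokens with `common_ngram_map_draft`, and, once the target model has verified the draft, report the number of accepted tokens with `common_ngram_map_accept` so the acceptance statistics are updated. For the simple variant, a single call to `common_ngram_simple_draft` returns the draft.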
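The begin/draft/accept lifecycle can be illustrated with a toy model. Everything below is a hypothetical sketch and not the header's actual layout: `sketch_ngram_map` and its functions are invented for illustration, it uses an ordered `std::map` keyed by the n-gram itself instead of a hash map, it keeps only one continuation per key, and it assumes the token history only grows by appending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using token_t  = int32_t;               // stand-in token type (assumption)
using tokens_t = std::vector<token_t>;

// Toy stand-in for the map: each n-gram remembers where its latest continuation starts.
struct sketch_ngram_map {
    size_t size_key;
    size_t size_value;
    std::map<tokens_t, size_t> next_pos; // n-gram -> index of the token after its latest occurrence
    size_t n_indexed        = 0;         // n-gram start positions already indexed
    size_t last_draft       = 0;         // length of the most recent draft
    size_t n_drafted_total  = 0;         // running statistics
    size_t n_accepted_total = 0;

    sketch_ngram_map(size_t k, size_t v) : size_key(k), size_value(v) {}
};

// Index every not-yet-seen n-gram that has at least one following token.
static void sketch_index(sketch_ngram_map & map, const tokens_t & tokens) {
    const size_t n = map.size_key;
    while (map.n_indexed + n < tokens.size()) {
        const size_t i = map.n_indexed;
        tokens_t key(tokens.data() + i, tokens.data() + i + n);
        map.next_pos[key] = i + n;
        ++map.n_indexed;
    }
}

void sketch_begin(sketch_ngram_map & map, const tokens_t & tokens) {
    map.next_pos.clear();
    map.n_indexed = 0;
    sketch_index(map, tokens);
}

void sketch_draft(sketch_ngram_map & map, const tokens_t & inp,
                  token_t sampled, tokens_t & draft) {
    draft.clear();
    map.last_draft = 0;

    tokens_t ctx = inp;
    ctx.push_back(sampled);
    sketch_index(map, ctx); // incremental: only positions added since the last call

    const size_t n = map.size_key;
    if (ctx.size() < n) {
        return;
    }
    // Look up the trailing n-gram; it is never indexed itself, so a hit is an earlier occurrence.
    tokens_t key(ctx.data() + ctx.size() - n, ctx.data() + ctx.size());
    auto it = map.next_pos.find(key);
    if (it == map.next_pos.end()) {
        return;
    }
    for (size_t k = it->second; k < ctx.size() && draft.size() < map.size_value; ++k) {
        draft.push_back(ctx[k]);
    }
    map.last_draft = draft.size();
}

void sketch_accept(sketch_ngram_map & map, uint16_t n_accepted) {
    assert(n_accepted <= map.last_draft);
    map.n_drafted_total  += map.last_draft;
    map.n_accepted_total += n_accepted;
}
```

In this sketch, lookup cost depends on the key size rather than on the history length, and the running totals stand in for the per-key acceptance history that the real structures track.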

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: common/ngram-map.h
Lines: 1-115

Signature

struct common_ngram_simple_config {
    uint16_t size_ngram;
    uint16_t size_mgram;
};

llama_tokens common_ngram_simple_draft(
    const common_ngram_simple_config & config,
    const llama_tokens & tokens, llama_token sampled);

struct common_ngram_map_value {
    size_t   value_idx  = 0;
    uint16_t value_num  = 0;
    int16_t  n_accepted = -1;
};

struct common_ngram_map_key {
    size_t   key_idx;
    size_t   stat_idx;
    uint16_t key_num;
    common_ngram_map_value values[COMMON_NGRAM_MAX_VALUES];
};

struct common_ngram_map {
    common_ngram_map(uint16_t sz_key, uint16_t sz_value, bool only_keys, uint16_t min_hits);
    // ... state tracking and hash map fields
};

void common_ngram_map_begin(common_ngram_map & map, const llama_tokens & tokens);
void common_ngram_map_draft(common_ngram_map & map, const llama_tokens & inp,
                            llama_token sampled, llama_tokens & draft);
void common_ngram_map_accept(common_ngram_map & map, uint16_t n_accepted);

Import

#include "ngram-map.h"

I/O Contract

Inputs

Name	Type	Required	Description
map	common_ngram_map &	Yes	The n-gram map structure to operate on
tokens / inp	const llama_tokens &	Yes	Token history; passed as tokens to common_ngram_map_begin and common_ngram_simple_draft, and as inp to common_ngram_map_draft
sampled	llama_token	Yes	The most recently sampled token
config	const common_ngram_simple_config &	Yes (simple mode)	Configuration for simple n-gram lookup (ngram/mgram sizes)
n_accepted	uint16_t	Yes (accept)	Number of draft tokens accepted by the target model

Outputs

Name	Type	Description
draft	llama_tokens &	Draft token sequence generated from map lookup (for map draft)
return	llama_tokens	Draft token sequence returned from simple draft
(side effect)	void	Map state is updated in-place during begin/draft/accept operations

Usage Examples

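#include "ngram-map.h"

// token_history, last_token and n_accepted_by_target are supplied by the caller's decoding loop.
void draft_example(const llama_tokens & token_history, llama_token last_token,
                   uint16_t n_accepted_by_target) {
    // Simple n-gram draft: size_ngram = 4, size_mgram = 3
    common_ngram_simple_config config = {4, 3};
    llama_tokens draft = common_ngram_simple_draft(config, token_history, last_token);

    // Map-based n-gram draft: sz_key = 4, sz_value = 3, only_keys = false, min_hits = 2
    common_ngram_map map(4, 3, false, 2);
    common_ngram_map_begin(map, token_history);
    llama_tokens map_draft;
    common_ngram_map_draft(map, token_history, last_token, map_draft);

    // After the target model has verified map_draft, report how many tokens it accepted
    common_ngram_map_accept(map, n_accepted_by_target);
}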

Related Pages

Principle:Ggml_org_Llama_cpp_Speculative_Decoding
