Implementation:Ggml org Llama cpp Ngram Map Header

From Leeroopedia
Knowledge Sources
Domains: Speculative_Decoding, Caching
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the data structures and functions used for n-gram-map-based self-speculative decoding.

Description

The header covers two variants. For the simple linear-scan algorithm, it defines `common_ngram_simple_config` (the n-gram and m-gram sizes) and declares `common_ngram_simple_draft`, which returns a draft directly. For the map-based approach, it defines three structures: `common_ngram_map_value`, which tracks an m-gram's position, count, and acceptance history; `common_ngram_map_key`, which tracks an n-gram's position, statistics index, count, and up to 4 value slots in its `values` array; and `common_ngram_map`, the main structure, which holds the key and value sizes, a hash map for fast lookup, and the state needed for incremental updates.
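As a rough illustration of what a linear-scan n-gram draft can look like, the sketch below takes the last `size_ngram` tokens as a pattern, scans backwards for an earlier occurrence, and proposes up to `size_mgram` tokens that followed it. This is not the llama.cpp implementation: `token_t`, `tokens_t`, and `simple_ngram_draft_sketch` are hypothetical names, and the policy of preferring the most recent earlier match is an assumption made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for llama_token / llama_tokens so the sketch compiles on its own.
using token_t  = std::int32_t;
using tokens_t = std::vector<token_t>;

// Illustrative linear scan (an assumption, not the library's algorithm):
// the pattern is the last `size_ngram` tokens of history + sampled; the draft is
// up to `size_mgram` tokens that followed the most recent earlier occurrence.
tokens_t simple_ngram_draft_sketch(std::size_t size_ngram, std::size_t size_mgram,
                                   const tokens_t & history, token_t sampled) {
    tokens_t seq = history;
    seq.push_back(sampled);

    tokens_t draft;
    if (size_ngram == 0 || seq.size() <= size_ngram) {
        return draft; // not enough context for a pattern plus an earlier match
    }

    const std::size_t pat = seq.size() - size_ngram; // start index of the pattern
    // Try candidate start positions from the most recent one down to 0.
    for (std::size_t i = pat; i-- > 0; ) {
        if (std::equal(seq.begin() + i, seq.begin() + i + size_ngram, seq.begin() + pat)) {
            const std::size_t first = i + size_ngram;
            const std::size_t last  = std::min(first + size_mgram, seq.size());
            draft.assign(seq.begin() + first, seq.begin() + last);
            break;
        }
    }
    return draft; // empty when the pattern never occurred earlier
}
```

With history `{1, 2, 3, 4, 1, 2}`, sampled token `3`, and sizes 2 and 2, the pattern `{2, 3}` matches at index 1 and the sketch proposes `{4, 1}`. A linear scan costs time proportional to the history length on every call, which is the cost a hash-map lookup is meant to avoid.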

Usage

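Use this header when implementing n-gram-based self-speculative decoding. For the map-based variant, initialize a map from the token history with `common_ngram_map_begin`, generate draft tokens with `common_ngram_map_draft`, and, once the target model has verified the draft, report the number of accepted tokens with `common_ngram_map_accept` so the acceptance statistics are updated. For the simple variant, call `common_ngram_simple_draft` with a `common_ngram_simple_config`; no map setup is required.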
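The begin/draft/accept lifecycle can be pictured with a toy map that is independent of llama.cpp. In this sketch every name (`toy_ngram_map`, `toy_update`, `toy_draft`, `toy_accept`) is hypothetical, the map keeps only the most recent continuation per key, and the history is assumed to grow by appending only. The real structure tracks more state (multiple value slots, hit counts, per-value acceptance), so treat this as a simplified model of the call order rather than the actual data layout.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

using token_t  = std::int32_t;            // hypothetical stand-in for llama_token
using tokens_t = std::vector<token_t>;    // hypothetical stand-in for llama_tokens

// Toy model: for each key n-gram, remember where its latest continuation starts.
struct toy_ngram_map {
    toy_ngram_map(std::size_t k, std::size_t v) : size_key(k), size_value(v) {}
    std::size_t size_key;
    std::size_t size_value;
    std::map<tokens_t, std::size_t> last_follow; // key -> index of the token right after it
    std::size_t n_indexed  = 0; // key start positions indexed so far (incremental state)
    std::size_t n_drafted  = 0; // total tokens proposed
    std::size_t n_accepted = 0; // total tokens the verifier accepted
};

// Index every key that already has at least one following token,
// resuming from where the previous call stopped.
void toy_update(toy_ngram_map & m, const tokens_t & seq) {
    for (; m.n_indexed + m.size_key < seq.size(); ++m.n_indexed) {
        tokens_t key(seq.begin() + m.n_indexed, seq.begin() + m.n_indexed + m.size_key);
        m.last_follow[key] = m.n_indexed + m.size_key;
    }
}

// Look up the current suffix and propose up to size_value tokens that followed it before.
tokens_t toy_draft(toy_ngram_map & m, const tokens_t & history, token_t sampled) {
    tokens_t seq = history;
    seq.push_back(sampled);
    tokens_t draft;
    if (m.size_key == 0 || seq.size() < m.size_key) {
        return draft;
    }
    toy_update(m, seq); // the current suffix has no successor yet, so it is not indexed here
    const tokens_t key(seq.end() - m.size_key, seq.end());
    const auto it = m.last_follow.find(key);
    if (it != m.last_follow.end()) {
        const std::size_t last = std::min(it->second + m.size_value, seq.size());
        draft.assign(seq.begin() + it->second, seq.begin() + last);
    }
    m.n_drafted += draft.size();
    return draft;
}

// Record how many proposed tokens the target model accepted.
void toy_accept(toy_ngram_map & m, std::size_t n_accepted) {
    m.n_accepted += n_accepted;
}
```

Calling `toy_draft` on a `toy_ngram_map(2, 2)` with history `{1, 2, 3, 4, 1}` and sampled token `2` finds the earlier `{1, 2}` and proposes `{3, 4}`. The ratio of `n_accepted` to `n_drafted` is the kind of signal an acceptance-tracking map can use to decide whether a key is worth drafting from again.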

Code Reference

Source Location

Signature

struct common_ngram_simple_config {
    uint16_t size_ngram;
    uint16_t size_mgram;
};

llama_tokens common_ngram_simple_draft(
    const common_ngram_simple_config & config,
    const llama_tokens & tokens, llama_token sampled);

struct common_ngram_map_value {
    size_t   value_idx  = 0;
    uint16_t value_num  = 0;
    int16_t  n_accepted = -1;
};

struct common_ngram_map_key {
    size_t   key_idx;
    size_t   stat_idx;
    uint16_t key_num;
    common_ngram_map_value values[COMMON_NGRAM_MAX_VALUES];
};

struct common_ngram_map {
    common_ngram_map(uint16_t sz_key, uint16_t sz_value, bool only_keys, uint16_t min_hits);
    // ... state tracking and hash map fields
};

void common_ngram_map_begin(common_ngram_map & map, const llama_tokens & tokens);
void common_ngram_map_draft(common_ngram_map & map, const llama_tokens & inp,
                            llama_token sampled, llama_tokens & draft);
void common_ngram_map_accept(common_ngram_map & map, uint16_t n_accepted);

Import

#include "ngram-map.h"

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| map | common_ngram_map & | Yes (map mode) | The n-gram map structure to operate on |
| tokens | const llama_tokens & | Yes | Token history for initialization or drafting (named `inp` in `common_ngram_map_draft`) |
| sampled | llama_token | Yes (draft) | The most recently sampled token |
| config | const common_ngram_simple_config & | Yes (simple mode) | Configuration for simple n-gram lookup (n-gram and m-gram sizes) |
| n_accepted | uint16_t | Yes (accept) | Number of draft tokens accepted by the target model |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| draft | llama_tokens & | Draft token sequence generated from map lookup (for map draft) |
| return | llama_tokens | Draft token sequence returned from simple draft |
| (side effect) | void | Map state is updated in-place during begin/draft/accept operations |

Usage Examples

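#include "ngram-map.h"

// Assumed to come from the caller's decoding loop (not declared in this header):
//   llama_tokens token_history;        // tokens generated so far
//   llama_token  last_token;           // most recently sampled token
//   uint16_t     n_accepted_by_target; // draft tokens the target model accepted

// Simple n-gram draft: size_ngram = 4, size_mgram = 3
common_ngram_simple_config config = {4, 3};
llama_tokens draft = common_ngram_simple_draft(config, token_history, last_token);

// Map-based n-gram draft: sz_key = 4, sz_value = 3, only_keys = false, min_hits = 2
common_ngram_map map(4, 3, false, 2);
common_ngram_map_begin(map, token_history);   // initialize from the existing history
llama_tokens map_draft;
common_ngram_map_draft(map, token_history, last_token, map_draft);   // fills map_draft
common_ngram_map_accept(map, n_accepted_by_target);   // report the verified count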
