
Implementation:Ggml org Llama cpp Ngram Map

From Leeroopedia
Knowledge Sources
Domains Speculative_Decoding, Caching
Last Updated 2026-02-15 00:00 GMT

Overview

This module implements two n-gram-based self-speculative decoding algorithms: a simple linear search and a hash-map approach that speeds up n-gram lookup in the token history.

Description

common_ngram_simple_draft performs a linear backward scan through the token history to find an n-gram match, then copies the subsequent m-gram as draft tokens. The common_ngram_map_* family of functions maintains a hash map from n-gram hashes to indices in the token history, along with statistics about the m-gram values that follow each key. The map uses LCG hashing for n-gram fingerprinting and tracks up to COMMON_NGRAM_MAX_VALUES (4) of the most frequent following sequences per n-gram, selecting the best draft based on occurrence counts and acceptance history.
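The LCG fingerprinting step might look like the following sketch. The multiplier and increment here are the classic Numerical Recipes LCG constants, chosen for illustration; they are not necessarily the constants used by common_ngram_map_hash, and the mixing order is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;

// Illustrative LCG-based fingerprint of the n-gram tokens[start, start + len):
// each token is folded into the running hash with one linear congruential step.
static uint32_t ngram_hash_sketch(const llama_tokens & tokens, size_t start, size_t len) {
    uint32_t hash = 0;
    for (size_t i = start; i < start + len; ++i) {
        hash = hash * 1664525u + 1013904223u;       // illustrative LCG constants
        hash ^= static_cast<uint32_t>(tokens[i]);   // fold in the token id
    }
    return hash;
}
```

Because the hash depends only on the token values in the window, identical n-grams anywhere in the history map to the same bucket, which is what allows the map to aggregate statistics across repeated occurrences.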

Usage

Use this module for self-speculative decoding without a separate draft model, leveraging the model's own token history for pattern-based prediction. The map variant offers better performance than the simple linear scan on long contexts.
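The linear-scan variant can be sketched as follows, assuming the behavior described above: build a pattern from the last n-1 history tokens plus the sampled token, scan backward for its most recent earlier occurrence, and copy up to m following tokens as the draft. The config struct and function here are hypothetical stand-ins, not the real ngram-map.h declarations.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using llama_token  = int32_t;
using llama_tokens = std::vector<llama_token>;

// Hypothetical stand-in for common_ngram_simple_config.
struct ngram_simple_config_sketch {
    size_t size_ngram = 4; // length of the pattern to match
    size_t size_mgram = 8; // max number of following tokens to copy as draft
};

static llama_tokens simple_draft_sketch(
        const ngram_simple_config_sketch & cfg,
        const llama_tokens & history,
        llama_token sampled) {
    const size_t n = cfg.size_ngram;
    if (history.size() < n) {
        return {};
    }
    // Search pattern: last (n-1) history tokens followed by the sampled token.
    llama_tokens pattern(history.end() - (n - 1), history.end());
    pattern.push_back(sampled);

    // Backward scan: pos runs from history.size() - n down to 0,
    // so the most recent occurrence wins.
    for (size_t pos = history.size() - n + 1; pos-- > 0; ) {
        if (!std::equal(pattern.begin(), pattern.end(), history.begin() + pos)) {
            continue;
        }
        const size_t first = pos + n;
        const size_t count = std::min(cfg.size_mgram, history.size() - first);
        if (count == 0) {
            continue; // match ends at the tail of history: nothing to draft
        }
        return llama_tokens(history.begin() + first, history.begin() + first + count);
    }
    return {}; // no match: caller falls back to normal decoding
}
```

When no match is found the sketch returns an empty vector, matching the I/O contract below: an empty draft simply means the caller decodes the next token without speculation.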

Code Reference

Source Location

Signature

static uint32_t common_ngram_map_hash(
    const llama_tokens & tokens, size_t start, size_t len);

llama_tokens common_ngram_simple_draft(
    const common_ngram_simple_config & config,
    const llama_tokens & tokens,
    llama_token sampled);

// Hash-map based n-gram functions
void common_ngram_map_update(/* ... */);
llama_tokens common_ngram_map_draft(/* ... */);

Import

#include "common.h"
#include "log.h"
#include "ngram-map.h"
#include <cinttypes>
#include <cstdint>
#include <sstream>

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| config | common_ngram_simple_config | Yes | Configuration specifying n-gram size and m-gram (draft) size |
| tokens | llama_tokens (vector&lt;llama_token&gt;) | Yes | Full token history to search for matching patterns |
| sampled | llama_token | Yes | The last sampled token, appended to form the search pattern |
| start | size_t | Yes | Start offset in the token array for hash computation |
| len | size_t | Yes | Length of the n-gram for hash computation |

Outputs

| Name | Type | Description |
|---|---|---|
| draft_tokens | llama_tokens (vector&lt;llama_token&gt;) | Predicted draft tokens based on the n-gram match; empty if no match is found |
| hash | uint32_t | LCG hash value of an n-gram segment |

Usage Examples

#include "ngram-map.h"

// Simple linear search speculative decoding
common_ngram_simple_config config;
config.size_ngram = 4;  // search for 4-gram matches
config.size_mgram = 8;  // use up to 8 following tokens as draft

llama_tokens history = /* ... token history ... */;
llama_token last_token = /* ... last sampled token ... */;

llama_tokens draft = common_ngram_simple_draft(config, history, last_token);
if (!draft.empty()) {
    // Use draft tokens for speculative verification
}
