Implementation: ggml-org/llama.cpp ngram-mod
| Knowledge Sources | Details |
|---|---|
| Domains | Speculative_Decoding, Hashing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements a simple hash-table-based n-gram store that maps n-gram token sequences to a single predicted next token.
Description
Uses a fixed-size vector as a direct-indexed hash table with no collision resolution: colliding entries simply overwrite each other. The hash function mixes the n key tokens with the LCG multiplier 6364136223846793005 and reduces the result modulo the table size. The `add()` method stores the token following the n-gram at the computed index, `get()` retrieves it (returning -1 if the slot is empty), and `reset()` fills all entries with the EMPTY sentinel value (-1).
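The mechanism can be sketched as follows. This is a minimal, hypothetical `ngram_table`, not the actual llama.cpp code; in particular, the exact way the LCG multiplier is folded into the hash is an assumption, only the overall scheme (mix key tokens, reduce modulo table size, overwrite on collision) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the n-gram store described above.
struct ngram_table {
    static constexpr int32_t EMPTY = -1;

    ngram_table(uint16_t n, size_t size) : n(n), entries(size, EMPTY) {}

    // Mix the n key tokens with the LCG multiplier, then reduce modulo the
    // table size. The exact mixing order here is an assumption.
    size_t idx(const int32_t * tokens) const {
        uint64_t h = 0;
        for (uint16_t i = 0; i < n; ++i) {
            h = h*6364136223846793005ULL + (uint64_t)(uint32_t) tokens[i];
        }
        return (size_t)(h % entries.size());
    }

    // tokens[0..n-1] is the key, tokens[n] the value to store.
    // A colliding entry is silently overwritten.
    void add(const int32_t * tokens) {
        entries[idx(tokens)] = tokens[n];
    }

    // Returns the stored token, or EMPTY (-1) if the slot was never written.
    int32_t get(const int32_t * tokens) const {
        return entries[idx(tokens)];
    }

    // Fill every slot with the EMPTY sentinel.
    void reset() {
        entries.assign(entries.size(), EMPTY);
    }

    uint16_t n;
    std::vector<int32_t> entries;
};
```

Note that `get()` cannot distinguish a genuine hit from a collision: two different n-grams hashing to the same slot return whichever value was stored last.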
Usage
Use this module when a lightweight, memory-efficient n-gram lookup is needed for speculative decoding. Because there is no collision resolution, a hash collision can silently overwrite a stored entry or return an unrelated prediction; this trades accuracy for speed and strictly bounded memory, which is acceptable for real-time draft token prediction since every drafted token is still verified by the target model.
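To illustrate how such a store is typically fed in a speculative-decoding loop, the hypothetical driver below records every (n-gram, next token) pair from the generated history and then drafts a token from the trailing n-gram. A `std::map` keyed by the n-gram stands in for the hash table so the windowing logic is the only moving part; this is not code from llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical driver: populate an n-gram store from the token history,
// then draft the next token from the trailing n-gram.
// Assumes history.size() >= n.
int32_t draft_next(const std::vector<int32_t> & history, uint16_t n) {
    std::map<std::vector<int32_t>, int32_t> store;

    // Record every (n-gram -> next token) pair seen so far;
    // later occurrences overwrite earlier ones, as in the real store.
    for (size_t i = 0; i + n < history.size(); ++i) {
        std::vector<int32_t> key(history.begin() + i, history.begin() + i + n);
        store[key] = history[i + n];
    }

    // Look up the trailing n-gram; -1 signals "no prediction available".
    std::vector<int32_t> tail(history.end() - n, history.end());
    auto it = store.find(tail);
    return it == store.end() ? -1 : it->second;
}
```

A drafted token of -1 simply means the drafter abstains for this step and the target model decodes normally.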
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/ngram-mod.cpp
- Lines: 1-60
Signature
common_ngram_mod::common_ngram_mod(uint16_t n, size_t size);
size_t common_ngram_mod::idx(const entry_t * tokens) const;
void common_ngram_mod::add(const entry_t * tokens);
entry_t common_ngram_mod::get(const entry_t * tokens) const;
void common_ngram_mod::reset();
size_t common_ngram_mod::get_n() const;
size_t common_ngram_mod::get_used() const;
size_t common_ngram_mod::size() const;
size_t common_ngram_mod::size_bytes() const;
Import
#include "ngram-mod.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | uint16_t | Yes | N-gram size (number of tokens to hash) |
| size | size_t | Yes | Number of entries in the hash table |
| tokens | const entry_t * | Yes | Pointer to an array of at least n+1 tokens (n for the key, +1 for the value to store via add()) |
Outputs
| Name | Type | Description |
|---|---|---|
| idx return | size_t | Hash table index for the given n-gram |
| get return | entry_t (int32_t) | Predicted next token, or -1 (EMPTY) if not found |
| get_n return | size_t | The configured n-gram size |
| get_used return | size_t | Number of occupied entries in the hash table |
| size return | size_t | Total number of entries in the hash table |
| size_bytes return | size_t | Memory usage of the entries vector in bytes |
Usage Examples
#include "ngram-mod.h"
// Create a 4-gram hash table with 1M entries
common_ngram_mod ngram(4, 1000000);
// Add an n-gram: tokens[0..3] are the key, tokens[4] is the predicted next token
int32_t tokens[] = {101, 202, 303, 404, 505};
ngram.add(tokens);
// Look up the predicted next token for a key
int32_t key[] = {101, 202, 303, 404};
int32_t prediction = ngram.get(key); // returns 505
// Reset the table
ngram.reset();