Implementation: ggml-org/llama.cpp ngram-mod
| Knowledge Sources | Details |
|---|---|
| Domains | Speculative_Decoding, Hashing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Implements a simple hash-table-based n-gram store that maps n-gram token sequences to a single predicted next token.
Description
Uses a fixed-size vector as a direct-indexed hash table with no collision resolution: colliding entries simply overwrite each other. The hash function mixes the n key tokens with the LCG multiplier 6364136223846793005 and reduces the result modulo the table size. The `add()` method stores the token following the n-gram at the computed index, `get()` retrieves it (returning -1 if the slot is empty), and `reset()` fills all entries with the EMPTY sentinel value (-1).
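The mechanism can be sketched as follows. This is a minimal, hypothetical `ngram_table`, not the actual llama.cpp code; in particular, the exact way the LCG multiplier is folded into the hash is an assumption, only the overall scheme (mix key tokens, reduce modulo table size, overwrite on collision) follows the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the n-gram store described above.
struct ngram_table {
    static constexpr int32_t EMPTY = -1;

    ngram_table(uint16_t n, size_t size) : n(n), entries(size, EMPTY) {}

    // Mix the n key tokens with the LCG multiplier, then reduce modulo the
    // table size. The exact mixing order here is an assumption.
    size_t idx(const int32_t * tokens) const {
        uint64_t h = 0;
        for (uint16_t i = 0; i < n; ++i) {
            h = h*6364136223846793005ULL + (uint64_t)(uint32_t) tokens[i];
        }
        return (size_t)(h % entries.size());
    }

    // tokens[0..n-1] is the key, tokens[n] the value to store.
    // A colliding entry is silently overwritten.
    void add(const int32_t * tokens) {
        entries[idx(tokens)] = tokens[n];
    }

    // Returns the stored token, or EMPTY (-1) if the slot was never written.
    int32_t get(const int32_t * tokens) const {
        return entries[idx(tokens)];
    }

    // Fill every slot with the EMPTY sentinel.
    void reset() {
        entries.assign(entries.size(), EMPTY);
    }

    uint16_t n;
    std::vector<int32_t> entries;
};
```

Note that `get()` cannot distinguish a genuine hit from a collision: two different n-grams hashing to the same slot return whichever value was stored last.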
Usage
Use this module when a lightweight, memory-efficient n-gram lookup is needed for speculative decoding. Because there is no collision resolution, a hash collision can silently overwrite a stored entry or return an unrelated prediction; this trades accuracy for speed and strictly bounded memory, which is acceptable for real-time draft token prediction since every drafted token is still verified by the target model.
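To illustrate how such a store is typically fed in a speculative-decoding loop, the hypothetical driver below records every (n-gram, next token) pair from the generated history and then drafts a token from the trailing n-gram. A `std::map` keyed by the n-gram stands in for the hash table so the windowing logic is the only moving part; this is not code from llama.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical driver: populate an n-gram store from the token history,
// then draft the next token from the trailing n-gram.
// Assumes history.size() >= n.
int32_t draft_next(const std::vector<int32_t> & history, uint16_t n) {
    std::map<std::vector<int32_t>, int32_t> store;

    // Record every (n-gram -> next token) pair seen so far;
    // later occurrences overwrite earlier ones, as in the real store.
    for (size_t i = 0; i + n < history.size(); ++i) {
        std::vector<int32_t> key(history.begin() + i, history.begin() + i + n);
        store[key] = history[i + n];
    }

    // Look up the trailing n-gram; -1 signals "no prediction available".
    std::vector<int32_t> tail(history.end() - n, history.end());
    auto it = store.find(tail);
    return it == store.end() ? -1 : it->second;
}
```

A drafted token of -1 simply means the drafter abstains for this step and the target model decodes normally.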
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/ngram-mod.cpp
- Lines: 1-60
Signature
common_ngram_mod::common_ngram_mod(uint16_t n, size_t size);
size_t common_ngram_mod::idx(const entry_t * tokens) const;
void common_ngram_mod::add(const entry_t * tokens);
entry_t common_ngram_mod::get(const entry_t * tokens) const;
void common_ngram_mod::reset();
size_t common_ngram_mod::get_n() const;
size_t common_ngram_mod::get_used() const;
size_t common_ngram_mod::size() const;
size_t common_ngram_mod::size_bytes() const;
Import
#include "ngram-mod.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| n | uint16_t | Yes | N-gram size (number of tokens to hash) |
| size | size_t | Yes | Number of entries in the hash table |
| tokens | const entry_t * | Yes | Pointer to an array of at least n+1 tokens (n for the key, +1 for the value to store via add()) |
Outputs
| Name | Type | Description |
|---|---|---|
| idx return | size_t | Hash table index for the given n-gram |
| get return | entry_t (int32_t) | Predicted next token, or -1 (EMPTY) if not found |
| get_n return | size_t | The configured n-gram size |
| get_used return | size_t | Number of occupied entries in the hash table |
| size return | size_t | Total number of entries in the hash table |
| size_bytes return | size_t | Memory usage of the entries vector in bytes |
Usage Examples
#include "ngram-mod.h"
// Create a 4-gram hash table with 1M entries
common_ngram_mod ngram(4, 1000000);
// Add an n-gram: tokens[0..3] are the key, tokens[4] is the predicted next token
int32_t tokens[] = {101, 202, 303, 404, 505};
ngram.add(tokens);
// Look up the predicted next token for a key
int32_t key[] = {101, 202, 303, 404};
int32_t prediction = ngram.get(key); // returns 505
// Reset the table
ngram.reset();