Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Heuristic:Romsto Speculative Decoding Ngram Order Selection

From Leeroopedia
Knowledge Sources
Domains LLMs, Optimization, Speculative_Decoding
Last Updated 2026-02-14 04:30 GMT

Overview

Selecting the N-gram order (n) for NASD balances context specificity against coverage — n=3 is the default, with higher n giving more precise but sparser predictions.

Description

In N-gram Assisted Speculative Decoding (NASD), the N-gram storage maintains k-grams for k in [2, N]. When predicting the next token, the storage searches from the highest k-gram down to bigrams, returning the most probable continuation from the highest-order match found. The choice of N determines how much context the drafter uses: higher N means more specific context windows but fewer matches (sparser storage), while lower N means broader matches but less precision.

Usage

Use this heuristic when configuring NASD via the `/set_ngramstorage` CLI command or when instantiating `NGramStorage` or `OneLevelNGramStorage`. Consider the length and repetitiveness of your input prompts — longer, more repetitive texts benefit from higher n.

The Insight (Rule of Thumb)

  • Action: Set n via `/set_ngramstorage basic <n>` or `/set_ngramstorage onelevel <n>` in the CLI, or `NGramStorage(n=<value>, vocab_size=...)` in code.
  • Default Value: `n=3` in both `infer.py:39` and `infer.py:110`.
  • Trade-off: Higher n (e.g., 4-5) gives more accurate predictions when context has been seen before, but fewer matches on novel sequences. Lower n (e.g., 2) gives more matches but predictions are less reliable.
  • Storage type: `NGramStorage` (multi-level, k in [2,n]) is preferred over `OneLevelNGramStorage` (only uses exact n-grams) because it provides fallback to lower-order matches.

Reasoning

The multi-level `NGramStorage` searches from highest to lowest order in `ngram_storage.py:170-178`:

for j in range(min(self.n - 1, seq.shape[0]), 1, -1):
    gram = tuple(seq[-j:].tolist())

    if gram in self.ngrams[j]:
        out[i] = self.ngrams[j][gram]
        known[i] = True
        break

This fallback mechanism means higher n values do not sacrifice coverage — if an (n-1)-gram context has not been seen, it falls back to (n-2)-grams, etc. However, the `OneLevelNGramStorage` only checks exact (n-1)-grams without fallback (`ngram_storage.py:86-94`), so its performance is more sensitive to the choice of n.

The storage is initialized with all k-grams from the prompt (`ngram_storage.py:223-245`) and updated dynamically during generation. Longer, more repetitive prompts populate the storage more densely, making higher n values viable.

The `assert n > 1` constraint in `ngram_storage.py:12` enforces a minimum of n=2 (bigrams).

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment