
Implementation:Turboderp org Exllamav2 ExLlamaV2PrefixFilter

From Leeroopedia
Knowledge Sources
Domains Filtering, Constrained_Generation
Last Updated 2026-02-15 00:00 GMT

Overview

Token filter that constrains generation to begin with one of a set of allowed prefix strings, using trie-based matching against the tokenizer vocabulary to efficiently compute allowed tokens at each step.

Description

ExLlamaV2PrefixFilter is a subclass of ExLlamaV2Filter that forces the model's output to start with one of the specified prefix strings. Once the shortest matching prefix has been fully generated, the constraint is released and generation continues freely.

Key components:

  • __init__(model, tokenizer, prefix_strings) -- Accepts a single string or list of strings as allowed prefixes. Stores them in prefix_strings and initialises current_prefixes (a set tracking still-viable prefixes) and current_str (the generated text so far).
  • clone(c=None) -- Creates a copy of the filter preserving prefix_strings, current_prefixes, and current_str state.
  • begin(prefix_str) -- Resets current_prefixes to the full set of all configured prefix strings and clears current_str.
  • feed(token) -- Decodes the token to its string piece via tokenizer.get_id_to_piece_list(), appends it to current_str, and prunes any prefix from current_prefixes that no longer matches the generated text.
  • next() -- If the generated string already satisfies the shortest remaining prefix (i.e., len(current_str) >= min_valid_length), returns (None, set()) to indicate no constraint. Otherwise, for each remaining prefix, it traverses the tokenizer's character trie to find all token IDs that would advance along the prefix path, and also checks the prefix-to-IDs dictionary for tokens that could complete the remaining string in one step. Returns (pass_tokens_all, set()).
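The begin()/feed() bookkeeping described above can be sketched in plain Python. This is a toy model with a made-up id-to-piece table, not the library's actual code; the class and attribute names mirror the description for readability only.

```python
# Toy sketch of the prefix-pruning state machine in begin()/feed().
# The id_to_piece table stands in for tokenizer.get_id_to_piece_list().
id_to_piece = {0: "Ye", 1: "s", 2: "No", 3: "Maybe"}

class PrefixState:
    def __init__(self, prefix_strings):
        self.prefix_strings = list(prefix_strings)
        self.begin()

    def begin(self):
        # Reset to the full set of candidate prefixes, empty output.
        self.current_prefixes = set(self.prefix_strings)
        self.current_str = ""

    def feed(self, token):
        # Append the token's string piece, then drop any prefix that
        # no longer agrees with the generated text so far.
        self.current_str += id_to_piece[token]
        self.current_prefixes = {
            p for p in self.current_prefixes
            if p.startswith(self.current_str) or self.current_str.startswith(p)
        }

state = PrefixState(["Yes", "No"])
state.feed(0)                      # generated text is now "Ye"
print(state.current_prefixes)      # {'Yes'} -- "No" was pruned
state.feed(1)                      # generated text is now "Yes"
print(state.current_prefixes)      # {'Yes'} -- fully matched
```

The real filter does the same pruning but against actual tokenizer pieces, and its next() step then uses the surviving prefixes to compute the allowed token set.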

The filter relies on two precomputed tokenizer data structures:

  • tokenizer.get_char_trie() -- A character-level trie over all token pieces, where each node stores leaf token IDs.
  • tokenizer.get_prefix_to_ids_dict() -- A dictionary mapping string prefixes to sets of token IDs that decode to exactly that prefix.

Note that if one prefix string is a leading substring of another (e.g., "story" and "storytime"), only the shorter one is effective: fully matching it satisfies the constraint and releases the filter before the longer prefix can be enforced.
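The release behaviour behind this caveat can be illustrated with a toy model of the check at the top of next(). The min_valid_length name follows the description above; the code is a sketch, not the library's implementation.

```python
# Toy model of the constraint-release check: once the generated text
# reaches the length of the shortest still-viable prefix, next()
# returns (None, set()) and generation continues unconstrained.
def next_constraint(current_prefixes, current_str):
    min_valid_length = min(len(p) for p in current_prefixes)
    if len(current_str) >= min_valid_length:
        return None, set()       # constraint satisfied, no filtering
    # (the real filter would now walk the character trie and the
    #  prefix-to-IDs dict to collect allowed token IDs; elided here)
    return {"<restricted>"}, set()

prefixes = {"story", "storytime"}
# After generating "story", the shorter prefix is fully matched, so
# "storytime" can never be enforced:
print(next_constraint(prefixes, "story"))   # (None, set())
```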

Usage

Use ExLlamaV2PrefixFilter when you need to guarantee that generated text starts with a specific string or one of several candidate strings, such as ensuring a function call begins with a known prefix, or forcing a response to start with "Yes" or "No".

Code Reference

Source Location

Signature

class ExLlamaV2PrefixFilter(ExLlamaV2Filter):

    prefix_strings: list[str]
    current_prefixes: set[str]
    current_str: str

    def __init__(self,
                 model: ExLlamaV2,
                 tokenizer: ExLlamaV2Tokenizer,
                 prefix_strings: str | list[str]):
        ...

    def clone(self, c=None) -> ExLlamaV2PrefixFilter:
        ...

    def begin(self, prefix_str: str = "") -> None:
        ...

    def feed(self, token: int) -> None:
        ...

    def next(self) -> tuple[set[int] | None, set]:
        ...

Import

from exllamav2.generator.filters import ExLlamaV2PrefixFilter

I/O Contract

Inputs

Name Type Required Description
model ExLlamaV2 Yes The loaded ExLlamaV2 model instance
tokenizer ExLlamaV2Tokenizer Yes The tokenizer associated with the model
prefix_strings str or list[str] Yes One or more prefix strings that generation must start with
prefix_str str No (begin, default "") Context prefix string passed at generation start (not used by this filter)
token int Yes (feed) Token ID selected by the sampler

Outputs

Name Type Description
pass_tokens set[int] or None From next(): set of allowed token IDs, or None when the prefix constraint is fully satisfied
end_tokens set From next(): always an empty set (this filter does not define end-of-constraint tokens)

Usage Examples

Force Response to Start with a Specific Prefix

from exllamav2.generator.filters import ExLlamaV2PrefixFilter
from exllamav2.generator import ExLlamaV2DynamicJob

# Force generation to begin with "def " or "class "
prefix_filter = ExLlamaV2PrefixFilter(
    model, tokenizer,
    prefix_strings=["def ", "class "]
)

job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    gen_settings=gen_settings,
    max_new_tokens=256,
    filters=[prefix_filter],
)
generator.enqueue(job)

Single Prefix Constraint

from exllamav2.generator.filters import ExLlamaV2PrefixFilter

# Ensure the model starts its response with "Sure, "
prefix_filter = ExLlamaV2PrefixFilter(model, tokenizer, "Sure, ")
