Implementation: Turboderp_org_Exllamav2 ExLlamaV2PrefixFilter
| Knowledge Sources | |
|---|---|
| Domains | Filtering, Constrained_Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Token filter that constrains generation to begin with one of a set of allowed prefix strings, using trie-based matching against the tokenizer vocabulary to efficiently compute allowed tokens at each step.
Description
ExLlamaV2PrefixFilter is a subclass of ExLlamaV2Filter that forces the model's output to start with one of the specified prefix strings. Once the generated text fully covers the shortest still-viable prefix, the constraint is released and generation continues unconstrained.
Key components:
- __init__(model, tokenizer, prefix_strings) -- Accepts a single string or list of strings as allowed prefixes. Stores them in prefix_strings and initialises current_prefixes (a set tracking still-viable prefixes) and current_str (the generated text so far).
- clone(c=None) -- Creates a copy of the filter preserving prefix_strings, current_prefixes, and current_str state.
- begin(prefix_str) -- Resets current_prefixes to the full set of all configured prefix strings and clears current_str.
- feed(token) -- Decodes the token to its string piece via tokenizer.get_id_to_piece_list(), appends it to current_str, and prunes any prefix from current_prefixes that no longer matches the generated text.
- next() -- If the generated string already satisfies the shortest remaining prefix (i.e., len(current_str) >= min_valid_length), returns (None, set()) to indicate no constraint. Otherwise, for each remaining prefix, it traverses the tokenizer's character trie to find all token IDs that would advance along the prefix path, and also checks the prefix-to-IDs dictionary for tokens that could complete the remaining string in one step. Returns (pass_tokens_all, set()).
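The begin/feed/next flow above can be sketched with a toy vocabulary. This is purely illustrative: `ToyPrefixFilter`, the hard-coded `vocab`, and the linear scan over pieces are hypothetical stand-ins; the real filter matches against the tokenizer's precomputed trie structures instead.

```python
# Illustrative sketch of the prefix-pruning logic (hypothetical names;
# not the exllamav2 implementation).
vocab = {0: "Ye", 1: "s", 2: "Yes", 3: "No", 4: "Maybe"}

class ToyPrefixFilter:
    def __init__(self, prefix_strings):
        if isinstance(prefix_strings, str):
            prefix_strings = [prefix_strings]
        self.prefix_strings = prefix_strings
        self.begin()

    def begin(self):
        # Reset to the full set of configured prefixes
        self.current_prefixes = set(self.prefix_strings)
        self.current_str = ""

    def feed(self, token):
        # Append the token's string piece, then prune prefixes that
        # no longer match the generated text
        self.current_str += vocab[token]
        self.current_prefixes = {
            p for p in self.current_prefixes
            if p.startswith(self.current_str) or self.current_str.startswith(p)
        }

    def next(self):
        min_valid_length = min(len(p) for p in self.current_prefixes)
        if len(self.current_str) >= min_valid_length:
            return None, set()  # constraint satisfied: no restriction
        pass_tokens = set()
        for p in self.current_prefixes:
            remaining = p[len(self.current_str):]
            for tid, piece in vocab.items():
                # Allow tokens that advance along or complete the prefix
                if remaining.startswith(piece) or piece.startswith(remaining):
                    pass_tokens.add(tid)
        return pass_tokens, set()

f = ToyPrefixFilter(["Yes", "No"])
print(f.next())  # tokens that can start "Yes" or "No": ({0, 2, 3}, set())
f.feed(0)        # sampler picked "Ye"; "No" is pruned
print(f.next())  # only "s" can continue "Yes": ({1}, set())
f.feed(1)        # "Yes" is complete
print(f.next())  # (None, set()) -> constraint released
```

Note that the real next() returns this same `(pass_tokens, end_tokens)` shape, with `end_tokens` always empty for this filter.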
The filter relies on two precomputed tokenizer data structures:
- tokenizer.get_char_trie() -- A character-level trie over all token pieces, where each node stores leaf token IDs.
- tokenizer.get_prefix_to_ids_dict() -- A dictionary mapping string prefixes to sets of token IDs that decode to exactly that prefix.
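A minimal sketch of what these two structures contain, built over a toy vocabulary. The class and function names here are hypothetical; in exllamav2 the structures are built by the tokenizer over the full token-piece list and cached.

```python
# Hypothetical reconstruction of the two lookup structures over a toy
# vocabulary; not the exllamav2 tokenizer internals.
vocab = {0: "sto", 1: "story", 2: "st", 3: "ry", 4: "cat"}

class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.leaf = []      # token IDs whose piece ends exactly here

def build_char_trie(vocab):
    root = TrieNode()
    for token_id, piece in vocab.items():
        node = root
        for ch in piece:
            node = node.children.setdefault(ch, TrieNode())
        node.leaf.append(token_id)
    return root

def build_prefix_to_ids(vocab):
    # Maps a string to the IDs of tokens that decode to exactly it
    d = {}
    for token_id, piece in vocab.items():
        d.setdefault(piece, set()).add(token_id)
    return d

# Walking the trie along the prefix "story" collects every token that
# advances along the prefix path:
trie = build_char_trie(vocab)
node, found = trie, []
for ch in "story":
    if ch not in node.children:
        break
    node = node.children[ch]
    found.extend(node.leaf)
print(found)  # [2, 0, 1] -> "st", "sto", "story" all advance the prefix

prefix_to_ids = build_prefix_to_ids(vocab)
print(prefix_to_ids["st"])  # {2}: token 2 decodes to exactly "st"
```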
Note that if two prefix strings share a common prefix (e.g., "story" and "storytime"), only the shorter one is effective since matching it fully satisfies the constraint.
Usage
Use ExLlamaV2PrefixFilter when you need to guarantee that generated text starts with a specific string or one of several candidate strings, such as ensuring a function call begins with a known prefix, or forcing a response to start with "Yes" or "No".
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/generator/filters/prefix.py
- Lines: L1-91
Signature
class ExLlamaV2PrefixFilter(ExLlamaV2Filter):

    prefix_strings: list[str]
    current_prefixes: set[str]
    current_str: str

    def __init__(self,
                 model: ExLlamaV2,
                 tokenizer: ExLlamaV2Tokenizer,
                 prefix_strings: str | list[str]):
        ...

    def clone(self, c=None) -> ExLlamaV2PrefixFilter:
        ...

    def begin(self, prefix_str: str = "") -> None:
        ...

    def feed(self, token: int) -> None:
        ...

    def next(self) -> tuple[set[int] | None, set]:
        ...
Import
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | ExLlamaV2 | Yes | The loaded ExLlamaV2 model instance |
| tokenizer | ExLlamaV2Tokenizer | Yes | The tokenizer associated with the model |
| prefix_strings | str or list[str] | Yes | One or more prefix strings that generation must start with |
| prefix_str | str | No (begin, default "") | Context prefix string passed at generation start (not used by this filter) |
| token | int | Yes (feed) | Token ID selected by the sampler |
Outputs
| Name | Type | Description |
|---|---|---|
| pass_tokens | set[int] or None | From next(): set of allowed token IDs, or None when the prefix constraint is fully satisfied |
| end_tokens | set | From next(): always an empty set (this filter does not define end-of-constraint tokens) |
Usage Examples
Force Response to Start with a Specific Prefix
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
from exllamav2.generator import ExLlamaV2DynamicJob

# Force generation to begin with "def " or "class "
prefix_filter = ExLlamaV2PrefixFilter(
    model, tokenizer,
    prefix_strings=["def ", "class "]
)

job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    gen_settings=gen_settings,
    max_new_tokens=256,
    filters=[prefix_filter],
)
generator.enqueue(job)
Single Prefix Constraint
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
# Ensure the model starts its response with "Sure, "
prefix_filter = ExLlamaV2PrefixFilter(model, tokenizer, "Sure, ")