Implementation: Turboderp_org_Exllamav2 ExLlamaV2PrefixFilter
| Knowledge Sources | |
|---|---|
| Domains | Filtering, Constrained_Generation |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Token filter that constrains generation to begin with one of a set of allowed prefix strings, using trie-based matching against the tokenizer vocabulary to efficiently compute allowed tokens at each step.
Description
ExLlamaV2PrefixFilter is a subclass of ExLlamaV2Filter that forces the model's output to start with one of the specified prefix strings. Once the generated text fully covers the shortest still-viable prefix, the constraint is released and generation continues unconstrained.
Key components:
- __init__(model, tokenizer, prefix_strings) -- Accepts a single string or list of strings as allowed prefixes. Stores them in prefix_strings and initialises current_prefixes (a set tracking still-viable prefixes) and current_str (the generated text so far).
- clone(c=None) -- Creates a copy of the filter preserving prefix_strings, current_prefixes, and current_str state.
- begin(prefix_str) -- Resets current_prefixes to the full set of all configured prefix strings and clears current_str.
- feed(token) -- Decodes the token to its string piece via tokenizer.get_id_to_piece_list(), appends it to current_str, and prunes any prefix from current_prefixes that no longer matches the generated text.
- next() -- If the generated string already satisfies the shortest remaining prefix (i.e., len(current_str) >= min_valid_length), returns (None, set()) to indicate no constraint. Otherwise, for each remaining prefix, it traverses the tokenizer's character trie to find all token IDs that would advance along the prefix path, and also checks the prefix-to-IDs dictionary for tokens that could complete the remaining string in one step. Returns (pass_tokens_all, set()).
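The begin/feed/next flow above can be sketched with a toy vocabulary. This is purely illustrative: `ToyPrefixFilter`, the hard-coded `vocab`, and the linear scan over pieces are hypothetical stand-ins; the real filter matches against the tokenizer's precomputed trie structures instead.

```python
# Illustrative sketch of the prefix-pruning logic (hypothetical names;
# not the exllamav2 implementation).
vocab = {0: "Ye", 1: "s", 2: "Yes", 3: "No", 4: "Maybe"}

class ToyPrefixFilter:
    def __init__(self, prefix_strings):
        if isinstance(prefix_strings, str):
            prefix_strings = [prefix_strings]
        self.prefix_strings = prefix_strings
        self.begin()

    def begin(self):
        # Reset to the full set of configured prefixes
        self.current_prefixes = set(self.prefix_strings)
        self.current_str = ""

    def feed(self, token):
        # Append the token's string piece, then prune prefixes that
        # no longer match the generated text
        self.current_str += vocab[token]
        self.current_prefixes = {
            p for p in self.current_prefixes
            if p.startswith(self.current_str) or self.current_str.startswith(p)
        }

    def next(self):
        min_valid_length = min(len(p) for p in self.current_prefixes)
        if len(self.current_str) >= min_valid_length:
            return None, set()  # constraint satisfied: no restriction
        pass_tokens = set()
        for p in self.current_prefixes:
            remaining = p[len(self.current_str):]
            for tid, piece in vocab.items():
                # Allow tokens that advance along or complete the prefix
                if remaining.startswith(piece) or piece.startswith(remaining):
                    pass_tokens.add(tid)
        return pass_tokens, set()

f = ToyPrefixFilter(["Yes", "No"])
print(f.next())  # tokens that can start "Yes" or "No": ({0, 2, 3}, set())
f.feed(0)        # sampler picked "Ye"; "No" is pruned
print(f.next())  # only "s" can continue "Yes": ({1}, set())
f.feed(1)        # "Yes" is complete
print(f.next())  # (None, set()) -> constraint released
```

Note that the real next() returns this same `(pass_tokens, end_tokens)` shape, with `end_tokens` always empty for this filter.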
The filter relies on two precomputed tokenizer data structures:
- tokenizer.get_char_trie() -- A character-level trie over all token pieces, where each node stores leaf token IDs.
- tokenizer.get_prefix_to_ids_dict() -- A dictionary mapping string prefixes to sets of token IDs that decode to exactly that prefix.
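A minimal sketch of what these two structures contain, built over a toy vocabulary. The class and function names here are hypothetical; in exllamav2 the structures are built by the tokenizer over the full token-piece list and cached.

```python
# Hypothetical reconstruction of the two lookup structures over a toy
# vocabulary; not the exllamav2 tokenizer internals.
vocab = {0: "sto", 1: "story", 2: "st", 3: "ry", 4: "cat"}

class TrieNode:
    def __init__(self):
        self.children = {}  # char -> TrieNode
        self.leaf = []      # token IDs whose piece ends exactly here

def build_char_trie(vocab):
    root = TrieNode()
    for token_id, piece in vocab.items():
        node = root
        for ch in piece:
            node = node.children.setdefault(ch, TrieNode())
        node.leaf.append(token_id)
    return root

def build_prefix_to_ids(vocab):
    # Maps a string to the IDs of tokens that decode to exactly it
    d = {}
    for token_id, piece in vocab.items():
        d.setdefault(piece, set()).add(token_id)
    return d

# Walking the trie along the prefix "story" collects every token that
# advances along the prefix path:
trie = build_char_trie(vocab)
node, found = trie, []
for ch in "story":
    if ch not in node.children:
        break
    node = node.children[ch]
    found.extend(node.leaf)
print(found)  # [2, 0, 1] -> "st", "sto", "story" all advance the prefix

prefix_to_ids = build_prefix_to_ids(vocab)
print(prefix_to_ids["st"])  # {2}: token 2 decodes to exactly "st"
```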
Note that if two prefix strings share a common prefix (e.g., "story" and "storytime"), only the shorter one is effective since matching it fully satisfies the constraint.
Usage
Use ExLlamaV2PrefixFilter when you need to guarantee that generated text starts with a specific string or one of several candidate strings, such as ensuring a function call begins with a known prefix, or forcing a response to start with "Yes" or "No".
Code Reference
Source Location
- Repository: Turboderp_org_Exllamav2
- File: exllamav2/generator/filters/prefix.py
- Lines: L1-91
Signature
class ExLlamaV2PrefixFilter(ExLlamaV2Filter):

    prefix_strings: list[str]
    current_prefixes: set[str]
    current_str: str

    def __init__(self,
                 model: ExLlamaV2,
                 tokenizer: ExLlamaV2Tokenizer,
                 prefix_strings: str | list[str]):
        ...

    def clone(self, c=None) -> ExLlamaV2PrefixFilter:
        ...

    def begin(self, prefix_str: str = "") -> None:
        ...

    def feed(self, token: int) -> None:
        ...

    def next(self) -> tuple[set[int] | None, set]:
        ...
Import
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | ExLlamaV2 | Yes | The loaded ExLlamaV2 model instance |
| tokenizer | ExLlamaV2Tokenizer | Yes | The tokenizer associated with the model |
| prefix_strings | str or list[str] | Yes | One or more prefix strings that generation must start with |
| prefix_str | str | No (begin, default "") | Context prefix string passed at generation start (not used by this filter) |
| token | int | Yes (feed) | Token ID selected by the sampler |
Outputs
| Name | Type | Description |
|---|---|---|
| pass_tokens | set[int] or None | From next(): set of allowed token IDs, or None when the prefix constraint is fully satisfied |
| end_tokens | set | From next(): always an empty set (this filter does not define end-of-constraint tokens) |
Usage Examples
Force Response to Start with a Specific Prefix
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
from exllamav2.generator import ExLlamaV2DynamicJob

# Force generation to begin with "def " or "class "
prefix_filter = ExLlamaV2PrefixFilter(
    model, tokenizer,
    prefix_strings=["def ", "class "]
)

job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    gen_settings=gen_settings,
    max_new_tokens=256,
    filters=[prefix_filter],
)
generator.enqueue(job)
Single Prefix Constraint
from exllamav2.generator.filters import ExLlamaV2PrefixFilter
# Ensure the model starts its response with "Sure, "
prefix_filter = ExLlamaV2PrefixFilter(model, tokenizer, "Sure, ")