

Implementation:Turboderp org Exllamav2 ExLlamaV2SelectFilter

From Leeroopedia
Knowledge Sources
Domains Filtering, Constrained_Generation
Last Updated 2026-02-15 00:00 GMT

Overview

Token filter that constrains generation to produce exactly one of a predefined set of string options, with optional case-insensitive matching, using trie-based token traversal.

Description

ExLlamaV2SelectFilter is a subclass of ExLlamaV2Filter that forces the model to output one of the specified options verbatim. At each generation step, it computes which tokens could still lead to a valid option and which tokens would complete an option, enabling multiple-choice style constrained generation.

Key components:

  • __init__(model, tokenizer, options, case_insensitive=False) -- Accepts a list of allowed output strings and an optional case_insensitive flag. If case-insensitive, all options are lowercased at init time.
  • clone(c=None) -- Creates a copy preserving options, offset, prefix, case_insensitive, and sequence_str_cmp state.
  • begin(prefix_str) -- Resets sequence_str, sequence_str_cmp, offset to 0, and stores the prefix (which is prepended to each option during matching).
  • feed(token) -- Decodes the token to its string piece, appends it to sequence_str, and updates sequence_str_cmp (the comparison string). In case-insensitive mode, only the portion beyond the prefix is lowercased, preserving exact prefix matching.
  • next() -- For each option, concatenates the prefix with the option string and checks whether the already-generated text (sequence_str_cmp) matches the option prefix up to the current offset. For matching options, traverses the tokenizer's character trie (case-insensitive trie if applicable) to find:
    • pass_tokens -- Token IDs that advance along a valid option path.
    • end_tokens -- Token IDs that would exactly complete a remaining option.

Returns (pass_tokens, end_tokens).
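The per-step matching that next() performs can be sketched with plain string comparison over a toy vocabulary. This is a simplified model, not the library's implementation: the real filter walks the tokenizer's character trie for efficiency, and the select_next helper, vocabulary, and option list below are illustrative.

```python
# Simplified model of next(): for each option still consistent with the
# generated text, collect token IDs whose string piece advances along the
# option (pass_tokens) or completes it exactly (end_tokens).
def select_next(vocab: dict[int, str], options: list[str],
                generated: str) -> tuple[set[int], set[int]]:
    pass_tokens: set[int] = set()
    end_tokens: set[int] = set()
    for opt in options:
        if not opt.startswith(generated):
            continue                        # option already ruled out
        remaining = opt[len(generated):]
        if not remaining:
            continue                        # option fully emitted
        for tok_id, piece in vocab.items():
            if piece and remaining.startswith(piece):
                pass_tokens.add(tok_id)     # stays inside the option
                if piece == remaining:
                    end_tokens.add(tok_id)  # completes the option
    return pass_tokens, end_tokens

# Toy vocabulary: token ID -> decoded string piece
vocab = {0: "pos", 1: "neg", 2: "itive", 3: "ative",
         4: "neu", 5: "tral", 6: "p"}
options = ["positive", "negative", "neutral"]

print(select_next(vocab, options, ""))     # first step: several pass tokens, no end tokens
print(select_next(vocab, options, "pos"))  # only "itive" remains, and it completes "positive"
```

Because an end token's piece equals the remaining suffix exactly, end_tokens is always a subset of pass_tokens in this model.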

Case-insensitive mode includes a special path: because the prefix is matched with exact case while the continuation is not, the filter verifies each candidate leaf token individually against the cased option prefix, avoiding false matches that differ only in prefix casing.
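The comparison-string behavior described above can be illustrated with a small helper. This is an assumption-labeled sketch of the idea; build_cmp_string is not part of the library's API.

```python
def build_cmp_string(prefix: str, generated: str,
                     case_insensitive: bool) -> str:
    """Sketch of sequence_str_cmp maintenance: the prefix keeps its
    exact case, and only the generated continuation is case-folded."""
    full = prefix + generated
    if not case_insensitive:
        return full
    # Lowercase only the portion beyond the prefix, so a cased prefix
    # like "Answer: " still has to match exactly.
    return full[:len(prefix)] + full[len(prefix):].lower()

print(build_cmp_string("Answer: ", "YES", True))   # prefix kept, continuation lowered
print(build_cmp_string("Answer: ", "YES", False))  # unchanged
```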

Usage

Use ExLlamaV2SelectFilter when the model must output exactly one of a known set of choices, such as sentiment labels ("positive", "negative", "neutral"), yes/no answers, or classification categories. It guarantees the output will be one of the specified strings.

Code Reference

Source Location

Signature

class ExLlamaV2SelectFilter(ExLlamaV2Filter):

    options: list[str]
    offset: int
    prefix: str
    case_insensitive: bool
    sequence_str_cmp: str

    def __init__(self,
                 model: ExLlamaV2,
                 tokenizer: ExLlamaV2Tokenizer,
                 options: list[str],
                 case_insensitive: bool = False):
        ...

    def clone(self, c=None) -> ExLlamaV2SelectFilter:
        ...

    def begin(self, prefix_str: str = "") -> None:
        ...

    def feed(self, token: int) -> None:
        ...

    def next(self) -> tuple[set[int], set[int]]:
        ...

Import

from exllamav2.generator.filters import ExLlamaV2SelectFilter

I/O Contract

Inputs

  • model (ExLlamaV2, required) -- The loaded ExLlamaV2 model instance.
  • tokenizer (ExLlamaV2Tokenizer, required) -- The tokenizer associated with the model.
  • options (list[str], required) -- List of allowed output strings the model may generate.
  • case_insensitive (bool, optional, default False) -- If True, match options regardless of letter case.
  • prefix_str (str, optional, default "") -- Passed to begin(); prefix prepended to each option during matching.
  • token (int, required) -- Passed to feed(); token ID selected by the sampler.

Outputs

  • pass_tokens (set[int]) -- From next(): set of token IDs that advance toward one or more valid options.
  • end_tokens (set[int]) -- From next(): set of token IDs that would exactly complete a valid option (triggers EOS).
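How a sampler consumes these two sets can be sketched in isolation: logits for tokens outside pass_tokens are masked out before sampling, and drawing a token that is also in end_tokens signals that an option is complete. Toy numbers below; the mask_logits helper is illustrative, not exllamav2's actual sampler.

```python
import math

def mask_logits(logits: list[float], pass_tokens: set[int]) -> list[float]:
    # Disallowed tokens get -inf so softmax assigns them zero probability.
    return [x if i in pass_tokens else -math.inf
            for i, x in enumerate(logits)]

logits = [0.5, 2.0, -1.0, 0.1]
masked = mask_logits(logits, pass_tokens={1, 3})
chosen = max(range(len(masked)), key=masked.__getitem__)  # greedy pick
stop = chosen in {1}  # treat token 1 as an end token: option complete
```

Masking rather than re-ranking preserves the model's relative preferences among the tokens that remain legal.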

Usage Examples

Sentiment Classification

from exllamav2.generator.filters import ExLlamaV2SelectFilter
from exllamav2.generator import ExLlamaV2DynamicJob

# Constrain output to one of three sentiment labels
select_filter = ExLlamaV2SelectFilter(
    model, tokenizer,
    options=["positive", "negative", "neutral"]
)

job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    gen_settings=gen_settings,
    max_new_tokens=10,
    filters=[select_filter],
)
generator.enqueue(job)

Case-Insensitive Yes/No

from exllamav2.generator.filters import ExLlamaV2SelectFilter

# Accept "yes", "Yes", "YES", etc.
yn_filter = ExLlamaV2SelectFilter(
    model, tokenizer,
    options=["yes", "no"],
    case_insensitive=True
)
