

Implementation:Turboderp org Exllamav2 ExLlamaV2SelectFilter

From Leeroopedia
Knowledge Sources
Domains Filtering, Constrained_Generation
Last Updated 2026-02-15 00:00 GMT

Overview

Token filter that constrains generation to produce exactly one of a predefined set of string options, with optional case-insensitive matching, using trie-based token traversal.

Description

ExLlamaV2SelectFilter is a subclass of ExLlamaV2Filter that forces the model to output one of the specified options verbatim. At each generation step, it computes which tokens could still lead to a valid option and which tokens would complete an option, enabling multiple-choice style constrained generation.

Key components:

  • __init__(model, tokenizer, options, case_insensitive=False) -- Accepts a list of allowed output strings and an optional case_insensitive flag. If case-insensitive, all options are lowercased at init time.
  • clone(c=None) -- Creates a copy preserving options, offset, prefix, case_insensitive, and sequence_str_cmp state.
  • begin(prefix_str) -- Resets sequence_str, sequence_str_cmp, offset to 0, and stores the prefix (which is prepended to each option during matching).
  • feed(token) -- Decodes the token to its string piece, appends it to sequence_str, and updates sequence_str_cmp (the comparison string). In case-insensitive mode, only the portion beyond the prefix is lowercased, preserving exact prefix matching.
  • next() -- For each option, concatenates the prefix with the option string and checks whether the already-generated text (sequence_str_cmp) matches the option prefix up to the current offset. For matching options, traverses the tokenizer's character trie (case-insensitive trie if applicable) to find:
    • pass_tokens -- Token IDs that advance along a valid option path.
    • end_tokens -- Token IDs that would exactly complete a remaining option.

Returns (pass_tokens, end_tokens).
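The per-step matching that next() performs can be sketched with plain string comparison over a toy vocabulary. This is a simplified model, not the library's implementation: the real filter walks the tokenizer's character trie for efficiency, and the select_next helper, vocabulary, and option list below are illustrative.

```python
# Simplified model of next(): for each option still consistent with the
# generated text, collect token IDs whose string piece advances along the
# option (pass_tokens) or completes it exactly (end_tokens).
def select_next(vocab: dict[int, str], options: list[str],
                generated: str) -> tuple[set[int], set[int]]:
    pass_tokens: set[int] = set()
    end_tokens: set[int] = set()
    for opt in options:
        if not opt.startswith(generated):
            continue                        # option already ruled out
        remaining = opt[len(generated):]
        if not remaining:
            continue                        # option fully emitted
        for tok_id, piece in vocab.items():
            if piece and remaining.startswith(piece):
                pass_tokens.add(tok_id)     # stays inside the option
                if piece == remaining:
                    end_tokens.add(tok_id)  # completes the option
    return pass_tokens, end_tokens

# Toy vocabulary: token ID -> decoded string piece
vocab = {0: "pos", 1: "neg", 2: "itive", 3: "ative",
         4: "neu", 5: "tral", 6: "p"}
options = ["positive", "negative", "neutral"]

print(select_next(vocab, options, ""))     # first step: several pass tokens, no end tokens
print(select_next(vocab, options, "pos"))  # only "itive" remains, and it completes "positive"
```

Because an end token's piece equals the remaining suffix exactly, end_tokens is always a subset of pass_tokens in this model.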

Case-insensitive mode includes a special path: because the prefix is matched with exact case while the continuation is not, the filter verifies each candidate leaf token individually against the cased option prefix, avoiding false matches that differ only in prefix casing.
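The comparison-string behavior described above can be illustrated with a small helper. This is an assumption-labeled sketch of the idea; build_cmp_string is not part of the library's API.

```python
def build_cmp_string(prefix: str, generated: str,
                     case_insensitive: bool) -> str:
    """Sketch of sequence_str_cmp maintenance: the prefix keeps its
    exact case, and only the generated continuation is case-folded."""
    full = prefix + generated
    if not case_insensitive:
        return full
    # Lowercase only the portion beyond the prefix, so a cased prefix
    # like "Answer: " still has to match exactly.
    return full[:len(prefix)] + full[len(prefix):].lower()

print(build_cmp_string("Answer: ", "YES", True))   # prefix kept, continuation lowered
print(build_cmp_string("Answer: ", "YES", False))  # unchanged
```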

Usage

Use ExLlamaV2SelectFilter when the model must output exactly one of a known set of choices, such as sentiment labels ("positive", "negative", "neutral"), yes/no answers, or classification categories. It guarantees the output will be one of the specified strings.

Code Reference

Source Location

Signature

class ExLlamaV2SelectFilter(ExLlamaV2Filter):

    options: list[str]
    offset: int
    prefix: str
    case_insensitive: bool
    sequence_str_cmp: str

    def __init__(self,
                 model: ExLlamaV2,
                 tokenizer: ExLlamaV2Tokenizer,
                 options: list[str],
                 case_insensitive: bool = False):
        ...

    def clone(self, c=None) -> ExLlamaV2SelectFilter:
        ...

    def begin(self, prefix_str: str = "") -> None:
        ...

    def feed(self, token: int) -> None:
        ...

    def next(self) -> tuple[set[int], set[int]]:
        ...

Import

from exllamav2.generator.filters import ExLlamaV2SelectFilter

I/O Contract

Inputs

  • model (ExLlamaV2, required) -- The loaded ExLlamaV2 model instance.
  • tokenizer (ExLlamaV2Tokenizer, required) -- The tokenizer associated with the model.
  • options (list[str], required) -- List of allowed output strings the model may generate.
  • case_insensitive (bool, optional, default False) -- If True, match options regardless of letter case.
  • prefix_str (str, optional, default "") -- Passed to begin(); prefix prepended to each option during matching.
  • token (int, required) -- Passed to feed(); token ID selected by the sampler.

Outputs

  • pass_tokens (set[int]) -- From next(): set of token IDs that advance toward one or more valid options.
  • end_tokens (set[int]) -- From next(): set of token IDs that would exactly complete a valid option (triggers EOS).
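How a sampler consumes these two sets can be sketched in isolation: logits for tokens outside pass_tokens are masked out before sampling, and drawing a token that is also in end_tokens signals that an option is complete. Toy numbers below; the mask_logits helper is illustrative, not exllamav2's actual sampler.

```python
import math

def mask_logits(logits: list[float], pass_tokens: set[int]) -> list[float]:
    # Disallowed tokens get -inf so softmax assigns them zero probability.
    return [x if i in pass_tokens else -math.inf
            for i, x in enumerate(logits)]

logits = [0.5, 2.0, -1.0, 0.1]
masked = mask_logits(logits, pass_tokens={1, 3})
chosen = max(range(len(masked)), key=masked.__getitem__)  # greedy pick
stop = chosen in {1}  # treat token 1 as an end token: option complete
```

Masking rather than re-ranking preserves the model's relative preferences among the tokens that remain legal.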

Usage Examples

Sentiment Classification

from exllamav2.generator.filters import ExLlamaV2SelectFilter
from exllamav2.generator import ExLlamaV2DynamicJob

# Constrain output to one of three sentiment labels
select_filter = ExLlamaV2SelectFilter(
    model, tokenizer,
    options=["positive", "negative", "neutral"]
)

job = ExLlamaV2DynamicJob(
    input_ids=input_ids,
    gen_settings=gen_settings,
    max_new_tokens=10,
    filters=[select_filter],
)
generator.enqueue(job)

Case-Insensitive Yes/No

from exllamav2.generator.filters import ExLlamaV2SelectFilter

# Accept "yes", "Yes", "YES", etc.
yn_filter = ExLlamaV2SelectFilter(
    model, tokenizer,
    options=["yes", "no"],
    case_insensitive=True
)
