
Implementation:Turboderp org Exllamav2 ExLlamaV2Sampler Settings

From Leeroopedia
Knowledge Sources
Domains Text_Generation, Sampling, NLP
Last Updated 2026-02-15 00:00 GMT

Overview

A configuration object for the token sampling strategies and parameters that control text generation behavior, provided by exllamav2.

Description

ExLlamaV2Sampler.Settings is a dataclass that encapsulates all sampling parameters used during text generation. It defines the temperature, top-k, top-p, repetition penalties, and numerous other sampling strategies that transform model logits into a token selection.

The settings object is passed to generator methods (generate(), begin_stream_ex()) and controls how each token is sampled. A static greedy() factory method provides a convenient preset for deterministic (argmax) decoding.

Key parameter groups:

  • Core sampling: temperature, top_k, top_p, min_p
  • Advanced sampling: typical, tfs (tail-free), mirostat, smoothing_factor
  • Repetition control: token_repetition_penalty, token_repetition_range, token_frequency_penalty, token_presence_penalty
  • Specialized: DRY (don't repeat yourself), XTC (exclude top choices)
  • Guidance: cfg_scale for classifier-free guidance
  • Bias: token_bias for per-token logit adjustments
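The core sampling group maps onto standard logit transforms. As a conceptual sketch (plain Python, not exllamav2's fused implementation), temperature scaling, top-k truncation, and top-p nucleus filtering can be chained like this:

```python
import math

def sample_filter(logits, temperature=0.8, top_k=50, top_p=0.8):
    """Conceptual sketch of the core sampling transforms: scale logits
    by temperature, keep the top-k tokens, then truncate to the top-p
    probability nucleus. Returns surviving (token_id, prob) pairs,
    renormalized. Defaults mirror the Settings defaults above."""
    # Temperature: divide logits before softmax (lower = sharper)
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [(i, e / total) for i, e in enumerate(exps)]
    # Top-k: keep the k most probable tokens (0 = disabled)
    probs.sort(key=lambda pair: pair[1], reverse=True)
    if top_k > 0:
        probs = probs[:top_k]
    # Top-p: keep the smallest prefix whose cumulative mass >= top_p
    # (0 = disabled); always keeps at least one token
    if top_p > 0:
        kept, cum = [], 0.0
        for tok, p in probs:
            kept.append((tok, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept
    # Renormalize the surviving candidates
    z = sum(p for _, p in probs)
    return [(tok, p / z) for tok, p in probs]

# A sharply peaked distribution collapses to a single candidate:
# token 0 alone already exceeds the 0.9 nucleus
cands = sample_filter([5.0, 2.0, 1.0, 0.5], temperature=0.8, top_k=3, top_p=0.9)
```

The library applies these transforms on the GPU; the sketch only illustrates the order and semantics of the parameters.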

Usage

Create a Settings instance and configure the desired parameters before passing it to a generator. Common presets:

  • Greedy: Settings.greedy() for deterministic output
  • Creative: Higher temperature, moderate top-p
  • Balanced: Default settings with minor adjustments

Code Reference

Source Location

  • Repository: exllamav2
  • File: exllamav2/generator/sampler.py
  • Lines: L51-137

Signature

class ExLlamaV2Sampler:

    class Settings:

        # Core sampling
        temperature: float = 0.8
        top_k: int = 50
        top_p: float = 0.8
        min_p: float = 0.0
        tfs: float = 0.0
        typical: float = 0.0
        smoothing_factor: float = 0.0

        # Mirostat
        mirostat: bool = False
        mirostat_tau: float = 1.5
        mirostat_eta: float = 0.1
        mirostat_mu: float | None = None

        # Repetition penalties
        token_repetition_penalty: float = 1.025
        token_repetition_range: int = -1
        token_frequency_penalty: float = 0.0
        token_presence_penalty: float = 0.0

        # DRY (Don't Repeat Yourself)
        dry_multiplier: float = 0.0
        dry_base: float = 1.75
        dry_allowed_length: int = 2
        dry_range: int = 0

        # XTC (Exclude Top Choices)
        xtc_probability: float = 0.0
        xtc_threshold: float = 0.1

        # Guidance
        cfg_scale: float | None = None

        # Token bias
        token_bias: torch.Tensor | None = None

        @staticmethod
        def greedy(**kwargs) -> "ExLlamaV2Sampler.Settings":
            ...

Import

from exllamav2.generator import ExLlamaV2Sampler

I/O Contract

Inputs

Name Type Required Description
temperature float No (default 0.8) Logit scaling factor; higher = more random, lower = more deterministic
top_k int No (default 50) Number of top tokens to consider; 0 = disabled
top_p float No (default 0.8) Nucleus sampling cumulative probability threshold; 0 = disabled
min_p float No (default 0.0) Minimum probability relative to top token; 0 = disabled
tfs float No (default 0.0) Tail-free sampling threshold; 0 = disabled
typical float No (default 0.0) Typical sampling threshold; 0 = disabled
smoothing_factor float No (default 0.0) Quadratic smoothing factor; 0 = disabled
mirostat bool No (default False) Enable Mirostat adaptive sampling
mirostat_tau float No (default 1.5) Mirostat target surprise value
mirostat_eta float No (default 0.1) Mirostat learning rate
token_repetition_penalty float No (default 1.025) Penalty for repeated tokens; 1.0 = disabled
token_repetition_range int No (default -1) How far back to check for repetitions; -1 = full context
token_frequency_penalty float No (default 0.0) Additive penalty scaled by token frequency
token_presence_penalty float No (default 0.0) Additive penalty for any repeated token
dry_multiplier float No (default 0.0) DRY penalty multiplier; 0 = disabled
dry_base float No (default 1.75) DRY penalty base for exponential scaling
dry_allowed_length int No (default 2) DRY minimum sequence length before penalty applies
xtc_probability float No (default 0.0) Probability of applying XTC exclusion; 0 = disabled
xtc_threshold float No (default 0.1) XTC probability threshold for exclusion
cfg_scale float or None No (default None) Classifier-free guidance scale; None = disabled
token_bias torch.Tensor or None No (default None) Per-token logit bias tensor of shape (vocab_size,)
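The three repetition controls in the table combine a multiplicative penalty (token_repetition_penalty) with OpenAI-style additive frequency and presence penalties. A conceptual plain-Python sketch, not the library's implementation:

```python
def penalize(logits, recent_tokens, rep_penalty=1.025,
             freq_penalty=0.0, pres_penalty=0.0):
    """Conceptual sketch of the repetition controls: rep_penalty is
    multiplicative (1.0 = disabled); frequency/presence penalties are
    additive, subtracted from the logits of seen tokens."""
    counts = {}
    for t in recent_tokens:
        counts[t] = counts.get(t, 0) + 1
    out = list(logits)
    for t, n in counts.items():
        # Multiplicative: shrink positive logits, grow negative ones
        out[t] = out[t] / rep_penalty if out[t] > 0 else out[t] * rep_penalty
        # Additive: scaled by count (frequency) plus flat (presence)
        out[t] -= freq_penalty * n + pres_penalty
    return out

# Token 0 appeared twice, token 1 once, token 2 never
adjusted = penalize([2.0, -1.0, 0.5], recent_tokens=[0, 0, 1],
                    rep_penalty=1.1, freq_penalty=0.2, pres_penalty=0.1)
```

Unseen tokens are untouched; the exact window of "recent" tokens is what token_repetition_range controls in the real sampler.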

Outputs

Name Type Description
settings ExLlamaV2Sampler.Settings Configuration object to pass to generator methods

Usage Examples

Default Settings

from exllamav2.generator import ExLlamaV2Sampler

# Use default settings (temperature=0.8, top_k=50, top_p=0.8)
settings = ExLlamaV2Sampler.Settings()

Greedy (Deterministic) Decoding

# Greedy: always pick the most likely token
settings = ExLlamaV2Sampler.Settings.greedy()
# Equivalent to: top_k=1, top_p=0, temperature=1.0

Creative Writing Settings

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 1.0
settings.top_p = 0.95
settings.top_k = 0         # Disable top-k, rely on top-p
settings.min_p = 0.05
settings.token_repetition_penalty = 1.1

Chat with Mirostat

settings = ExLlamaV2Sampler.Settings()
settings.mirostat = True
settings.mirostat_tau = 3.0    # Target surprise (higher = more varied output)
settings.mirostat_eta = 0.1    # Learning rate
settings.temperature = 0.8
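Mirostat replaces fixed truncation with a feedback loop: after each sampled token, an internal threshold mu is nudged so the observed surprise tracks mirostat_tau. A conceptual sketch of the update rule (plain Python, not the library's implementation):

```python
import math

def mirostat_update(mu, sampled_prob, tau=1.5, eta=0.1):
    """Conceptual sketch of the Mirostat feedback step: move the
    truncation threshold mu toward the target surprise tau at
    learning rate eta. Higher mu admits more candidate tokens."""
    surprise = -math.log2(sampled_prob)   # observed surprise of the token
    return mu - eta * (surprise - tau)    # nudge mu toward the target

mu = 2 * 1.5  # a common initialization: mu = 2 * tau
# A very likely token (low surprise) raises mu, allowing more
# diversity next step; a rare token (high surprise) lowers it
mu_after_likely = mirostat_update(mu, sampled_prob=0.9)
mu_after_rare = mirostat_update(mu, sampled_prob=0.01)
```

This is why mirostat_mu appears in the signature as optional state: the sampler carries it across steps.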

Anti-Repetition with DRY

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9
settings.dry_multiplier = 0.8
settings.dry_base = 1.75
settings.dry_allowed_length = 2
settings.token_repetition_penalty = 1.05
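DRY penalizes candidate tokens that would extend a sequence already present earlier in the context, with a penalty that grows exponentially (dry_base) in the length of the match once it reaches dry_allowed_length. A conceptual plain-Python sketch, not the library's implementation:

```python
def dry_penalty(context, candidate, multiplier=0.8, base=1.75, allowed_length=2):
    """Conceptual sketch of the DRY penalty: find the longest suffix of
    `context` that, followed by `candidate`, already occurred earlier,
    and penalize exponentially in that match length."""
    best = 0
    for n in range(1, len(context)):
        pattern = context[len(context) - n:] + [candidate]
        # Look for the (n+1)-token pattern earlier in the context
        for i in range(len(context) - n):
            if context[i:i + n + 1] == pattern:
                best = max(best, n)
    if best >= allowed_length:
        # Subtracted from the candidate's logit in the real sampler
        return multiplier * base ** (best - allowed_length)
    return 0.0

# Sampling 4 here would replay the earlier run 1,2,3,4 (match length 3)
p = dry_penalty([1, 2, 3, 4, 1, 2, 3], candidate=4)
```

With the defaults above, short incidental repeats (below dry_allowed_length) are never penalized, which is what distinguishes DRY from the blanket token_repetition_penalty.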

Token Bias

import torch

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7

# Adjust per-token logits: positive values boost, negative values discourage
bias = torch.zeros(tokenizer.vocab_size)
bias[tokenizer.eos_token_id] = -10.0  # Discourage early EOS
settings.token_bias = bias

Related Pages

Implements Principle
