Implementation: turboderp/exllamav2 ExLlamaV2Sampler.Settings
| Knowledge Sources | |
|---|---|
| Domains | Text_Generation, Sampling, NLP |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for configuring token sampling strategies and parameters that control text generation behavior, provided by exllamav2.
Description
ExLlamaV2Sampler.Settings is a configuration class that encapsulates all sampling parameters used during text generation. It holds the temperature, top-k, top-p, repetition penalties, and numerous other sampling controls that determine how model logits are transformed into a token selection.
The settings object is passed to generator methods such as generate() and begin_stream_ex() and governs how each token is sampled. The static greedy() factory method returns a preset for deterministic (argmax) decoding.
Key parameter groups:
- Core sampling: temperature, top_k, top_p, min_p
- Advanced sampling: typical, tfs (tail-free), mirostat, smoothing_factor
- Repetition control: token_repetition_penalty, token_repetition_range, token_frequency_penalty, token_presence_penalty
- Specialized: DRY (don't repeat yourself), XTC (exclude top choices)
- Guidance: cfg_scale for classifier-free guidance
- Bias: token_bias for per-token logit adjustments
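Conceptually, the core parameters form a filtering pipeline over each step's logits. The sketch below is plain Python, not the library's tensor implementation; the function name and structure are illustrative only, but it shows how temperature, top-k, and top-p compose:

```python
import math
import random

def sample(logits, temperature=0.8, top_k=50, top_p=0.8, rng=random.random):
    # Temperature: scale logits before softmax (lower = sharper distribution).
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-k: rank tokens by probability, keep the top_k most likely (0 = disabled).
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    if top_k > 0:
        ranked = ranked[:top_k]
    # Top-p (nucleus): keep the smallest prefix whose cumulative mass reaches top_p
    # (0 = disabled).
    if top_p > 0:
        kept, mass = [], 0.0
        for i in ranked:
            kept.append(i)
            mass += probs[i]
            if mass >= top_p:
                break
        ranked = kept
    # Renormalize and sample from the surviving tokens.
    mass = sum(probs[i] for i in ranked)
    r = rng() * mass
    for i in ranked:
        r -= probs[i]
        if r <= 0:
            return i
    return ranked[-1]
```

With top_k=1 this degenerates to argmax, which is exactly what the greedy() preset configures.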
Usage
Create a Settings instance and configure the desired parameters before passing it to a generator. Common presets:
- Greedy: Settings.greedy() for deterministic output
- Creative: Higher temperature, moderate top-p
- Balanced: Default settings with minor adjustments
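Only greedy() exists as a named constructor in the API; "creative" and "balanced" are conventions. A hypothetical helper capturing them as plain parameter dicts (all values illustrative, to be tuned per model and task):

```python
def preset(name):
    # Illustrative values only; not part of the exllamav2 API.
    presets = {
        # Deterministic: argmax decoding, matching Settings.greedy().
        "greedy":   dict(temperature=1.0, top_k=1, top_p=0.0),
        # Creative: flatter distribution; nucleus sampling does the pruning.
        "creative": dict(temperature=1.0, top_k=0, top_p=0.95, min_p=0.05),
        # Balanced: the library defaults plus a mild repetition penalty.
        "balanced": dict(temperature=0.8, top_k=50, top_p=0.8,
                         token_repetition_penalty=1.05),
    }
    return presets[name]
```

A preset can then be applied to a Settings instance with `for k, v in preset("creative").items(): setattr(settings, k, v)`.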
Code Reference
Source Location
- Repository: exllamav2
- File: exllamav2/generator/sampler.py
- Lines: L51-137
Signature
class ExLlamaV2Sampler:

    class Settings:

        # Core sampling
        temperature: float = 0.8
        top_k: int = 50
        top_p: float = 0.8
        min_p: float = 0.0
        tfs: float = 0.0
        typical: float = 0.0
        smoothing_factor: float = 0.0

        # Mirostat
        mirostat: bool = False
        mirostat_tau: float = 1.5
        mirostat_eta: float = 0.1
        mirostat_mu: float | None = None

        # Repetition penalties
        token_repetition_penalty: float = 1.025
        token_repetition_range: int = -1
        token_frequency_penalty: float = 0.0
        token_presence_penalty: float = 0.0

        # DRY (Don't Repeat Yourself)
        dry_multiplier: float = 0.0
        dry_base: float = 1.75
        dry_allowed_length: int = 2
        dry_range: int = 0

        # XTC (Exclude Top Choices)
        xtc_probability: float = 0.0
        xtc_threshold: float = 0.1

        # Guidance
        cfg_scale: float | None = None

        # Token bias
        token_bias: torch.Tensor | None = None

        @staticmethod
        def greedy(**kwargs) -> "ExLlamaV2Sampler.Settings":
            ...
Import
from exllamav2.generator import ExLlamaV2Sampler
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| temperature | float | No (default 0.8) | Logit scaling factor; higher = more random, lower = more deterministic |
| top_k | int | No (default 50) | Number of top tokens to consider; 0 = disabled |
| top_p | float | No (default 0.8) | Nucleus sampling cumulative probability threshold; 0 = disabled |
| min_p | float | No (default 0.0) | Minimum probability relative to top token; 0 = disabled |
| tfs | float | No (default 0.0) | Tail-free sampling threshold; 0 = disabled |
| typical | float | No (default 0.0) | Typical sampling threshold; 0 = disabled |
| smoothing_factor | float | No (default 0.0) | Quadratic smoothing factor; 0 = disabled |
| mirostat | bool | No (default False) | Enable Mirostat adaptive sampling |
| mirostat_tau | float | No (default 1.5) | Mirostat target surprise value |
| mirostat_eta | float | No (default 0.1) | Mirostat learning rate |
| token_repetition_penalty | float | No (default 1.025) | Penalty for repeated tokens; 1.0 = disabled |
| token_repetition_range | int | No (default -1) | How far back to check for repetitions; -1 = full context |
| token_frequency_penalty | float | No (default 0.0) | Additive penalty scaled by token frequency |
| token_presence_penalty | float | No (default 0.0) | Additive penalty for any repeated token |
| dry_multiplier | float | No (default 0.0) | DRY penalty multiplier; 0 = disabled |
| dry_base | float | No (default 1.75) | DRY penalty base for exponential scaling |
| dry_allowed_length | int | No (default 2) | DRY minimum sequence length before penalty applies |
| xtc_probability | float | No (default 0.0) | Probability of applying XTC exclusion; 0 = disabled |
| xtc_threshold | float | No (default 0.1) | XTC probability threshold for exclusion |
| cfg_scale | float or None | No (default None) | Classifier-free guidance scale; None = disabled |
| token_bias | torch.Tensor or None | No (default None) | Per-token logit bias tensor of shape (vocab_size,) |
Outputs
| Name | Type | Description |
|---|---|---|
| settings instance | ExLlamaV2Sampler.Settings | Configuration object to pass to generator methods |
Usage Examples
Default Settings
from exllamav2.generator import ExLlamaV2Sampler
# Use default settings (temperature=0.8, top_k=50, top_p=0.8)
settings = ExLlamaV2Sampler.Settings()
Greedy (Deterministic) Decoding
# Greedy: always pick the most likely token
settings = ExLlamaV2Sampler.Settings.greedy()
# Equivalent to: top_k=1, top_p=0, temperature=1.0
Creative Writing Settings
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 1.0
settings.top_p = 0.95
settings.top_k = 0 # Disable top-k, rely on top-p
settings.min_p = 0.05
settings.token_repetition_penalty = 1.1
Chat with Mirostat
settings = ExLlamaV2Sampler.Settings()
settings.mirostat = True
settings.mirostat_tau = 3.0  # Target surprise (higher = more varied output)
settings.mirostat_eta = 0.1 # Learning rate
settings.temperature = 0.8
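Mirostat maintains a running threshold mu that adapts toward the target surprise tau. A simplified plain-Python sketch of one step of the Mirostat 2.0 feedback loop (illustrative only; the library operates on GPU tensors and its exact formulation may differ):

```python
import math
import random

def mirostat_step(probs, mu, tau=1.5, eta=0.1, rng=random.random):
    # Surprise of token i is -log2(p_i). Mirostat 2.0 truncates tokens whose
    # surprise exceeds mu, samples from the rest, then nudges mu toward tau.
    ranked = sorted(range(len(probs)), key=lambda i: -probs[i])
    allowed = [i for i in ranked if -math.log2(probs[i]) <= mu]
    if not allowed:                      # always keep at least the top token
        allowed = [ranked[0]]
    mass = sum(probs[i] for i in allowed)
    r = rng() * mass
    choice = allowed[-1]
    for i in allowed:
        r -= probs[i]
        if r <= 0:
            choice = i
            break
    surprise = -math.log2(probs[choice])
    mu -= eta * (surprise - tau)         # feedback update toward target surprise
    return choice, mu
```

Because mu adapts every step, output diversity stays near the target regardless of how peaked individual distributions are, which is why temperature matters less when Mirostat is enabled.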
Anti-Repetition with DRY
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9
settings.dry_multiplier = 0.8
settings.dry_base = 1.75
settings.dry_allowed_length = 2
settings.token_repetition_penalty = 1.05
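The repetition knobs compose differently: the multiplicative penalty rescales a repeated token's logit, while frequency and presence penalties subtract from it. A plain-Python sketch of how such penalties are commonly combined (illustrative; not the library's exact math, and DRY is omitted here):

```python
def penalize(logits, past_tokens, rep_penalty=1.025,
             freq_penalty=0.0, pres_penalty=0.0):
    # Count occurrences of each token in the penalized range.
    counts = {}
    for t in past_tokens:
        counts[t] = counts.get(t, 0) + 1
    out = list(logits)
    for t, n in counts.items():
        # Multiplicative penalty (CTRL-style): shrink positive logits and
        # grow negative ones, so repeats become less likely either way.
        out[t] = out[t] / rep_penalty if out[t] > 0 else out[t] * rep_penalty
        # Additive penalties: frequency scales with the repeat count,
        # presence is a flat hit for any prior occurrence.
        out[t] -= n * freq_penalty + pres_penalty
    return out
```

token_repetition_range would bound how much of past_tokens is considered; -1 means the full context.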
Token Bias
import torch
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7
# Boost probability of specific tokens
bias = torch.zeros(tokenizer.vocab_size)
bias[tokenizer.eos_token_id] = -10.0 # Discourage EOS
settings.token_bias = bias
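Conceptually, token bias is just a per-vocabulary-entry offset added to the logits before the sampling filters run; a minimal plain-Python sketch (the library applies it on the logits tensor):

```python
def apply_bias(logits, bias):
    # Adding a large negative bias effectively bans a token;
    # a large positive bias strongly favors it.
    return [l + b for l, b in zip(logits, bias)]
```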