Implementation:Ollama Ollama Llama Sampling Engine
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Sampling |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the low-level sampler primitives that serve as building blocks of the token sampling pipeline. Each sampler implements the llama_sampler_i function-table (vtable-style) interface.
Description
Each sampler type is a context struct paired with a static llama_sampler_i function table (name, accept, apply, reset, clone, free). The file implements:
- llama_sampler_chain: runs several samplers in sequence
- llama_sampler_dist: draws a token at random from the candidates' probability distribution
- llama_sampler_top_k: top-k filtering
- llama_sampler_top_p: nucleus (top-p) sampling
- llama_sampler_min_p: minimum-probability filtering
- llama_sampler_typical: locally typical sampling
- llama_sampler_temp / llama_sampler_temp_ext: temperature scaling (temp_ext adds dynamic temperature)
- llama_sampler_mirostat / llama_sampler_mirostat_v2: Mirostat, which adapts truncation to target a fixed surprise (perplexity)
- llama_sampler_grammar: grammar-constrained sampling
- llama_sampler_penalties: repetition, frequency, and presence penalties
- llama_sampler_dry: DRY ("Don't Repeat Yourself") repetition penalty
- llama_sampler_infill: fill-in-the-middle
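To make the penalties entry concrete, here is a minimal self-contained sketch, not the llama.cpp source. The names `token_data` and `apply_penalties` are hypothetical, and the formula is an assumption based on the commonly used scheme: a recently seen token's logit is divided by `penalty_repeat` when positive (multiplied when negative), then reduced by `count * penalty_freq + penalty_present`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for llama_token_data: token id, logit, probability.
struct token_data {
    int32_t id;
    float   logit;
    float   p;
};

// Sketch of repetition / frequency / presence penalties. `last_tokens` is
// assumed to already hold only the recent window (penalty_last_n tokens).
void apply_penalties(std::vector<token_data> & cands,
                     const std::vector<int32_t> & last_tokens,
                     float penalty_repeat, float penalty_freq, float penalty_present) {
    // Count how often each token occurs in the recent window.
    std::unordered_map<int32_t, int> counts;
    for (int32_t tok : last_tokens) {
        counts[tok]++;
    }

    for (token_data & td : cands) {
        const auto it = counts.find(td.id);
        if (it == counts.end()) {
            continue; // not seen recently: logit unchanged
        }
        const int count = it->second;

        // Repetition penalty: always moves the logit toward "less likely",
        // dividing positive logits and multiplying negative ones.
        if (td.logit <= 0.0f) {
            td.logit *= penalty_repeat;
        } else {
            td.logit /= penalty_repeat;
        }

        // Frequency penalty grows with the count; presence penalty is flat.
        td.logit -= float(count) * penalty_freq + penalty_present;
    }
}
```

With `penalty_repeat = 2`, `penalty_freq = 0.5`, and `penalty_present = 0.25`, a token seen twice with logit 2.0 ends at 2.0 / 2 - (2 * 0.5 + 0.25) = -0.25.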
Usage
These samplers are the fundamental operations that control the diversity, quality, and constraint adherence of generated text. The higher-level common_sampler in the common library chains them together.
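The composition idea can be sketched with a simplified function table. This is an illustration under assumptions, not the real `llama_sampler_i`: the hypothetical `sampler_i` only has `name` and `apply`, and the `ban`, `temp`, `greedy`, and `sampler_chain` pieces are invented for the example. It shows each stage rewriting the candidate array in place, with the last stage recording a selected index.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

struct token_data { int32_t id; float logit; float p; };

struct cand_array {
    std::vector<token_data> data;
    int64_t selected = -1; // index chosen by a selecting stage, -1 if none
};

struct sampler;

// Simplified, hypothetical function table modeled on llama_sampler_i.
struct sampler_i {
    const char * (*name)(const sampler * smpl);
    void         (*apply)(sampler * smpl, cand_array * cur);
};

struct sampler {
    const sampler_i * iface;
    void            * ctx; // per-sampler state, owned by the caller here
};

// Ban one token by forcing its logit to -inf (ctx -> int32_t token id).
static const char * ban_name(const sampler *) { return "ban"; }
static void ban_apply(sampler * smpl, cand_array * cur) {
    const int32_t banned = *static_cast<const int32_t *>(smpl->ctx);
    for (token_data & td : cur->data) {
        if (td.id == banned) td.logit = -INFINITY;
    }
}
static const sampler_i ban_i = { ban_name, ban_apply };

// Temperature: divide every logit by t (ctx -> float, assumed > 0).
static const char * temp_name(const sampler *) { return "temp"; }
static void temp_apply(sampler * smpl, cand_array * cur) {
    const float t = *static_cast<const float *>(smpl->ctx);
    for (token_data & td : cur->data) td.logit /= t;
}
static const sampler_i temp_i = { temp_name, temp_apply };

// Greedy: select the highest remaining logit.
static const char * greedy_name(const sampler *) { return "greedy"; }
static void greedy_apply(sampler *, cand_array * cur) {
    if (cur->data.empty()) { cur->selected = -1; return; }
    const auto it = std::max_element(cur->data.begin(), cur->data.end(),
        [](const token_data & a, const token_data & b) { return a.logit < b.logit; });
    cur->selected = it - cur->data.begin();
}
static const sampler_i greedy_i = { greedy_name, greedy_apply };

// A chain is an ordered list of stages applied one after another.
struct sampler_chain {
    std::vector<sampler> stages;
    void add(sampler s) { stages.push_back(s); }
    void apply(cand_array * cur) {
        for (sampler & s : stages) s.iface->apply(&s, cur);
    }
};
```

Order matters here: because the ban stage runs first, the greedy stage cannot pick the banned token even if it had the highest original logit.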
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-sampling.cpp
- Lines: 1-2682
Signature
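```cpp
template<typename T>
struct ring_buffer {
    ring_buffer(size_t cap);          // fixed capacity
    T & front();                      // oldest element
    void push_back(const T & value);  // overwrites the oldest element when full
    T pop_front();                    // removes and returns the oldest element
    const T & rat(size_t i) const;    // reverse access: rat(0) is the newest element
    size_t size() const;
};
```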
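A possible implementation of the ring buffer declared above, written as a standalone sketch named `ring_buffer_sketch` (a hypothetical name, so it is not mistaken for the llama.cpp type). It assumes the overwrite-oldest behavior and the reverse-indexing meaning of `rat` noted in the comments; this kind of structure suits recent-token histories that samplers such as DRY scan backwards. Edge-case behavior such as the exceptions thrown here is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Fixed-capacity ring buffer sketch: push_back overwrites the oldest element
// when full, and rat(i) indexes backwards from the newest element.
template<typename T>
struct ring_buffer_sketch {
    explicit ring_buffer_sketch(size_t cap) : capacity(cap), data(cap) {}

    T & front() {
        if (sz == 0) throw std::runtime_error("ring buffer is empty");
        return data[first];
    }

    void push_back(const T & value) {
        if (capacity == 0) throw std::runtime_error("ring buffer has zero capacity");
        if (sz == capacity) {
            first = (first + 1) % capacity; // full: drop the oldest element
        } else {
            sz++;
        }
        data[pos] = value;
        pos = (pos + 1) % capacity;
    }

    T pop_front() {
        if (sz == 0) throw std::runtime_error("ring buffer is empty");
        T value = data[first];
        first = (first + 1) % capacity;
        sz--;
        return value;
    }

    // Reverse access: rat(0) is the most recently pushed element.
    const T & rat(size_t i) const {
        if (i >= sz) throw std::runtime_error("ring buffer index out of range");
        return data[(first + sz - i - 1) % capacity];
    }

    size_t size() const { return sz; }

private:
    size_t capacity = 0;
    size_t sz       = 0;
    size_t first    = 0; // index of the oldest element
    size_t pos      = 0; // index where the next element is written
    std::vector<T> data;
};
```

Pushing 1, 2, 3, 4 into a buffer of capacity 3 evicts 1, leaving `front() == 2` and `rat(0) == 4`.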
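```cpp
// Each sampler pairs a context struct with a static llama_sampler_i function table:
struct llama_sampler_top_k {
    const int32_t k;
};

static struct llama_sampler_i llama_sampler_top_k_i = {
    /* .name   = */ llama_sampler_top_k_name,  // returns "top-k"
    /* .accept = */ nullptr,                   // stateless: nothing to record
    /* .apply  = */ llama_sampler_top_k_apply, // keeps the k highest-logit candidates
    /* .reset  = */ nullptr,
    /* .clone  = */ llama_sampler_top_k_clone,
    /* .free   = */ llama_sampler_top_k_free,
};
```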
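The `apply` step of top-k can be sketched as a partial sort followed by truncation. This standalone version works on a `std::vector` of a hypothetical `token_data` struct instead of `llama_token_data_array`, and treats `k <= 0` as "no filtering", which is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct token_data { int32_t id; float logit; float p; };

// Top-k sketch: keep the k highest-logit candidates, sorted in descending order.
void top_k_apply(std::vector<token_data> & cands, int32_t k) {
    if (k <= 0) {
        return; // assumed convention: non-positive k disables the filter
    }
    const size_t keep = std::min(static_cast<size_t>(k), cands.size());
    // Only the first `keep` positions need to be ordered.
    std::partial_sort(cands.begin(), cands.begin() + static_cast<std::ptrdiff_t>(keep), cands.end(),
        [](const token_data & a, const token_data & b) { return a.logit > b.logit; });
    cands.resize(keep);
}
```

A partial sort costs roughly O(n log k) rather than sorting the whole vocabulary, which matters when n is tens of thousands of tokens.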
```cpp
struct llama_sampler * llama_sampler_init_top_k(int32_t k);
struct llama_sampler * llama_sampler_init_top_p(float p, size_t min_keep);
struct llama_sampler * llama_sampler_init_min_p(float p, size_t min_keep);
struct llama_sampler * llama_sampler_init_temp(float t);
struct llama_sampler * llama_sampler_init_dist(uint32_t seed);
struct llama_sampler * llama_sampler_init_mirostat_v2(uint32_t seed, float tau, float eta);
// The first grammar parameter is the vocabulary (a const struct llama_model * in older revisions):
struct llama_sampler * llama_sampler_init_grammar(const struct llama_vocab * vocab, const char * grammar_str, const char * grammar_root);
struct llama_sampler * llama_sampler_init_penalties(struct llama_vocab * vocab, ...);
```
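Mirostat v2's feedback loop, sketched under stated assumptions: `mu` starts at `2 * tau`, candidates whose surprise `-log2(p)` exceeds `mu` are dropped (keeping at least one), one survivor is drawn, and `mu` moves by `eta` times the surprise error. The `mirostat_v2` struct and `softmax_sorted` helper are hypothetical, and `std::discrete_distribution` stands in for whatever draw the library actually uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

struct token_data { int32_t id; float logit; float p; };

// Sort by descending logit and fill p with softmax probabilities.
static void softmax_sorted(std::vector<token_data> & c) {
    std::sort(c.begin(), c.end(),
        [](const token_data & a, const token_data & b) { return a.logit > b.logit; });
    const float max_logit = c.front().logit;
    float sum = 0.0f;
    for (token_data & td : c) {
        td.p = std::exp(td.logit - max_logit);
        sum += td.p;
    }
    for (token_data & td : c) td.p /= sum;
}

// mu is a surprise budget nudged after every draw so that the observed
// surprise tracks the target tau.
struct mirostat_v2 {
    float tau;
    float eta;
    float mu;
    std::mt19937 rng;

    mirostat_v2(uint32_t seed, float tau_, float eta_)
        : tau(tau_), eta(eta_), mu(2.0f * tau_), rng(seed) {}

    // Returns the sampled token id; expects a non-empty candidate list.
    int32_t sample(std::vector<token_data> c) {
        softmax_sorted(c);

        // Sorted by probability, so surprise rises along the list: cut at the
        // first candidate whose surprise exceeds mu, but keep at least one.
        const auto cut = std::find_if(c.begin(), c.end(),
            [this](const token_data & td) { return -std::log2(td.p) > mu; });
        size_t n = static_cast<size_t>(cut - c.begin());
        if (n == 0) n = 1;
        c.resize(n);

        // Renormalize the survivors and draw one of them.
        softmax_sorted(c);
        std::vector<double> weights;
        for (const token_data & td : c) weights.push_back(td.p);
        std::discrete_distribution<int> dist(weights.begin(), weights.end());
        const int idx = dist(rng);

        // Feedback: lower mu after surprising picks, raise it after safe ones.
        const float observed = -std::log2(c[idx].p);
        mu -= eta * (observed - tau);
        return c[idx].id;
    }
};
```

With one dominant candidate, only that token survives the cut, its renormalized surprise is 0, and `mu` rises from `2 * tau` by `eta * tau`.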
Import
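```cpp
#include "llama.h"          // public API: llama_sampler_init_*, llama_sampler_chain_*, llama_sampler_apply
#include "llama-sampling.h" // internal header, only available inside the llama.cpp source tree
```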
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cur_p | llama_token_data_array * | Yes | Candidate token array with logits |
| k | int32_t | No | Top-k count for top_k sampler |
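| p | float | No | Cumulative probability mass to keep (top_p) or minimum probability relative to the most likely token (min_p) |
| min_keep | size_t | No | Minimum number of candidates that top_p/min_p must keep |
| t | float | No | Temperature for the temperature sampler (logits are divided by t) |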
| seed | uint32_t | No | Random seed for distribution sampler |
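A sketch of how `p`, `min_keep`, and `t` act on the candidates. The function names and the `token_data` struct are hypothetical; the sketch assumes top-p keeps the shortest prefix of sorted candidates whose cumulative probability reaches `p`, and min-p keeps candidates whose probability is at least `p` times the top probability.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct token_data { int32_t id; float logit; float p; };

// Sort by descending logit and fill p with softmax probabilities.
static void softmax_sorted(std::vector<token_data> & c) {
    std::sort(c.begin(), c.end(),
        [](const token_data & a, const token_data & b) { return a.logit > b.logit; });
    const float max_logit = c.front().logit;
    float sum = 0.0f;
    for (token_data & td : c) {
        td.p = std::exp(td.logit - max_logit);
        sum += td.p;
    }
    for (token_data & td : c) td.p /= sum;
}

// Temperature (assumes t > 0): t < 1 sharpens the distribution, t > 1 flattens it.
void temp_apply(std::vector<token_data> & c, float t) {
    for (token_data & td : c) td.logit /= t;
}

// Top-p: keep the shortest sorted prefix whose cumulative probability
// reaches p, but never fewer than min_keep candidates.
void top_p_apply(std::vector<token_data> & c, float p, size_t min_keep) {
    if (c.empty() || p >= 1.0f) return;
    softmax_sorted(c);
    float cum = 0.0f;
    size_t last = c.size();
    for (size_t i = 0; i < c.size(); ++i) {
        cum += c[i].p;
        if (cum >= p && i + 1 >= min_keep) {
            last = i + 1;
            break;
        }
    }
    c.resize(last);
}

// Min-p: keep candidates whose probability is at least p times the top
// candidate's probability, again keeping at least min_keep of them.
void min_p_apply(std::vector<token_data> & c, float p, size_t min_keep) {
    if (c.empty() || p <= 0.0f) return;
    softmax_sorted(c);
    const float threshold = p * c.front().p;
    size_t keep = 0;
    while (keep < c.size() && (keep < min_keep || c[keep].p >= threshold)) {
        ++keep;
    }
    c.resize(keep);
}
```

For probabilities 0.5, 0.25, 0.125, 0.125, top-p with `p = 0.7` keeps two candidates (cumulative 0.75), and min-p with `p = 0.4` keeps the two whose probability is at least 0.2.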
Outputs
| Name | Type | Description |
|---|---|---|
| cur_p (modified) | llama_token_data_array * | Filtered/modified candidate array |
| selected | int64_t (cur_p->selected) | Index of the chosen candidate, set by selecting samplers such as dist; the token ID is cur_p->data[cur_p->selected].id |
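The `selected` output can be illustrated with a seeded distribution sampler. In this hypothetical sketch (`dist_sampler`, `cand_array`), softmax weights are computed over the candidates and `std::discrete_distribution` driven by `std::mt19937` picks an index; the token ID is then read from `data[selected].id`. The exact random draw in llama.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <random>
#include <vector>

struct token_data { int32_t id; float logit; float p; };

struct cand_array {
    std::vector<token_data> data;
    int64_t selected = -1; // index into data; -1 means nothing selected yet
};

// Seeded distribution sampler: converts logits to probabilities and draws
// one index. It records the index in selected rather than returning a token.
struct dist_sampler {
    std::mt19937 rng;

    explicit dist_sampler(uint32_t seed) : rng(seed) {}

    // Expects at least one candidate with a finite logit.
    void apply(cand_array & cur) {
        if (cur.data.empty()) {
            cur.selected = -1;
            return;
        }
        // Softmax with the max logit subtracted for numerical stability.
        float max_logit = cur.data[0].logit;
        for (const token_data & td : cur.data) max_logit = std::max(max_logit, td.logit);
        float sum = 0.0f;
        for (token_data & td : cur.data) {
            td.p = std::exp(td.logit - max_logit);
            sum += td.p;
        }
        std::vector<double> weights;
        weights.reserve(cur.data.size());
        for (token_data & td : cur.data) {
            td.p /= sum;
            weights.push_back(td.p);
        }
        std::discrete_distribution<int> dist(weights.begin(), weights.end());
        cur.selected = dist(rng);
    }
};
```

Two samplers created with the same seed make the same choice on the same input, which is what makes a fixed `seed` useful for reproducible generation.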
Usage Examples
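```cpp
// Create a sampler chain (the chain takes ownership of every sampler added to it):
auto * chain = llama_sampler_chain_init(llama_sampler_chain_default_params());
llama_sampler_chain_add(chain, llama_sampler_init_top_k(40));
llama_sampler_chain_add(chain, llama_sampler_init_top_p(0.95f, 1));
llama_sampler_chain_add(chain, llama_sampler_init_temp(0.8f));
llama_sampler_chain_add(chain, llama_sampler_init_dist(42)); // final stage: sets candidates.selected
// Apply to candidates (a llama_token_data_array filled from the current logits):
llama_sampler_apply(chain, &candidates);
llama_token token = candidates.data[candidates.selected].id;
// Let stateful samplers (penalties, grammar, mirostat) record the choice:
llama_sampler_accept(chain, token);
// llama_sampler_sample(chain, ctx, -1) performs the apply and accept steps from a context's logits.
// Free the chain together with all of its samplers:
llama_sampler_free(chain);
```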