Implementation:Ggml org Llama cpp Sampling Header
| Knowledge Sources | |
|---|---|
| Domains | Sampling, API |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the common_sampler API that wraps llama_sampler with grammar support, token history, and configurable sampling chains.
Description
Declares functions for sampler lifecycle (init, free, clone, reset), token acceptance, single-token sampling (`common_sampler_sample` with optional grammar-first mode), and batch sampling for speculative decoding (`common_sampler_sample_and_accept_n`). Provides accessors for the underlying llama_sampler chain, candidate token data, last accepted token, seed retrieval, and performance metrics printing. Includes sampler type conversion utilities (to/from strings and chars), the `llama_sampler_init_llg` function for llguidance grammar integration, and RAII ownership via `common_sampler_deleter` and `common_sampler_ptr`.
Usage
Include this header in any application that needs to sample tokens from model logits. It is the public interface to the sampling subsystem: the server, CLI, and generation examples use it to turn model logits into token sequences, with parameters such as temperature, top-k, and top-p trading off output quality against diversity.
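To make the quality/diversity controls concrete, the following is a minimal, self-contained sketch of three common filtering steps (top-k, temperature-scaled softmax, top-p). It does not use the llama.cpp API: the `candidate` struct and the helpers `keep_top_k`, `softmax_with_temp`, and `keep_top_p` are hypothetical names introduced for illustration, and the order of the steps is an assumption rather than the order the library uses.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical candidate record (illustration only, not the library's type).
struct candidate {
    int   id;     // token id
    float logit;  // raw model score
    float p;      // probability, filled in by softmax_with_temp()
};

// Sort by descending logit and keep only the k best candidates.
// k == 0 is treated as "no limit" in this sketch.
inline void keep_top_k(std::vector<candidate> & cands, std::size_t k) {
    std::sort(cands.begin(), cands.end(),
              [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    if (k > 0 && k < cands.size()) {
        cands.resize(k);
    }
}

// Turn logits into probabilities after dividing by the temperature.
// temp must be > 0; lower values sharpen the distribution.
inline void softmax_with_temp(std::vector<candidate> & cands, float temp) {
    if (cands.empty()) {
        return;
    }
    float max_logit = cands[0].logit;
    for (const auto & c : cands) {
        max_logit = std::max(max_logit, c.logit);
    }
    float sum = 0.0f;
    for (auto & c : cands) {
        c.p = std::exp((c.logit - max_logit) / temp);  // subtract max for stability
        sum += c.p;
    }
    for (auto & c : cands) {
        c.p /= sum;
    }
}

// Keep the shortest prefix whose cumulative probability reaches top_p.
// Assumes cands is already sorted by descending probability.
inline void keep_top_p(std::vector<candidate> & cands, float top_p) {
    float cum = 0.0f;
    for (std::size_t i = 0; i < cands.size(); ++i) {
        cum += cands[i].p;
        if (cum >= top_p) {
            cands.resize(i + 1);
            return;
        }
    }
}
```

A final step would draw one token at random from the surviving candidates in proportion to `p`. Tighter settings (small k, low top_p, low temperature) favor the most likely tokens, while looser settings admit more variety.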
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: common/sampling.h
- Lines: 1-119
Signature
struct common_sampler * common_sampler_init(const struct llama_model * model, struct common_params_sampling & params);
void common_sampler_free(struct common_sampler * gsmpl);
void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone (struct common_sampler * gsmpl);
llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first = false);
std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft, bool grammar_first = false);
std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const llama_tokens & draft, bool grammar_first = false);
uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
struct llama_sampler * common_sampler_get(const struct common_sampler * gsmpl);
llama_token_data_array * common_sampler_get_candidates(struct common_sampler * gsmpl, bool do_sort);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);
std::string common_sampler_prev_str(common_sampler * gsmpl, llama_context * ctx, int n);
void common_perf_print(const struct llama_context * ctx, const struct common_sampler * gsmpl);
llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
    const char * grammar_kind, const char * grammar_data);
typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;
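The `common_sampler_ptr` typedef pairs `std::unique_ptr` with a custom deleter so that a C-style free function runs automatically at scope exit. The sketch below shows that pattern in isolation. Every name in it (`fake_sampler`, `fake_sampler_init`, `fake_sampler_free`, `fake_sampler_deleter`, `fake_sampler_ptr`, `run_scoped_sampler`) is a hypothetical stand-in, and it assumes that `common_sampler_deleter` simply forwards to `common_sampler_free`, which the signatures above do not show.

```cpp
#include <memory>

// Stand-in for an opaque sampler type (illustration only).
struct fake_sampler {
    int id;
};

// Counts how many times the free function has run.
static int g_free_calls = 0;

// Stand-ins for a C-style init/free pair.
static fake_sampler * fake_sampler_init() {
    return new fake_sampler{42};
}

static void fake_sampler_free(fake_sampler * s) {
    ++g_free_calls;
    delete s;
}

// Deleter that forwards to the C-style free function.
struct fake_sampler_deleter {
    void operator()(fake_sampler * s) const { fake_sampler_free(s); }
};

using fake_sampler_ptr = std::unique_ptr<fake_sampler, fake_sampler_deleter>;

// Creates a sampler in an inner scope and returns how many frees
// happened while this function ran.
static int run_scoped_sampler() {
    const int before = g_free_calls;
    {
        fake_sampler_ptr smpl(fake_sampler_init());
        (void) smpl->id;  // use the object through operator-> or smpl.get()
    }  // the deleter runs here, exactly once
    return g_free_calls - before;
}
```

With this pattern, the owning pointer should be the only thing that frees the object; passing `smpl.get()` to a manual free call as well would release it twice.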
Import
#include "sampling.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | const struct llama_model * | Yes | Model used to initialize the sampler (needed for vocabulary info) |
| params | struct common_params_sampling & | Yes | Sampling parameters (temperature, top_k, top_p, grammar, etc.) |
| gsmpl | struct common_sampler * | Yes | Sampler instance for sampling/acceptance operations |
| ctx | struct llama_context * | Yes | Context with computed logits to sample from |
| idx | int | Yes | Batch index of the logits to sample |
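| token | llama_token | Yes | Token to accept into the sampler history |
| accept_grammar | bool | Yes | If true, the accepted token also advances the grammar state |
| grammar_first | bool | No (default false) | If true, apply the grammar to all candidates before the sampler chain runs (slower). If false, the chain runs first and the grammar is applied only when the sampled token does not fit it |
| draft | const llama_tokens & | Batch sampling only | Draft tokens to verify during speculative decoding |
| idxs | const std::vector<int> & | No (batch sampling only) | Logit indices in ctx, one per position to sample; the overload without idxs uses 0, 1, ..., draft.size() |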
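The difference between the two `grammar_first` modes can be sketched without the library. In the example below, `token`, `pick_best`, and `sample_with_grammar` are hypothetical names, the grammar is reduced to a yes/no predicate, and a greedy pick stands in for the real sampler chain so the result is deterministic. It is an illustration of the control flow under those assumptions, not the library's implementation.

```cpp
#include <functional>
#include <vector>

using token = int;  // stand-in for llama_token

// Returns the highest-scoring token among those that pass `allowed`,
// or -1 if none passes. Greedy selection replaces the real sampler chain.
static token pick_best(const std::vector<float> & logits,
                       const std::function<bool(token)> & allowed) {
    token best = -1;
    for (token t = 0; t < (token) logits.size(); ++t) {
        if (!allowed(t)) {
            continue;
        }
        if (best < 0 || logits[t] > logits[best]) {
            best = t;
        }
    }
    return best;
}

static token sample_with_grammar(const std::vector<float> & logits,
                                 const std::function<bool(token)> & grammar_ok,
                                 bool grammar_first) {
    if (grammar_first) {
        // Constrain every candidate first, then sample.
        return pick_best(logits, grammar_ok);
    }
    // Fast path: sample without the grammar, then check the single result.
    const token t = pick_best(logits, [](token) { return true; });
    if (t >= 0 && grammar_ok(t)) {
        return t;
    }
    // Slow path: the sampled token was rejected, so resample under the grammar.
    return pick_best(logits, grammar_ok);
}
```

In this sketch both modes return a token that satisfies the grammar; what changes is how much work is done. The default mode checks only the one sampled token in the common case, while the grammar-first mode filters every candidate on every call.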
Outputs
| Name | Type | Description |
|---|---|---|
| sampled token | llama_token | The token selected by the sampling chain |
| accepted tokens | std::vector<llama_token> | Tokens sampled and accepted during speculative-decoding verification: the draft prefix the sampler agrees with, followed by one further sampled token |
| seed | uint32_t | Current random seed of the sampler |
| candidates | llama_token_data_array * | Internal candidate token array with probabilities |
| last token | llama_token | The most recently accepted token |
| description | std::string | Human-readable description of the sampler chain |
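The "accepted tokens" output is easiest to follow as a loop that samples position by position and stops at the first disagreement with the draft. The sketch below is written under that assumption and is not taken from the library: `token`, `sample_and_accept_n`, and the `sample_at` callback (which stands in for sampling from the logits at a given position) are hypothetical.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using token = int;  // stand-in for llama_token

// Verifies `draft` against the sampler. sample_at(i) returns the token
// the sampler picks at position i.
static std::vector<token> sample_and_accept_n(
        const std::vector<token> & draft,
        const std::function<token(std::size_t)> & sample_at) {
    std::vector<token> result;
    result.reserve(draft.size() + 1);

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const token id = sample_at(i);
        result.push_back(id);   // the sampled token is always kept
        if (id != draft[i]) {
            break;              // first disagreement: stop verifying
        }
    }
    if (i == draft.size()) {
        // The whole draft matched, so sample one extra token after it.
        result.push_back(sample_at(i));
    }
    return result;
}
```

Under these assumptions the result always holds at least one token and at most `draft.size() + 1`, which is why the position list needs one more entry than the draft.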
Usage Examples
#include "sampling.h"
// Initialize a sampler from model and parameters
// (model and ctx are assumed to be loaded already)
common_params_sampling sparams;
sparams.temp = 0.8f;
sparams.top_k = 40;
sparams.top_p = 0.95f;
struct common_sampler * smpl = common_sampler_init(model, sparams);
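// Sample a single token from the logits at batch index 0, then record it
// in the sampler history and advance the grammar state (accept_grammar = true)
llama_token token = common_sampler_sample(smpl, ctx, 0);
common_sampler_accept(smpl, token, true);
// Batch sampling for speculative decoding: verify draft tokens against the sampler
llama_tokens draft = {/* draft tokens */};
auto accepted = common_sampler_sample_and_accept_n(smpl, ctx, draft);
// RAII ownership: smpl_ptr frees its own sampler when it goes out of scope
common_sampler_ptr smpl_ptr(common_sampler_init(model, sparams));
// Manual cleanup for the raw pointer created above
common_sampler_free(smpl);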