Implementation:Ggml org Llama cpp Sampling Header

Knowledge Sources	Ggml_org_Llama_cpp
Domains	Sampling, API
Last Updated	2026-02-15 00:00 GMT

Overview

Declares the common_sampler API that wraps llama_sampler with grammar support, token history, and configurable sampling chains.

Description

Declares functions for sampler lifecycle (init, free, clone, reset), token acceptance, single-token sampling (`common_sampler_sample` with optional grammar-first mode), and batch sampling for speculative decoding (`common_sampler_sample_and_accept_n`). Provides accessors for the underlying llama_sampler chain, candidate token data, last accepted token, seed retrieval, and performance metrics printing. Includes sampler type conversion utilities (to/from strings and chars), the `llama_sampler_init_llg` function for llguidance grammar integration, and RAII ownership via `common_sampler_deleter` and `common_sampler_ptr`.

Usage

Include this header in any application that needs to sample tokens from model logits. It is the public interface for the sampling subsystem used by the server, CLI, and all generation examples to convert model logits into token sequences with quality/diversity controls.

Code Reference

Source Location

Repository: Ggml_org_Llama_cpp
File: common/sampling.h
Lines: 1-119

Signature

struct common_sampler * common_sampler_init(const struct llama_model * model, struct common_params_sampling & params);
void                    common_sampler_free(struct common_sampler * gsmpl);
void                    common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void                    common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone (struct common_sampler * gsmpl);

llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first = false);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft, bool grammar_first = false);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const llama_tokens & draft, bool grammar_first = false);

uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
struct llama_sampler * common_sampler_get(const struct common_sampler * gsmpl);
llama_token_data_array * common_sampler_get_candidates(struct common_sampler * gsmpl, bool do_sort);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);
std::string common_sampler_prev_str(common_sampler * gsmpl, llama_context * ctx, int n);
void common_perf_print(const struct llama_context * ctx, const struct common_sampler * gsmpl);

llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
    const char * grammar_kind, const char * grammar_data);

typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;
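The `common_sampler_ptr` typedef pairs `std::unique_ptr` with a custom deleter. The sketch below reproduces that pattern with a stand-in type so it compiles without llama.cpp. The names `mock_sampler`, `mock_sampler_init`, `mock_sampler_free`, `mock_sampler_deleter`, and `live_count` are hypothetical, and it is an assumption that the real `common_sampler_deleter` simply forwards to `common_sampler_free`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for common_sampler; not part of llama.cpp.
struct mock_sampler {
    int seed;
};

// Counts live samplers so the effect of the deleter can be observed.
static int live_count = 0;

// Stand-in for common_sampler_init.
mock_sampler * mock_sampler_init(int seed) {
    ++live_count;
    return new mock_sampler{seed};
}

// Stand-in for common_sampler_free.
void mock_sampler_free(mock_sampler * s) {
    if (s != nullptr) {
        assert(live_count > 0);
        --live_count;
        delete s;
    }
}

// Assumed shape of common_sampler_deleter: forward to the C-style free function.
struct mock_sampler_deleter {
    void operator()(mock_sampler * s) const { mock_sampler_free(s); }
};

typedef std::unique_ptr<mock_sampler, mock_sampler_deleter> mock_sampler_ptr;

// Returns the number of live samplers while the smart pointer is in scope.
int live_while_scoped() {
    mock_sampler_ptr p(mock_sampler_init(42));
    return live_count;  // p is destroyed after the return value is copied
}
```

With this pattern the free function runs on every exit path, including early returns and exceptions, so the caller never calls it directly for a sampler owned by the smart pointer.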

Import

#include "sampling.h"

I/O Contract

Inputs

Name	Type	Required	Description
model	const struct llama_model *	Yes	Model used to initialize the sampler (needed for vocabulary info)
params	struct common_params_sampling &	Yes	Sampling parameters (temperature, top_k, top_p, grammar, etc.)
gsmpl	struct common_sampler *	Yes	Sampler instance for sampling/acceptance operations
ctx	struct llama_context *	Yes	Context with computed logits to sample from
idx	int	Yes	Batch index of the logits to sample
token	llama_token	Yes	Token to accept into the sampler history
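grammar_first	bool	No	If true, apply grammar constraints to the candidates before the sampling chain runs (slower); useful when every candidate, not only the sampled token, must satisfy the grammar. Defaults to false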
draft	const llama_tokens &	Yes (batch)	Draft tokens for speculative decoding verification
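The cost difference behind `grammar_first` can be illustrated with a small model. The sketch below is not the llama.cpp implementation. It assumes the sampling chain is a greedy argmax, the grammar is a per-token predicate that allows at least one token, and that the default mode checks only the sampled token and falls back to a full grammar pass when that token is rejected. All names (`token_id`, `pick_greedy`, `apply_grammar`, `sample_sketch`) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

using token_id = int;  // hypothetical stand-in for llama_token

// Stand-in for the sampling chain: greedy argmax over the logits.
token_id pick_greedy(const std::vector<float> & logits) {
    assert(!logits.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < logits.size(); ++i) {
        if (logits[i] > logits[best]) best = i;
    }
    return static_cast<token_id>(best);
}

// Stand-in for the grammar: mask every token the predicate rejects.
std::vector<float> apply_grammar(std::vector<float> logits,
                                 const std::function<bool(token_id)> & allowed) {
    for (std::size_t i = 0; i < logits.size(); ++i) {
        if (!allowed(static_cast<token_id>(i))) {
            logits[i] = -std::numeric_limits<float>::infinity();
        }
    }
    return logits;
}

// grammar_calls counts the full-vocabulary grammar passes that were needed.
token_id sample_sketch(const std::vector<float> & logits,
                       const std::function<bool(token_id)> & allowed,
                       bool grammar_first, int & grammar_calls) {
    if (grammar_first) {
        ++grammar_calls;  // always pay for the full pass
        return pick_greedy(apply_grammar(logits, allowed));
    }
    const token_id id = pick_greedy(logits);
    if (allowed(id)) return id;  // cheap single-token check succeeded
    ++grammar_calls;             // rejected: mask everything and resample
    return pick_greedy(apply_grammar(logits, allowed));
}
```

In this model both modes return a token the grammar allows. The grammar-first path always pays for a full pass, while the default path pays for one only when the first pick is rejected.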

Outputs

Name	Type	Description
sampled token	llama_token	The token selected by the sampling chain
accepted tokens	std::vector<llama_token>	Tokens accepted during speculative decoding batch sampling
seed	uint32_t	Current random seed of the sampler
candidates	llama_token_data_array *	Internal candidate token array with probabilities
last token	llama_token	The most recently accepted token
description	std::string	Human-readable description of the sampler chain
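The "accepted tokens" output of `common_sampler_sample_and_accept_n` can be pictured with the following sketch of draft verification. It is written under stated assumptions and is not the library code: the verifier is assumed to sample at each draft position, keep the sampled token, stop at the first disagreement with the draft, and sample one extra token when the whole draft matches. The names `token_id`, `sample_and_accept_n_sketch`, and the `sample_at` callback are hypothetical, and the callback stands in for sampling plus acceptance on a real sampler.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using token_id = int;  // hypothetical stand-in for llama_token

// sample_at(idx) plays the role of sampling (and accepting) the token at batch index idx.
// idxs must hold one more entry than draft: one index per draft token plus one
// for the bonus token sampled when the entire draft is confirmed.
std::vector<token_id> sample_and_accept_n_sketch(
        const std::function<token_id(int)> & sample_at,
        const std::vector<int> & idxs,
        const std::vector<token_id> & draft) {
    assert(idxs.size() == draft.size() + 1);

    std::vector<token_id> result;
    result.reserve(idxs.size());

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const token_id id = sample_at(idxs[i]);
        result.push_back(id);          // the sampled token is always kept
        if (id != draft[i]) {
            return result;             // first disagreement ends verification
        }
    }
    result.push_back(sample_at(idxs[i]));  // whole draft matched: bonus token
    return result;
}
```

Under these assumptions the result always contains at least one token and at most `draft.size() + 1` tokens, so generation advances even when the first draft token is rejected.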

Usage Examples

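#include "sampling.h"

// Assumes `model` (llama_model *) and `ctx` (llama_context *) were created
// elsewhere and that ctx holds freshly computed logits.

// Initialize a sampler from model and parameters
common_params_sampling sparams;
sparams.temp  = 0.8f;
sparams.top_k = 40;
sparams.top_p = 0.95f;
struct common_sampler * smpl = common_sampler_init(model, sparams);

// Sample a single token from the logits at batch index 0, then record it
llama_token token = common_sampler_sample(smpl, ctx, 0);
common_sampler_accept(smpl, token, /* accept_grammar = */ true);

// Batch sampling with speculative decoding; the sampled tokens are accepted
// by the call itself, so no separate common_sampler_accept is needed
llama_tokens draft = {/* draft tokens */};
auto accepted = common_sampler_sample_and_accept_n(smpl, ctx, draft);

// Cleanup for the raw pointer
common_sampler_free(smpl);

// RAII ownership: the sampler is released when smpl_ptr goes out of scope
common_sampler_ptr smpl_ptr(common_sampler_init(model, sparams));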

Related Pages

Principle:Ggml_org_Llama_cpp_Sampling
