Implementation:Ggml org Llama cpp Sampling Header

From Leeroopedia
Revision as of 12:41, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Ggml_org_Llama_cpp_Sampling_Header.md)
Knowledge Sources
Domains Sampling, API
Last Updated 2026-02-15 00:00 GMT

Overview

Declares the common_sampler API that wraps llama_sampler with grammar support, token history, and configurable sampling chains.

Description

Declares functions for sampler lifecycle (init, free, clone, reset), token acceptance, single-token sampling (`common_sampler_sample` with optional grammar-first mode), and batch sampling for speculative decoding (`common_sampler_sample_and_accept_n`). Provides accessors for the underlying llama_sampler chain, candidate token data, last accepted token, seed retrieval, and performance metrics printing. Includes sampler type conversion utilities (to/from strings and chars), the `llama_sampler_init_llg` function for llguidance grammar integration, and RAII ownership via `common_sampler_deleter` and `common_sampler_ptr`.

Usage

Include this header in any application that needs to sample tokens from model logits. It is the public interface for the sampling subsystem used by the server, CLI, and all generation examples to convert model logits into token sequences with quality/diversity controls.

Code Reference

Source Location

Signature

struct common_sampler * common_sampler_init(const struct llama_model * model, struct common_params_sampling & params);
void                    common_sampler_free(struct common_sampler * gsmpl);
void                    common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void                    common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone (struct common_sampler * gsmpl);

llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx, bool grammar_first = false);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft, bool grammar_first = false);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const llama_tokens & draft, bool grammar_first = false);

uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
struct llama_sampler * common_sampler_get(const struct common_sampler * gsmpl);
llama_token_data_array * common_sampler_get_candidates(struct common_sampler * gsmpl, bool do_sort);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);
std::string common_sampler_prev_str(common_sampler * gsmpl, llama_context * ctx, int n);
void common_perf_print(const struct llama_context * ctx, const struct common_sampler * gsmpl);

llama_sampler * llama_sampler_init_llg(const llama_vocab * vocab,
    const char * grammar_kind, const char * grammar_data);

typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;

Import

#include "sampling.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | const struct llama_model * | Yes | Model used to initialize the sampler (needed for vocabulary info)
params | struct common_params_sampling & | Yes | Sampling parameters (temperature, top_k, top_p, grammar, etc.)
gsmpl | struct common_sampler * | Yes | Sampler instance for sampling/acceptance operations
ctx | struct llama_context * | Yes | Context with computed logits to sample from
idx | int | Yes | Batch index of the logits to sample
token | llama_token | Yes | Token to accept into the sampler history
grammar_first | bool | No (default false) | If true, apply grammar constraints to all candidates before the sampler chain runs (slower, but every remaining candidate fits the grammar); if false, only the sampled token is checked against the grammar
draft | const llama_tokens & | Yes (batch) | Draft tokens for speculative decoding verification
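
The difference between the two grammar_first modes can be illustrated with a small self-contained sketch. It does not use llama.cpp: Candidate, GrammarCheck, pick_best, apply_grammar and sample_sketch are hypothetical names, and a greedy pick stands in for the real sampler chain. It assumes the default mode samples first and only falls back to grammar filtering when the sampled token is rejected.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not llama.cpp types.
struct Candidate {
    int   token;
    float logit;
};

using GrammarCheck = std::function<bool(int)>;

// Greedy pick (highest logit). Stands in for the full sampler chain.
static int pick_best(const std::vector<Candidate> & cands) {
    assert(!cands.empty()); // the sketch requires at least one candidate
    const Candidate * best = &cands[0];
    for (const Candidate & c : cands) {
        if (c.logit > best->logit) {
            best = &c;
        }
    }
    return best->token;
}

// Keep only the candidates that the grammar allows.
static std::vector<Candidate> apply_grammar(const std::vector<Candidate> & cands,
                                            const GrammarCheck & allowed) {
    std::vector<Candidate> out;
    for (const Candidate & c : cands) {
        if (allowed(c.token)) {
            out.push_back(c);
        }
    }
    return out;
}

static int sample_sketch(const std::vector<Candidate> & cands,
                         const GrammarCheck & allowed,
                         bool grammar_first) {
    if (grammar_first) {
        // Slower path: filter every candidate, then sample.
        return pick_best(apply_grammar(cands, allowed));
    }
    // Fast path: sample first and check only the chosen token.
    const int id = pick_best(cands);
    if (allowed(id)) {
        return id;
    }
    // Rejected: apply the grammar to all candidates and sample again.
    return pick_best(apply_grammar(cands, allowed));
}
```

With a greedy pick both modes return the same token; the modes differ in how much grammar work is done per call and in whether the candidate set left after sampling is guaranteed to fit the grammar.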

Outputs

Name | Type | Description
sampled token | llama_token | The token selected by the sampling chain
accepted tokens | std::vector<llama_token> | Tokens accepted during speculative decoding batch sampling
seed | uint32_t | Current random seed of the sampler
candidates | llama_token_data_array * | Internal candidate token array with probabilities
last token | llama_token | The most recently accepted token
description | std::string | Human-readable description of the sampler chain
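
To make the "accepted tokens" output concrete, the following self-contained sketch models how draft verification for speculative decoding is commonly done: sample at each batch index, keep the sampled token, and stop at the first position where it disagrees with the draft. It is an illustration under stated assumptions, not the library's code. Token, sample_at and sample_and_accept_n_sketch are hypothetical names, and sample_at stands in for sampling and accepting a token at a given batch index.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Token = int; // hypothetical stand-in for llama_token

// sample_at(idx) is a hypothetical callback that samples (and accepts)
// the target model's token at batch index idx.
static std::vector<Token> sample_and_accept_n_sketch(
        const std::function<Token(int)> & sample_at,
        const std::vector<int> & idxs,
        const std::vector<Token> & draft) {
    // One batch index per draft token, plus one for the token after the draft.
    assert(idxs.size() == draft.size() + 1);

    std::vector<Token> result;
    result.reserve(idxs.size());

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const Token id = sample_at(idxs[i]);
        result.push_back(id); // the sampled token is always kept
        if (id != draft[i]) {
            break;            // first mismatch ends verification
        }
    }

    // The whole draft matched: sample one extra token past the draft.
    if (i == draft.size()) {
        result.push_back(sample_at(idxs[i]));
    }
    return result;
}
```

Under these assumptions the result always holds at least one token and at most draft.size() + 1 tokens, and its last element is either the correction for the first rejected draft token or the extra token sampled after a fully accepted draft.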

Usage Examples

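#include "sampling.h"

// Assumes `model` (llama_model *) and `ctx` (llama_context *) are already
// loaded and that ctx holds logits from a prior decode call.

// Initialize a sampler from model and parameters
common_params_sampling sparams;
sparams.temp  = 0.8f;
sparams.top_k = 40;
sparams.top_p = 0.95f;
struct common_sampler * smpl = common_sampler_init(model, sparams);

// Sample a single token from the logits at batch index 0,
// then record it in the sampler history (and grammar state)
llama_token token = common_sampler_sample(smpl, ctx, 0);
common_sampler_accept(smpl, token, true);

// Batch sampling with speculative decoding; tokens are accepted
// internally, so no separate common_sampler_accept call is needed
llama_tokens draft = {/* draft tokens */};
auto accepted = common_sampler_sample_and_accept_n(smpl, ctx, draft);

// RAII ownership: a second, independent sampler that is freed
// automatically when smpl_ptr goes out of scope
common_sampler_ptr smpl_ptr(common_sampler_init(model, sparams));

// Cleanup of the raw pointer only; never free smpl_ptr.get() manually
common_sampler_free(smpl);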
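
The RAII pattern behind common_sampler_ptr can be sketched without llama.cpp as a std::unique_ptr with a custom deleter that forwards to the C-style free function. Everything named toy_* below, along with g_free_calls and use_and_release, is a hypothetical stand-in; the sketch assumes common_sampler_deleter follows the same shape and is not a copy of it.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for common_sampler and its init/free pair.
struct toy_sampler {
    int id;
};

static int g_free_calls = 0; // counts frees so the behavior is observable

static toy_sampler * toy_sampler_init(int id) {
    return new toy_sampler{id};
}

static void toy_sampler_free(toy_sampler * s) {
    if (s != nullptr) {
        ++g_free_calls;
        delete s;
    }
}

// Deleter that forwards to the C-style free function.
struct toy_sampler_deleter {
    void operator()(toy_sampler * s) const { toy_sampler_free(s); }
};

using toy_sampler_ptr = std::unique_ptr<toy_sampler, toy_sampler_deleter>;

// The sampler is released exactly once when the smart pointer leaves scope.
static int use_and_release() {
    toy_sampler_ptr smpl(toy_sampler_init(7));
    assert(smpl != nullptr);
    return smpl->id;
}
```

The design point is that ownership lives in the pointer type, so early returns and exceptions cannot leak the sampler, and calling the free function by hand on an owned pointer would release it twice.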
