Implementation:Ollama Ollama Llama Sampling API

Knowledge Sources	Ollama
Domains	Sampling, Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Header declaring the common_sampler API, which extends llama's base sampler with grammar support, token history tracking, and performance metrics.

Description

Declares common_sampler_init for constructing a sampler from model and parameters, lifecycle functions (free, reset, clone, accept), the primary common_sampler_sample for single-token sampling, and common_sampler_sample_and_accept_n for speculative decoding with draft token verification. Also provides accessors for the underlying llama_sampler, performance printing, sampler type conversion utilities, and a common_sampler_deleter for use with std::unique_ptr.
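The draft-verification behaviour of common_sampler_sample_and_accept_n can be illustrated with a small stand-alone sketch. It assumes that each draft token is checked against a freshly sampled token, that verification stops at the first mismatch, and that one extra token is sampled when the whole draft matches (so idxs holds one more entry than draft). The toy_* names and token_t are hypothetical stand-ins, not part of sampling.h.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using token_t = int; // hypothetical stand-in for llama_token

// Hypothetical stand-in for common_sampler: picks a token for a batch
// index and records every accepted token in a history list.
struct toy_sampler {
    std::function<token_t(int)> pick; // stands in for sampling from the logits at idx
    std::vector<token_t> history;     // tokens accepted so far
};

token_t toy_sample(toy_sampler & s, int idx) { return s.pick(idx); }
void    toy_accept(toy_sampler & s, token_t tok) { s.history.push_back(tok); }

// Sample at each index, accept the result, and stop at the first position
// where the sampled token disagrees with the draft.
std::vector<token_t> toy_sample_and_accept_n(toy_sampler & s,
                                             const std::vector<int> & idxs,
                                             const std::vector<token_t> & draft) {
    // Assumption: one index per draft token plus one for the bonus token.
    assert(idxs.size() == draft.size() + 1);

    std::vector<token_t> result;
    result.reserve(idxs.size());

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const token_t tok = toy_sample(s, idxs[i]);
        toy_accept(s, tok);
        result.push_back(tok);
        if (tok != draft[i]) {
            break; // draft diverged: keep the sampled token, drop the rest
        }
    }

    // Every draft token matched, so sample one more token past the draft.
    if (i == draft.size()) {
        const token_t tok = toy_sample(s, idxs[i]);
        toy_accept(s, tok);
        result.push_back(tok);
    }
    return result;
}
```

With a two-token draft whose second token is wrong, the sketch returns two tokens: the confirmed first token and the sampler's own replacement for the second.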

Usage

Include this header to access the high-level sampling layer used by the llama.cpp examples and by Ollama's Go bindings. It wraps the chaining of multiple sampling strategies behind a single API.
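To make the idea of chaining sampling strategies concrete, here is a minimal sketch of a chain as an ordered list of stages, each of which rewrites a candidate list before a final pick. The candidate and stage types, top_k, temperature, and run_chain are hypothetical names used only for illustration; this is not the llama.cpp implementation. The final greedy pick is a simplification that keeps the example deterministic, where a real sampler would typically draw from the resulting distribution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical candidate entry: a token id and its logit.
struct candidate { int id; float logit; };

// A stage mutates the candidate list in place.
using stage = std::function<void(std::vector<candidate> &)>;

// Stage: keep only the k highest-logit candidates.
stage top_k(std::size_t k) {
    return [k](std::vector<candidate> & c) {
        std::sort(c.begin(), c.end(),
                  [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
        if (c.size() > k) {
            c.resize(k);
        }
    };
}

// Stage: divide every logit by the temperature t (must be positive).
stage temperature(float t) {
    assert(t > 0.0f);
    return [t](std::vector<candidate> & c) {
        for (auto & x : c) {
            x.logit /= t;
        }
    };
}

// Run each stage in order, then pick the highest-logit survivor.
// Returns -1 when no candidates remain.
int run_chain(const std::vector<stage> & chain, std::vector<candidate> cands) {
    for (const auto & s : chain) {
        s(cands);
    }
    if (cands.empty()) {
        return -1;
    }
    const auto best = std::max_element(
        cands.begin(), cands.end(),
        [](const candidate & a, const candidate & b) { return a.logit < b.logit; });
    return best->id;
}
```

The caller sees a single run_chain call regardless of how many stages are configured, which is the kind of simplification the header provides.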

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/common/sampling.h
Lines: 1-114

Signature

struct common_sampler * common_sampler_init(const struct llama_model * model,
                                            const struct common_params_sampling & params);
void common_sampler_free(struct common_sampler * gsmpl);
void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone(struct common_sampler * gsmpl);

llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft);

uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);

typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;
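The common_sampler_ptr typedef follows the usual unique_ptr-with-custom-deleter pattern. The sketch below reproduces that pattern with hypothetical toy_* names and assumes the deleter simply forwards to the free function; it does not show the actual body of common_sampler_deleter.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for common_sampler.
struct toy_sampler { int seed; };

static int g_live = 0; // number of live samplers, so cleanup can be observed

// Stand-ins for the init/free pair declared in the header.
toy_sampler * toy_sampler_init(int seed) {
    ++g_live;
    return new toy_sampler{seed};
}

void toy_sampler_free(toy_sampler * s) {
    if (s) {
        --g_live;
        delete s;
    }
}

// Deleter functor: assumed to forward to the free function.
struct toy_sampler_deleter {
    void operator()(toy_sampler * s) const { toy_sampler_free(s); }
};

using toy_sampler_ptr = std::unique_ptr<toy_sampler, toy_sampler_deleter>;

// Creates a sampler in an inner scope and reports how many are still
// alive after the scope ends.
int live_after_scope() {
    {
        toy_sampler_ptr smpl(toy_sampler_init(42));
        assert(g_live == 1); // owned while in scope
    }
    return g_live; // the deleter has run by this point
}
```

Because the deleter is part of the pointer type, the owning code never calls the free function directly, and early returns or exceptions cannot leak the sampler.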

Import

#include "sampling.h"

I/O Contract

Inputs

Name	Type	Required	Description
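model	const llama_model *	Yes	Model for vocabulary information
params	const common_params_sampling &	Yes	Sampling configuration parameters
gsmpl	common_sampler *	Yes	Sampler handle returned by common_sampler_init
ctx	llama_context *	Yes	Inference context with logits
idx	int	Yes	Batch index to sample from
token	llama_token	Yes	Token to record in the sampler's history (common_sampler_accept)
accept_grammar	bool	Yes	Whether the grammar state also accepts the token (common_sampler_accept)
idxs	const std::vector<int> &	Yes	Batch indices to sample from during speculative decoding
draft	const llama_tokens &	Yes	Draft tokens to verify during speculative decoding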

Outputs

Name	Type	Description
token	llama_token	Sampled token ID
tokens	std::vector<llama_token>	Accepted tokens (for speculative decoding)

Usage Examples

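#include "sampling.h"

// Assumes the caller has already set up model (llama_model *),
// ctx (llama_context *), and sparams (common_params_sampling).

// Create sampler with RAII; common_sampler_ptr releases it on scope exit
common_sampler_ptr smpl(common_sampler_init(model, sparams));

// Sample from the last logits (idx = -1), then record the token
llama_token tok = common_sampler_sample(smpl.get(), ctx, -1);
common_sampler_accept(smpl.get(), tok, true);

// Speculative decoding: idxs (std::vector<int>) and draft_tokens (llama_tokens)
// are supplied by the caller; the result holds the accepted tokens
auto accepted = common_sampler_sample_and_accept_n(smpl.get(), ctx, idxs, draft_tokens);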

Related Pages

Principle:Ollama_Ollama_Sampling_Pipeline
