Implementation:Ollama Ollama Llama Sampling API
| Knowledge Sources | |
|---|---|
| Domains | Sampling, Inference |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring the common_sampler API, which extends llama's base sampler with grammar support, token history tracking, and performance metrics.
Description
Declares common_sampler_init, which constructs a sampler from a model and a set of sampling parameters, together with the functions that manage an existing sampler: common_sampler_free, common_sampler_reset, common_sampler_clone, and common_sampler_accept. The primary entry point is common_sampler_sample, which samples a single token; common_sampler_sample_and_accept_n supports speculative decoding by verifying a sequence of draft tokens. The header also provides accessors for the underlying llama_sampler, performance printing, sampler type conversion utilities, and a common_sampler_deleter for use with std::unique_ptr.
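The std::unique_ptr pairing mentioned above can be illustrated with a minimal sketch. It does not use the real header: demo_sampler, demo_sampler_init, demo_sampler_free, and demo_sampler_deleter are hypothetical stand-ins, and the sketch assumes the real deleter simply forwards to the free function.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the opaque common_sampler type (hypothetical, for illustration only).
struct demo_sampler {
    int n_accepted = 0;
};

// Counts released samplers so the effect of the deleter is observable.
static int g_demo_free_calls = 0;

// Stand-ins for an init/free pair in the style of common_sampler_init / common_sampler_free.
demo_sampler * demo_sampler_init() { return new demo_sampler(); }

void demo_sampler_free(demo_sampler * s) {
    if (s != nullptr) {
        ++g_demo_free_calls;
        delete s;
    }
}

// Deleter functor: forwards to the free function (assumed behavior of common_sampler_deleter).
struct demo_sampler_deleter {
    void operator()(demo_sampler * s) const { demo_sampler_free(s); }
};

using demo_sampler_ptr = std::unique_ptr<demo_sampler, demo_sampler_deleter>;

// Creates a sampler in a nested scope and returns how many frees happened when the scope ended.
int demo_scoped_use() {
    const int before = g_demo_free_calls;
    {
        demo_sampler_ptr smpl(demo_sampler_init());
        smpl->n_accepted += 1;          // use the sampler through the smart pointer
        assert(smpl->n_accepted == 1);
    }                                   // the deleter runs here, exactly once
    return g_demo_free_calls - before;
}
```

With this pattern the caller never invokes the free function directly, so early returns and exceptions cannot leak the sampler.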
Usage
Include this header to access the high-level sampling system used by the llama.cpp examples and by Ollama's Go bindings. It wraps the chaining of multiple sampling strategies behind a single API, so callers do not have to assemble the chain themselves.
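To make "chaining multiple sampling strategies" concrete, here is a small self-contained sketch of the idea. It is not the llama.cpp implementation: demo_candidate, demo_top_k, demo_temperature, and demo_chain_sample are hypothetical names, and the final step picks the highest-logit candidate greedily rather than drawing at random, so the result is deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical candidate type: a token id with its logit.
struct demo_candidate {
    int   id;
    float logit;
};

using demo_candidates = std::vector<demo_candidate>;
using demo_stage      = std::function<void(demo_candidates &)>;

// Stage: keep only the k highest-logit candidates.
demo_stage demo_top_k(std::size_t k) {
    return [k](demo_candidates & c) {
        std::sort(c.begin(), c.end(), [](const demo_candidate & a, const demo_candidate & b) {
            return a.logit > b.logit;
        });
        if (c.size() > k) {
            c.resize(k);
        }
    };
}

// Stage: divide every logit by a positive temperature.
demo_stage demo_temperature(float t) {
    assert(t > 0.0f);
    return [t](demo_candidates & c) {
        for (auto & cand : c) {
            cand.logit /= t;
        }
    };
}

// Applies each stage in order, then returns the id of the highest-logit survivor
// (greedy selection), or -1 if no candidate is left.
int demo_chain_sample(demo_candidates c, const std::vector<demo_stage> & chain) {
    for (const auto & stage : chain) {
        stage(c);
    }
    auto best = std::max_element(c.begin(), c.end(), [](const demo_candidate & a, const demo_candidate & b) {
        return a.logit < b.logit;
    });
    return best == c.end() ? -1 : best->id;
}
```

Each stage only sees the candidates left by the previous one, which is why the order of the chain matters. A single wrapper function hides that ordering from the caller.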
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/common/sampling.h
- Lines: 1-114
Signature
struct common_sampler * common_sampler_init(const struct llama_model * model,
                                            const struct common_params_sampling & params);
void common_sampler_free(struct common_sampler * gsmpl);
void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone(struct common_sampler * gsmpl);
llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx);
std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft);
uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);
typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;
Import
#include "sampling.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
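| model | const llama_model * | Yes | Model for vocabulary information |
| params | const common_params_sampling & | Yes | Sampling configuration parameters |
| gsmpl | common_sampler * | Yes | Sampler instance returned by common_sampler_init |
| ctx | llama_context * | Yes | Inference context with logits |
| idx | int | Yes | Batch index to sample from (the usage example passes -1 for the last output) |
| token | llama_token | Yes | Token to accept into the sampler's history (common_sampler_accept) |
| accept_grammar | bool | Yes | Whether the grammar state is also advanced by the accepted token |
| idxs | const std::vector<int> & | Yes | Batch indices to sample from during speculative decoding |
| draft | const llama_tokens & | Yes | Draft tokens to verify during speculative decoding |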
Outputs
| Name | Type | Description |
|---|---|---|
| token | llama_token | Sampled token ID |
| tokens | std::vector<llama_token> | Accepted tokens (for speculative decoding) |
Usage Examples
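#include "sampling.h"

// Assumes model, ctx, and sparams (a common_params_sampling) are already initialized.
// Create the sampler; common_sampler_ptr frees it automatically when it goes out of scope.
common_sampler_ptr smpl(common_sampler_init(model, sparams));

// Sample from the last output position (idx = -1), then accept the token so it is
// recorded in the sampler's history and the grammar state advances.
llama_token tok = common_sampler_sample(smpl.get(), ctx, -1);
common_sampler_accept(smpl.get(), tok, true);

// Speculative decoding: verify draft_tokens against the logits at the batch indices in idxs.
auto accepted = common_sampler_sample_and_accept_n(smpl.get(), ctx, idxs, draft_tokens);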
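The draft verification behind common_sampler_sample_and_accept_n can be sketched without the library. The following is an illustrative reconstruction under stated assumptions, not the actual implementation: demo_token, demo_sample_fn, and demo_sample_and_accept_n are hypothetical names, the callback stands in for "sample at a batch index, then accept", and the sketch assumes idxs holds one index per draft token plus one extra.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using demo_token = int;

// Stand-in for "sample at batch index idx, then accept the sampled token".
using demo_sample_fn = std::function<demo_token(int idx)>;

// Samples at successive indices and continues only while the sampled token matches the draft.
// Assumes idxs.size() == draft.size() + 1: one index per draft token plus one for an extra token.
std::vector<demo_token> demo_sample_and_accept_n(
        const demo_sample_fn & sample_and_accept,
        const std::vector<int> & idxs,
        const std::vector<demo_token> & draft) {
    assert(idxs.size() == draft.size() + 1);

    std::vector<demo_token> result;
    result.reserve(idxs.size());

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const demo_token id = sample_and_accept(idxs[i]);
        result.push_back(id);
        if (id != draft[i]) {
            break; // first disagreement: keep the sampled token and stop verifying
        }
    }
    if (i == draft.size()) {
        // The whole draft was confirmed, so one additional token can be sampled.
        result.push_back(sample_and_accept(idxs[i]));
    }
    return result;
}
```

Under these assumptions the result always contains at least one token and at most draft.size() + 1 tokens, and every token before the last one equals the corresponding draft token.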