Implementation:Ollama Ollama Llama Sampling API

From Leeroopedia
Knowledge Sources
Domains: Sampling, Inference
Last Updated: 2025-02-15 00:00 GMT

Overview

Header declaring the common_sampler API, which builds on llama.cpp's base llama_sampler by adding grammar support, token history tracking, and performance metrics.
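Token history tracking is what lets an accessor such as common_sampler_last return the most recently accepted token. As a rough illustration only, the sketch below keeps a bounded history in a ring buffer. The token_history class and the token_id alias are hypothetical and are not part of sampling.h.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for llama_token.
using token_id = int;

// Hypothetical fixed-capacity history of accepted tokens (illustration only;
// not the real common_sampler internals).
class token_history {
public:
    explicit token_history(std::size_t capacity) : buf_(capacity), capacity_(capacity) {
        if (capacity == 0) throw std::invalid_argument("capacity must be > 0");
    }

    // Record a newly accepted token, overwriting the oldest one when full.
    void push(token_id tok) {
        buf_[head_] = tok;
        head_ = (head_ + 1) % capacity_;
        if (size_ < capacity_) ++size_;
    }

    // recent(0) is the most recent token, recent(1) the one before it, etc.
    token_id recent(std::size_t i) const {
        if (i >= size_) throw std::out_of_range("not enough history");
        return buf_[(head_ + capacity_ - 1 - i) % capacity_];
    }

    std::size_t size() const { return size_; }

private:
    std::vector<token_id> buf_;
    std::size_t capacity_;
    std::size_t head_ = 0;  // next slot to write
    std::size_t size_ = 0;  // number of valid entries
};
```

With this layout, a "last token" accessor is simply recent(0), and older context for penalty-style samplers is available through larger offsets.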

Description

Declares common_sampler_init, which constructs a sampler from a model and sampling parameters; the lifecycle functions (free, reset, clone, accept); the primary entry point common_sampler_sample, which samples a single token; and common_sampler_sample_and_accept_n, which samples and verifies draft tokens for speculative decoding. It also provides accessors for the underlying llama_sampler, performance printing, sampler type conversion utilities, and common_sampler_deleter for use with std::unique_ptr.
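The common_sampler_ptr typedef follows the standard C++ idiom for owning a handle that comes from a C-style init/free API: a small deleter struct whose operator() calls the free function. The sketch below shows the same idiom with hypothetical fake_sampler_init and fake_sampler_free functions (plus a live-object counter) standing in for common_sampler_init and common_sampler_free.

```cpp
#include <cassert>
#include <memory>

// Hypothetical handle type standing in for common_sampler.
struct fake_sampler {
    int seed;
};

// Counts live handles so the example can show that RAII frees them.
static int g_live_samplers = 0;

// Hypothetical C-style constructor, analogous to common_sampler_init.
fake_sampler * fake_sampler_init(int seed) {
    ++g_live_samplers;
    return new fake_sampler{seed};
}

// Hypothetical C-style destructor, analogous to common_sampler_free.
void fake_sampler_free(fake_sampler * s) {
    if (s == nullptr) return;
    --g_live_samplers;
    delete s;
}

// Deleter: a callable that releases the handle through the free function.
struct fake_sampler_deleter {
    void operator()(fake_sampler * s) const { fake_sampler_free(s); }
};

// Owning smart pointer, analogous to common_sampler_ptr.
using fake_sampler_ptr = std::unique_ptr<fake_sampler, fake_sampler_deleter>;
```

Because the deleter is an empty struct rather than a function pointer, the resulting unique_ptr is typically no larger than a raw pointer.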

Usage

Include this header to access the high-level sampling system used throughout the llama.cpp examples and tools, and by Ollama's Go bindings. It combines grammar-constrained sampling with a configurable chain of strategies (such as repetition penalties, top-k, top-p, and temperature) behind a single API.
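To make the idea of a sampler chain concrete, here is a minimal sketch that assumes a hypothetical candidate struct and std::function stages instead of llama.cpp's llama_sampler interface. Each stage rewrites the candidate list in order, and a final greedy pick selects the token. A real chain usually ends with a random draw; greedy selection keeps this example deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical candidate: a token id with its unnormalized logit.
struct candidate {
    int id;
    float logit;
};

// A stage transforms the candidate list in place.
using stage = std::function<void(std::vector<candidate> &)>;

// Keep only the k highest-logit candidates.
stage top_k(std::size_t k) {
    return [k](std::vector<candidate> & c) {
        std::sort(c.begin(), c.end(),
                  [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
        if (c.size() > k) c.resize(k);
    };
}

// Divide every logit by a temperature t (t > 0).
stage temperature(float t) {
    return [t](std::vector<candidate> & c) {
        for (auto & x : c) x.logit /= t;
    };
}

// Apply each stage in order, then pick the highest remaining logit (greedy).
int sample_chain(const std::vector<stage> & chain, std::vector<candidate> c) {
    for (const auto & s : chain) s(c);
    assert(!c.empty());
    auto best = std::max_element(c.begin(), c.end(),
                                 [](const candidate & a, const candidate & b) { return a.logit < b.logit; });
    return best->id;
}
```

Keeping each strategy as an independent stage is what allows the order and set of strategies to be configured without changing the sampling call itself.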

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/common/sampling.h
  • Lines: 1-114

Signature

struct common_sampler * common_sampler_init(const struct llama_model * model,
                                            const struct common_params_sampling & params);
void common_sampler_free(struct common_sampler * gsmpl);
void common_sampler_accept(struct common_sampler * gsmpl, llama_token token, bool accept_grammar);
void common_sampler_reset (struct common_sampler * gsmpl);
struct common_sampler * common_sampler_clone(struct common_sampler * gsmpl);

llama_token common_sampler_sample(struct common_sampler * gsmpl, struct llama_context * ctx, int idx);

std::vector<llama_token> common_sampler_sample_and_accept_n(
    struct common_sampler * gsmpl, struct llama_context * ctx,
    const std::vector<int> & idxs, const llama_tokens & draft);

uint32_t common_sampler_get_seed(const struct common_sampler * gsmpl);
llama_token common_sampler_last(const struct common_sampler * gsmpl);
std::string common_sampler_print(const struct common_sampler * gsmpl);

struct common_sampler_deleter {
    void operator()(common_sampler * s) { common_sampler_free(s); }
};

typedef std::unique_ptr<common_sampler, common_sampler_deleter> common_sampler_ptr;

Import

#include "sampling.h"

I/O Contract

Inputs

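Name | Type | Required | Description
model | const llama_model * | Yes | Model providing vocabulary information (common_sampler_init)
params | const common_params_sampling & | Yes | Sampling configuration parameters (common_sampler_init)
ctx | llama_context * | Yes | Inference context holding the logits from the last decode
idx | int | Yes | Index of the logits to sample from; -1 selects the last output
idxs | const std::vector<int> & | For common_sampler_sample_and_accept_n | Logit indices to sample from, one per draft token plus one
draft | const llama_tokens & | For common_sampler_sample_and_accept_n | Draft tokens to verify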
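The usage example passes idx = -1. llama.h documents that llama_get_logits_ith accepts negative indices, with -1 meaning the last logits row; assuming common_sampler_sample forwards idx to that function, the hypothetical helper below sketches the convention. As a simplification it treats a non-negative idx as an output row directly, whereas llama.cpp maps a non-negative idx from a batch position to its output row.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: map a possibly negative index onto [0, n_outputs),
// so that -1 refers to the last output row, -2 to the one before, etc.
int resolve_output_index(int idx, int n_outputs) {
    const int j = idx < 0 ? n_outputs + idx : idx;
    if (j < 0 || j >= n_outputs) {
        throw std::out_of_range("invalid logits index");
    }
    return j;
}
```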

Outputs

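Name | Type | Description
token | llama_token | Sampled token ID (common_sampler_sample)
tokens | std::vector<llama_token> | Result of common_sampler_sample_and_accept_n: the accepted draft prefix followed by one token sampled from the target model (between 1 and draft.size() + 1 tokens)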
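The tokens output follows a simple verification rule: walk the draft, sample the target model's token at each position, and stop at the first disagreement; if every draft token matches, sample one extra token. The sketch below reproduces that loop as we understand it from llama.cpp's implementation. The sample_at callback is hypothetical and replaces sampling from ctx, and the sketch omits the common_sampler_accept call that the real function makes for each token.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for llama_token.
using token_id = int;

// Draft verification in the style of common_sampler_sample_and_accept_n.
// sample_at(idx) returns the target model's sampled token at logit index idx.
std::vector<token_id> sample_and_accept_n(
        const std::function<token_id(int)> & sample_at,
        const std::vector<int> & idxs,
        const std::vector<token_id> & draft) {
    // One index per draft token, plus one for the token after the draft.
    assert(idxs.size() == draft.size() + 1);

    std::vector<token_id> result;
    result.reserve(idxs.size());

    std::size_t i = 0;
    for (; i < draft.size(); ++i) {
        const token_id id = sample_at(idxs[i]);
        result.push_back(id);
        if (id != draft[i]) {
            return result;  // first mismatch: keep the target's token and stop
        }
    }

    // Every draft token matched: sample one extra "bonus" token.
    result.push_back(sample_at(idxs[i]));
    return result;
}
```

The returned vector always ends with a token chosen by the target model, so each verification step makes progress even when the first draft token is rejected.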

Usage Examples

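#include "sampling.h"

// Assumes `model` and `ctx` are already initialized and llama_decode()
// has been run on the current batch.
common_params_sampling sparams; // default settings; adjust sparams.temp, sparams.top_k, ...

// Create the sampler; common_sampler_ptr frees it automatically (RAII)
common_sampler_ptr smpl(common_sampler_init(model, sparams));

// Sample from the last output's logits (idx = -1), then record the token
// in the sampler's history and grammar state
llama_token tok = common_sampler_sample(smpl.get(), ctx, -1);
common_sampler_accept(smpl.get(), tok, /* accept_grammar = */ true);

// Speculative decoding: idxs must hold draft_tokens.size() + 1 logit indices.
// The returned tokens are already accepted internally.
std::vector<llama_token> accepted =
    common_sampler_sample_and_accept_n(smpl.get(), ctx, idxs, draft_tokens);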
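In a generation loop, the sample and accept calls repeat once per token until an end-of-generation token appears. The sketch below shows that control flow with a hypothetical mock_sampler that replays a fixed token script. In real code, sample() and accept() would correspond to common_sampler_sample and common_sampler_accept, and each new token would be fed back through llama_decode.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for llama_token.
using token_id = int;

// Hypothetical sampler that replays a fixed script of tokens.
struct mock_sampler {
    std::vector<token_id> script;   // tokens to "sample", in order
    std::vector<token_id> history;  // tokens accepted so far
    std::size_t pos = 0;            // next script position

    token_id sample() { return script.at(pos++); }
    void accept(token_id t) { history.push_back(t); }
};

// Generate until `eos` is sampled or `max_tokens` tokens have been drawn.
std::vector<token_id> generate(mock_sampler & s, token_id eos, int max_tokens) {
    std::vector<token_id> out;
    for (int n = 0; n < max_tokens; ++n) {
        const token_id t = s.sample();
        s.accept(t);          // update history before the next step
        if (t == eos) break;  // stop without emitting the end token
        out.push_back(t);
        // real code would llama_decode() a batch containing t here
    }
    return out;
}
```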
