

Implementation:Ollama Ollama Llama Sampling

From Leeroopedia
Knowledge Sources
Domains Sampling, Inference
Last Updated 2025-02-15 00:00 GMT

Overview

Implements the high-level sampling pipeline that chains together multiple sampling strategies (top-k, top-p, temperature, penalties, grammar) for token selection during LLM inference.

Description

The common_sampler struct wraps a llama_sampler chain built from the configured parameters. common_sampler_init constructs the chain by adding samplers in order: penalties, top-k, typical-p, top-p, min-p, XTC, temperature, and finally the distribution sampler. A ring_buffer tracks the last N accepted tokens for the repeat-penalty computation. common_sampler_sample applies the chain to the logits and, if the initially sampled token violates the grammar constraints, re-samples with the grammar applied. common_sampler_sample_and_accept_n supports speculative decoding by cross-referencing sampled tokens against draft tokens.
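To make the idea of a sampler chain concrete, here is a minimal, self-contained sketch that applies three of the stages named above in the same relative order (top-k, then temperature, then a draw from the resulting distribution). It does not use the llama.cpp API: the candidate struct and every sketch_* function are hypothetical names introduced only for illustration, and the real chain includes further stages (penalties, typical-p, top-p, min-p, XTC) that are omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical candidate type for illustration; not llama.cpp's llama_token_data.
struct candidate {
    int   id;     // token id
    float logit;  // raw score
    float p;      // probability, filled in by sketch_softmax
};

// Keep only the k highest-logit candidates, sorted by descending logit.
inline void sketch_top_k(std::vector<candidate> & cur, size_t k) {
    k = std::min(k, cur.size());
    std::partial_sort(cur.begin(), cur.begin() + k, cur.end(),
                      [](const candidate & a, const candidate & b) { return a.logit > b.logit; });
    cur.resize(k);
}

// Scale logits by 1/temp (assumes temp > 0).
inline void sketch_temperature(std::vector<candidate> & cur, float temp) {
    for (auto & c : cur) c.logit /= temp;
}

// Convert logits to probabilities with a numerically stable softmax.
inline void sketch_softmax(std::vector<candidate> & cur) {
    assert(!cur.empty());
    float max_logit = cur.front().logit;
    for (const auto & c : cur) max_logit = std::max(max_logit, c.logit);
    float sum = 0.0f;
    for (auto & c : cur) { c.p = std::exp(c.logit - max_logit); sum += c.p; }
    for (auto & c : cur) c.p /= sum;
}

// Run the stages in order and draw one token id from the final distribution.
inline int sketch_sample(std::vector<candidate> cur, size_t top_k, float temp, std::mt19937 & rng) {
    sketch_top_k(cur, top_k);
    sketch_temperature(cur, temp);
    sketch_softmax(cur);
    std::vector<float> probs;
    for (const auto & c : cur) probs.push_back(c.p);
    std::discrete_distribution<int> dist(probs.begin(), probs.end());
    return cur[dist(rng)].id;
}
```

Each stage only narrows or reshapes the candidate list, which is why the stages can be composed in a fixed order and the random draw is left to the last step.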

Usage

Use this for all token sampling during inference. The sampling pipeline controls text generation quality and behavior through temperature, top-p, repetition penalties, grammar constraints, and other strategies.
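The grammar handling mentioned in the Description (sample first, re-sample only if the token violates the grammar) can be sketched as follows. This is an illustration under stated assumptions, not the library's code: pick and allowed are hypothetical callbacks standing in for the sampler chain and the grammar check, and the sketch assumes at least one candidate satisfies the grammar.

```cpp
#include <functional>
#include <vector>

using token = int; // stand-in for llama_token in this sketch

// pick:    hypothetical stand-in for running the sampler chain over a candidate list
// allowed: hypothetical stand-in for the grammar check on a single token
inline token sketch_sample_with_grammar(
        const std::vector<token> & candidates,
        const std::function<token(const std::vector<token> &)> & pick,
        const std::function<bool(token)> & allowed) {
    // Fast path: sample without the grammar and check only the chosen token.
    const token first = pick(candidates);
    if (allowed(first)) {
        return first;
    }
    // Slow path: drop every candidate the grammar rejects, then sample again.
    std::vector<token> filtered;
    for (token t : candidates) {
        if (allowed(t)) filtered.push_back(t);
    }
    return pick(filtered);
}
```

The fast path avoids checking the whole vocabulary against the grammar when the first choice is already valid, which is the common case for a well-constrained prompt.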

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/common/sampling.cpp
  • Lines: 1-654

Signature

template<typename T>
struct ring_buffer {
    ring_buffer(size_t cap);
    T & front();
    T & back();
    void push_back(const T & value);
    T pop_front();
    const T & rat(size_t i) const;
    std::vector<T> to_vector() const;
    void clear();
    bool empty() const;
    size_t size() const;
};

struct common_sampler {
    common_params_sampling params;
    struct llama_sampler * chain;
    bool grammar;
    ring_buffer<llama_token> prev;
    std::vector<llama_token_data> cur;
    llama_token_data_array cur_p;
    void reset();
    void set_logits(struct llama_context * ctx, int idx);
};

struct common_sampler * common_sampler_init(const struct llama_model * model,
                                            const struct common_params_sampling & params);
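
As a rough illustration of how a fixed-capacity buffer with this interface might behave, the sketch below implements a subset of the ring_buffer methods listed above. The type name sketch_ring_buffer and its member layout are hypothetical, and it assumes that rat(i) indexes from the most recent element (i = 0 is the newest token), which suits looking back over recent tokens for the repeat penalty.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a fixed-capacity circular buffer; not the source's ring_buffer.
template <typename T>
struct sketch_ring_buffer {
    explicit sketch_ring_buffer(size_t cap) : capacity(cap), data(cap) {}

    // Append a value; once full, the oldest element is overwritten.
    void push_back(const T & value) {
        if (capacity == 0) throw std::runtime_error("ring buffer: zero capacity");
        if (sz == capacity) {
            first = (first + 1) % capacity; // drop the oldest element
        } else {
            sz++;
        }
        data[pos] = value;
        pos = (pos + 1) % capacity;
    }

    // Reverse access: rat(0) is the newest element, rat(size() - 1) the oldest.
    const T & rat(size_t i) const {
        if (i >= sz) throw std::runtime_error("ring buffer: index out of bounds");
        return data[(first + sz - i - 1) % capacity];
    }

    // Copy the contents out in oldest-to-newest order.
    std::vector<T> to_vector() const {
        std::vector<T> out;
        out.reserve(sz);
        for (size_t i = 0; i < sz; i++) out.push_back(data[(first + i) % capacity]);
        return out;
    }

    size_t size() const { return sz; }

    size_t capacity = 0; // maximum number of stored elements
    size_t sz       = 0; // current number of stored elements
    size_t first    = 0; // index of the oldest element
    size_t pos      = 0; // index where the next element is written
    std::vector<T> data;
};
```

Because the capacity is fixed, memory use stays constant no matter how long generation runs, and only the most recent N tokens influence the penalty.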

Import

#include "sampling.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | const llama_model * | Yes | Model used to determine vocabulary size
params | common_params_sampling | Yes | Sampling configuration (temp, top_k, top_p, etc.)
ctx | llama_context * | Yes | Context with logits to sample from
idx | int | Yes | Token index in the batch to sample

Outputs

Name | Type | Description
token | llama_token | The sampled token
sampler | common_sampler * | Initialized sampler instance

Usage Examples

#include "sampling.h"

// Initialize sampler (model and ctx are assumed to be already loaded)
common_params_sampling sparams;
sparams.temp = 0.8f;
sparams.top_p = 0.95f;
auto * smpl = common_sampler_init(model, sparams);

// Sample a token (idx = -1 selects the last set of logits in the batch)
llama_token token = common_sampler_sample(smpl, ctx, -1);

// Accept the token
common_sampler_accept(smpl, token, true);

// Clean up
common_sampler_free(smpl);
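
For the speculative-decoding helper mentioned in the Description, the cross-referencing of sampled tokens against draft tokens might look like the sketch below. It is an illustration under stated assumptions rather than the actual common_sampler_sample_and_accept_n: sample_at is a hypothetical callback that returns the token the target model samples at a given batch position, and the sketch assumes that one extra token is sampled when every draft token matches.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

using token = int; // stand-in for llama_token in this sketch

// sample_at(i): hypothetical callback returning the target model's sampled
// token at batch position i.
inline std::vector<token> sketch_accept_n(const std::vector<token> & draft,
                                          const std::function<token(size_t)> & sample_at) {
    std::vector<token> result;
    result.reserve(draft.size() + 1);

    size_t i = 0;
    for (; i < draft.size(); i++) {
        const token id = sample_at(i);
        result.push_back(id);
        if (id != draft[i]) {
            // First mismatch: keep the target's token and discard the rest of the draft.
            return result;
        }
    }

    // Every draft token matched, so one more token can be sampled after them.
    result.push_back(sample_at(i));
    return result;
}
```

The result always holds at least one token, so generation makes progress even when the first draft token is rejected.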
