
Principle:Ggml org Llama cpp Sampling System

From Leeroopedia
Revision as of 17:12, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Ggml_org_Llama_cpp_Sampling_System.md)
Knowledge Sources
Domains: Sampling
Last Updated: 2026-02-15 00:00 GMT

Overview

The Sampling System principle covers the type interfaces and data structures that llama.cpp defines for token sampling and for speculative decoding samplers.

Description

This principle covers the header-level type definitions and interfaces that govern token sampling in llama.cpp: the sampler chain architecture, the individual sampler interfaces (temperature, top-k, top-p, min-p, repetition penalties, grammar constraints), and the speculative decoding sampler interface, which coordinates sampling between the draft model and the target model. Concrete sampler implementations must follow the contracts these headers define.
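
To make the idea of a sampler contract and a sampler chain concrete, here is a minimal sketch in C++. It is not the llama.cpp header: the names `TokenData`, `Sampler`, `SamplerChain`, `BanToken`, and `pick_greedy` are hypothetical and chosen only for illustration. The assumption is simply that each stage edits a shared candidate list in place and that a chain is itself a stage that runs its members in order.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the llama.cpp declarations.
struct TokenData {
    int32_t id;     // token id in the vocabulary
    float   logit;  // unnormalized score for that token
};

// Contract every stage follows: transform the candidate list in place.
struct Sampler {
    virtual ~Sampler() = default;
    virtual void apply(std::vector<TokenData>& cands) = 0;
    virtual void accept(int32_t /*token*/) {}  // optional hook for stateful stages
};

// A chain is itself a Sampler that runs its stages in insertion order.
struct SamplerChain : Sampler {
    std::vector<std::unique_ptr<Sampler>> stages;

    void add(std::unique_ptr<Sampler> s) { stages.push_back(std::move(s)); }

    void apply(std::vector<TokenData>& cands) override {
        for (auto& s : stages) s->apply(cands);
    }
    void accept(int32_t token) override {
        for (auto& s : stages) s->accept(token);
    }
};

// Example stage: mask one token by setting its logit to -infinity.
struct BanToken : Sampler {
    int32_t banned;
    explicit BanToken(int32_t id) : banned(id) {}
    void apply(std::vector<TokenData>& cands) override {
        for (auto& c : cands)
            if (c.id == banned) c.logit = -std::numeric_limits<float>::infinity();
    }
};

// Final selection step: pick the candidate with the highest logit.
int32_t pick_greedy(const std::vector<TokenData>& cands) {
    assert(!cands.empty());
    const TokenData* best = &cands[0];
    for (const auto& c : cands)
        if (c.logit > best->logit) best = &c;
    return best->id;
}
```

Because the chain exposes the same interface as a single stage, callers can treat "one sampler" and "many samplers" uniformly, and a new strategy only has to implement `apply` (plus `accept` if it tracks history).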

Usage

Apply this principle when implementing new sampling strategies, extending the sampler chain with custom samplers, or integrating speculative decoding with the sampling pipeline.
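
For the speculative decoding case, the coordination between draft and target can be sketched as a verification step. The example below assumes the simplest, greedy scheme: a drafted token is kept only while it equals the target model's own choice at that position. The function `verify_greedy` is hypothetical, and the source does not say which acceptance rule the llama.cpp interface uses; stochastic acceptance schemes also exist.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper for illustration; not a llama.cpp function.
//
// draft:  tokens proposed by the draft model, in order.
// target: the target model's greedy choice at each drafted position, plus
//         one extra entry for the position after the last drafted token.
//
// Returns the tokens to commit: the prefix on which both models agree,
// followed by exactly one token chosen by the target model.
std::vector<int32_t> verify_greedy(const std::vector<int32_t>& draft,
                                   const std::vector<int32_t>& target) {
    assert(target.size() == draft.size() + 1);

    std::vector<int32_t> committed;
    std::size_t i = 0;
    while (i < draft.size() && draft[i] == target[i]) {
        committed.push_back(draft[i]);  // target agrees, keep the drafted token
        ++i;
    }
    // On a mismatch this is the target's correction; if every drafted token
    // matched, it is the extra token the target produced for free.
    committed.push_back(target[i]);
    return committed;
}
```

Every call commits at least one token, so the loop that drives drafting and verification always makes progress even when the draft model is wrong at the first position.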

Theoretical Basis

Token sampling transforms raw logits (unnormalized log-probabilities) from the model into a selected token. The sampling system uses a chain-of-responsibility pattern in which multiple samplers are composed in sequence, each modifying the token probability distribution before passing it on. Common samplers include temperature scaling (dividing the logits by a temperature to control randomness), top-k filtering (keeping only the k most likely tokens), top-p (nucleus) sampling (keeping the smallest set of most likely tokens whose cumulative probability reaches p), min-p sampling (keeping tokens whose probability is at least min_p times the top token's probability), and repetition penalties (reducing the probability of recently generated tokens). The speculative decoding interface extends sampling to coordinate between a draft model that proposes tokens and a target model that verifies them.
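
The filters described above can be sketched as small standalone functions. This is an illustrative sketch, not the llama.cpp implementation: all function names are hypothetical, the filters return the indices of the surviving tokens rather than editing a shared candidate array, and the repetition penalty shown is one common formulation (divide positive logits, multiply non-positive ones) that is assumed here rather than taken from the source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helpers for illustration; not llama.cpp functions.

// Softmax with temperature: temp < 1 sharpens, temp > 1 flattens.
std::vector<float> softmax_temp(const std::vector<float>& logits, float temp) {
    assert(!logits.empty() && temp > 0.0f);
    const float mx = *std::max_element(logits.begin(), logits.end());
    std::vector<float> p(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        p[i] = std::exp((logits[i] - mx) / temp);  // subtract max for stability
        sum += p[i];
    }
    for (float& x : p) x /= sum;
    return p;
}

// Token indices ordered from most to least likely.
std::vector<std::size_t> by_prob_desc(const std::vector<float>& p) {
    std::vector<std::size_t> idx(p.size());
    for (std::size_t i = 0; i < idx.size(); ++i) idx[i] = i;
    std::stable_sort(idx.begin(), idx.end(),
                     [&p](std::size_t a, std::size_t b) { return p[a] > p[b]; });
    return idx;
}

// Top-k: keep the k most likely tokens.
std::vector<std::size_t> top_k(const std::vector<float>& p, std::size_t k) {
    std::vector<std::size_t> idx = by_prob_desc(p);
    idx.resize(std::min(k, idx.size()));
    return idx;
}

// Top-p: keep the smallest prefix whose cumulative probability reaches threshold.
std::vector<std::size_t> top_p(const std::vector<float>& p, float threshold) {
    std::vector<std::size_t> idx = by_prob_desc(p);
    float cum = 0.0f;
    std::size_t keep = 0;
    while (keep < idx.size()) {
        cum += p[idx[keep++]];
        if (cum >= threshold) break;
    }
    idx.resize(keep);
    return idx;
}

// Min-p: keep tokens with probability >= ratio * (largest probability).
std::vector<std::size_t> min_p(const std::vector<float>& p, float ratio) {
    assert(!p.empty());
    const float cutoff = ratio * *std::max_element(p.begin(), p.end());
    std::vector<std::size_t> kept;
    for (std::size_t i : by_prob_desc(p))
        if (p[i] >= cutoff) kept.push_back(i);
    return kept;
}

// Repetition penalty (assumed formulation): lower the logits of recent tokens.
// `recent` is expected to hold each token id at most once.
void penalize_repeats(std::vector<float>& logits,
                      const std::vector<std::size_t>& recent, float penalty) {
    for (std::size_t id : recent) {
        if (id >= logits.size()) continue;
        logits[id] = logits[id] > 0.0f ? logits[id] / penalty : logits[id] * penalty;
    }
}
```

Note that top-p always keeps at least one token, and that min-p scales its cutoff with the model's confidence: when the top token is very likely, few alternatives survive, and when the distribution is flat, many do.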
