
Implementation: ggml-org/llama.cpp Vocab Header

From Leeroopedia
Knowledge Sources
Domains: Tokenization, Vocabulary
Last Updated: 2026-02-15 00:00 GMT

Overview

Declares the `llama_vocab` struct and the `llama_vocab_pre_type` enum, defining the public interface for vocabulary and tokenization operations.

Description

This header defines the vocabulary system used throughout llama.cpp. The `llama_vocab_pre_type` enum lists 45+ pre-tokenization strategies covering models such as LLaMA3, DeepSeek, GPT-2, Falcon, Qwen, ChatGLM, and many others. The `llama_vocab` struct provides methods for loading vocabulary from GGUF files, querying token properties (type checking, special token getters for BOS/EOS/FIM/etc.), tokenization and detokenization, BPE merge lookups, and vocabulary configuration flags (add_bos, add_eos, clean_spaces, etc.). It uses the pimpl idiom with a private `impl` struct for encapsulation.
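The pimpl idiom mentioned above keeps implementation-only types out of the public header. A minimal self-contained sketch of the pattern (the names here are illustrative, not llama.cpp's actual `impl` contents):

```cpp
#include <memory>
#include <string>
#include <vector>

// Minimal pimpl sketch: the public class holds only an opaque pointer,
// so the header stays free of implementation-only types.
class vocab {
public:
    vocab();
    ~vocab();                       // must be defined where `impl` is complete
    size_t n_tokens() const;
    void add_token(const std::string & text);
private:
    struct impl;                    // defined in the .cpp, hidden from users
    std::unique_ptr<impl> pimpl;
};

// --- normally in the .cpp file ---
struct vocab::impl {
    std::vector<std::string> id_to_token;
};

vocab::vocab() : pimpl(new impl()) {}
vocab::~vocab() = default;          // unique_ptr<impl> is destroyed here

size_t vocab::n_tokens() const { return pimpl->id_to_token.size(); }
void vocab::add_token(const std::string & text) { pimpl->id_to_token.push_back(text); }
```

The destructor must be defined in the source file, after `impl` is complete; a defaulted destructor in the header would fail to compile because `std::unique_ptr` cannot delete an incomplete type.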

Usage

Include this header in any component that converts between text and tokens: tokenization, detokenization, or querying token properties. It is the core vocabulary interface used by the model loader, context, and sampler.

Code Reference

Source Location

Signature

enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_DEFAULT = 0,
    LLAMA_VOCAB_PRE_TYPE_LLAMA3 = 1,
    LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
    // ... 45+ pre-tokenization types
};
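In llama.cpp the pre-tokenization type is selected during load by matching the tokenizer name string stored in the GGUF metadata (the `tokenizer.ggml.pre` key), falling back to `DEFAULT` when the name is unrecognized. A self-contained sketch of that dispatch, assuming a local mirror of the enum values above; the string keys shown are illustrative rather than the library's exact set:

```cpp
#include <string>

// Mirror of the enum values shown above (local copy for illustration).
enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_DEFAULT      = 0,
    LLAMA_VOCAB_PRE_TYPE_LLAMA3       = 1,
    LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
};

// Hypothetical sketch: map the tokenizer name string read from GGUF
// metadata to a pre-tokenization type. Unrecognized names fall back
// to DEFAULT rather than failing the load.
llama_vocab_pre_type pre_type_from_name(const std::string & name) {
    if (name == "llama-bpe")    return LLAMA_VOCAB_PRE_TYPE_LLAMA3;
    if (name == "deepseek-llm") return LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM;
    return LLAMA_VOCAB_PRE_TYPE_DEFAULT;
}
```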

struct llama_vocab {
    struct token_data {
        std::string      text;
        float            score;
        llama_token_attr attr;
    };

    llama_vocab();
    ~llama_vocab();

    void load(llama_model_loader & ml, const LLM_KV & kv);

    enum llama_vocab_type     get_type()     const;
    enum llama_vocab_pre_type get_pre_type() const;

    uint32_t n_tokens() const;

    int32_t tokenize(const char * text, int32_t text_len, llama_token * tokens, int32_t n_tokens_max, bool add_special, bool parse_special) const;
    int32_t token_to_piece(llama_token token, char * buf, int32_t length, int32_t lstrip, bool special) const;
    int32_t detokenize(const llama_token * tokens, int32_t n_tokens, char * text, int32_t text_len_max, bool remove_special, bool unparse_special) const;

    llama_token token_bos() const;
    llama_token token_eos() const;
    // ... additional special token accessors
};

Import

#include "llama-vocab.h"

I/O Contract

Inputs

Name           Type                  Required  Description
ml             llama_model_loader &  Yes       Model loader that reads vocabulary data from the GGUF file
kv             const LLM_KV &        Yes       Key-value mapping for vocabulary metadata keys
text           const char *          Yes       Input text to tokenize
text_len       int32_t               Yes       Length of the input text in bytes
tokens         llama_token *         Yes       Output buffer for the tokenized result
n_tokens_max   int32_t               Yes       Capacity of the output buffer, in tokens
add_special    bool                  Yes       Whether to add special tokens (e.g. BOS/EOS)
parse_special  bool                  Yes       Whether to parse special-token markup in the input text

Outputs

Name                     Type     Description
tokenize (return)        int32_t  Number of tokens produced; if the buffer is too small, the negated number of tokens required
token_to_piece (return)  int32_t  Number of bytes written to the output buffer
detokenize (return)      int32_t  Number of bytes written during detokenization
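The negative-return convention for `tokenize` implies a two-pass calling pattern: probe with an initial buffer, and if the return is negative, resize to the negated value and retry. A self-contained sketch of that pattern, using a hypothetical `mock_tokenize` (a whitespace splitter standing in for `llama_vocab::tokenize`) that follows the same return contract:

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

using llama_token = int32_t;

// Hypothetical stand-in for llama_vocab::tokenize: splits on spaces and
// follows the same contract, returning the token count on success or the
// negated required count when the output buffer is too small.
int32_t mock_tokenize(const char * text, int32_t text_len,
                      llama_token * tokens, int32_t n_tokens_max) {
    int32_t n = 0;
    for (int32_t i = 0; i < text_len; ++i) {
        if ((i == 0 || text[i - 1] == ' ') && text[i] != ' ') {
            if (n < n_tokens_max) tokens[n] = i; // token "id" = start offset
            n++;
        }
    }
    return n <= n_tokens_max ? n : -n;
}

// Two-pass pattern: retry with the exact size reported by a negative return.
std::vector<llama_token> tokenize_all(const char * text) {
    const int32_t len = (int32_t) std::strlen(text);
    std::vector<llama_token> tokens(4); // deliberately small first guess
    int32_t n = mock_tokenize(text, len, tokens.data(), (int32_t) tokens.size());
    if (n < 0) {
        tokens.resize(-n); // grow to the reported required size
        n = mock_tokenize(text, len, tokens.data(), (int32_t) tokens.size());
    }
    tokens.resize(n);
    return tokens;
}
```

The same pattern applies to `token_to_piece` and `detokenize`, whose byte-count returns can likewise be used to size the output buffer.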

Usage Examples

// Load vocabulary from model
llama_vocab vocab;
vocab.load(ml, kv);

// Tokenize input text using the buffer-based API declared above
const char * text = "Hello, world!";
std::vector<llama_token> tokens(64);
int32_t n_tok = vocab.tokenize(text, (int32_t) strlen(text),
                               tokens.data(), (int32_t) tokens.size(),
                               /*add_special=*/true, /*parse_special=*/false);
if (n_tok < 0) {
    tokens.resize(-n_tok); // buffer too small: -n_tok is the required size
    n_tok = vocab.tokenize(text, (int32_t) strlen(text),
                           tokens.data(), (int32_t) tokens.size(), true, false);
}
tokens.resize(n_tok);

// Convert a token back to text
char buf[128];
int32_t n_bytes = vocab.token_to_piece(tokens[0], buf, sizeof(buf),
                                       /*lstrip=*/0, /*special=*/true);

// Check special tokens
llama_token bos = vocab.token_bos();
llama_token eos = vocab.token_eos();
