Implementation: ggml-org/llama.cpp Vocab Header
| Knowledge Sources | |
|---|---|
| Domains | Tokenization, Vocabulary |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the `llama_vocab` struct and the `llama_vocab_pre_type` enum, defining the public interface for vocabulary and tokenization operations.
Description
This header defines the vocabulary system used throughout llama.cpp. The `llama_vocab_pre_type` enum lists 45+ pre-tokenization strategies covering models such as LLaMA3, DeepSeek, GPT-2, Falcon, Qwen, ChatGLM, and many others. The `llama_vocab` struct provides methods for loading the vocabulary from GGUF files, querying token properties (type checks and accessors for special tokens such as BOS, EOS, and FIM), tokenization and detokenization, BPE merge lookups, and vocabulary configuration flags (add_bos, add_eos, clean_spaces, etc.). It uses the pimpl idiom, hiding its internals behind a private `impl` struct so the header exposes only the public interface.
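The pimpl layout mentioned above can be sketched with a minimal, self-contained example. This is an illustration of the pattern, not the real `llama_vocab` declaration; `mini_vocab`, `add_token`, and the `impl` members are hypothetical names chosen for the sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Public interface: only a forward-declared impl is visible, so the
// container types backing the vocabulary stay out of the header.
struct mini_vocab {
    mini_vocab();
    ~mini_vocab();
    void add_token(const std::string & text);
    std::size_t n_tokens() const;
private:
    struct impl;                 // defined out of line (normally in the .cpp)
    std::unique_ptr<impl> pimpl;
};

// Private implementation, hidden from users of the header.
struct mini_vocab::impl {
    std::vector<std::string> id_to_token;
};

mini_vocab::mini_vocab() : pimpl(new impl()) {}
mini_vocab::~mini_vocab() = default;   // defined where impl is complete

void mini_vocab::add_token(const std::string & text) {
    pimpl->id_to_token.push_back(text);
}

std::size_t mini_vocab::n_tokens() const {
    return pimpl->id_to_token.size();
}
```

Keeping the destructor out of line matters: `std::unique_ptr<impl>` requires a complete `impl` type at the point of destruction, which is why the real header declares `~llama_vocab()` and defines it in the source file.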
Usage
Use this header when performing tokenization, detokenization, or querying token properties in any component that works with text-to-token or token-to-text conversion. It is the core vocabulary interface used by the model loader, context, and sampler.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-vocab.h
- Lines: 1-184
Signature
enum llama_vocab_pre_type {
LLAMA_VOCAB_PRE_TYPE_DEFAULT = 0,
LLAMA_VOCAB_PRE_TYPE_LLAMA3 = 1,
LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
// ... 45+ pre-tokenization types
};
struct llama_vocab {
struct token_data {
std::string text;
float score;
llama_token_attr attr;
};
llama_vocab();
~llama_vocab();
void load(llama_model_loader & ml, const LLM_KV & kv);
enum llama_vocab_type get_type() const;
enum llama_vocab_pre_type get_pre_type() const;
uint32_t n_tokens() const;
int32_t tokenize(const char * text, int32_t text_len, llama_token * tokens, int32_t n_tokens_max, bool add_special, bool parse_special) const;
int32_t token_to_piece(llama_token token, char * buf, int32_t length, int32_t lstrip, bool special) const;
int32_t detokenize(const llama_token * tokens, int32_t n_tokens, char * text, int32_t text_len_max, bool remove_special, bool unparse_special) const;
llama_token token_bos() const;
llama_token token_eos() const;
// ... additional special token accessors
};
Import
#include "llama-vocab.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ml | llama_model_loader & | Yes | Model loader to read vocabulary data from GGUF file |
| kv | const LLM_KV & | Yes | Key-value mapping for vocabulary metadata keys |
| text | const char * | Yes | Input text for tokenization |
| text_len | int32_t | Yes | Length of the input text |
| tokens | llama_token * | Yes | Output buffer for tokenized result |
| n_tokens_max | int32_t | Yes | Maximum number of tokens to produce |
| add_special | bool | Yes | Whether to add special tokens (BOS/EOS) |
| parse_special | bool | Yes | Whether to parse special token markup in text |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenize (return) | int32_t | Number of tokens written on success; if `n_tokens_max` is too small, the negated number of tokens that would be required |
| token_to_piece (return) | int32_t | Number of bytes written to `buf`; negative if the buffer is too small |
| detokenize (return) | int32_t | Number of bytes written to `text`; negative if the buffer is too small |
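The negative-return convention above suggests a two-pass calling pattern: probe once to learn the required size, then allocate and call again. The sketch below demonstrates the pattern against a mock tokenizer (`mock_tokenize` is a stand-in that maps one byte to one token, not the real llama.cpp API).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mock following the documented contract: returns the token count on
// success, or the negated required count when n_tokens_max is too small.
static int32_t mock_tokenize(const char * text, int32_t text_len,
                             int32_t * tokens, int32_t n_tokens_max) {
    const int32_t needed = text_len;       // pretend: one token per byte
    if (needed > n_tokens_max) {
        return -needed;                    // caller must resize and retry
    }
    for (int32_t i = 0; i < needed; ++i) {
        tokens[i] = (int32_t) (unsigned char) text[i];
    }
    return needed;
}

// Pass 1: probe with an empty buffer to learn the required size.
// Pass 2: allocate exactly that much and tokenize for real.
static std::vector<int32_t> tokenize_two_pass(const std::string & text) {
    int32_t n = mock_tokenize(text.c_str(), (int32_t) text.size(), nullptr, 0);
    std::vector<int32_t> tokens(n < 0 ? -n : n);
    n = mock_tokenize(text.c_str(), (int32_t) text.size(),
                      tokens.data(), (int32_t) tokens.size());
    tokens.resize(n);
    return tokens;
}
```

The same pattern applies to `token_to_piece` and `detokenize`, whose negative returns likewise signal an undersized output buffer.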
Usage Examples
// Load vocabulary from model
llama_vocab vocab;
vocab.load(ml, kv);
// Tokenize input text using the buffer-based API documented above;
// a negative return means the buffer was too small
const char * text = "Hello, world!";
std::vector<llama_token> tokens(64);
int32_t n = vocab.tokenize(text, (int32_t) strlen(text), tokens.data(), (int32_t) tokens.size(), /*add_special=*/true, /*parse_special=*/false);
tokens.resize(n);
// Convert the first token back to a text piece
char buf[128];
int32_t len = vocab.token_to_piece(tokens[0], buf, sizeof(buf), /*lstrip=*/0, /*special=*/false);
std::string piece(buf, len);
// Check special tokens
llama_token bos = vocab.token_bos();
llama_token eos = vocab.token_eos();
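The BPE merge lookups mentioned in the Description can be illustrated with a minimal, self-contained merge loop. This is a hypothetical sketch of the general BPE algorithm, not llama.cpp's implementation: a rank table assigns each mergeable symbol pair a priority, and the lowest-rank adjacent pair is merged repeatedly until no known pair remains.

```cpp
#include <climits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Repeatedly merge the adjacent symbol pair with the lowest (best) rank.
// `ranks` maps a (left, right) pair to its merge priority.
static std::vector<std::string> bpe_merge(
        std::vector<std::string> symbols,
        const std::map<std::pair<std::string, std::string>, int> & ranks) {
    while (true) {
        int    best_rank = INT_MAX;
        size_t best_i    = 0;
        // Find the best-ranked mergeable pair among adjacent symbols.
        for (size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i    = i;
            }
        }
        if (best_rank == INT_MAX) {
            break; // no known pair left to merge
        }
        symbols[best_i] += symbols[best_i + 1];
        symbols.erase(symbols.begin() + best_i + 1);
    }
    return symbols;
}
```

With ranks `{("l","o"): 0, ("lo","w"): 1}`, the input `{"l","o","w"}` merges first to `{"lo","w"}` and then to `{"low"}`, mirroring how a BPE vocabulary's merge table drives tokenization.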