Implementation: ggml-org/llama.cpp Vocab Header
| Knowledge Sources | |
|---|---|
| Domains | Tokenization, Vocabulary |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Declares the `llama_vocab` struct and the `llama_vocab_pre_type` enum, defining the public interface for vocabulary and tokenization operations.
Description
This header defines the vocabulary system used throughout llama.cpp. The `llama_vocab_pre_type` enum lists 45+ pre-tokenization strategies covering models such as LLaMA3, DeepSeek, GPT-2, Falcon, Qwen, ChatGLM, and many others. The `llama_vocab` struct provides methods for loading the vocabulary from GGUF files, querying token properties (type checks and accessors for special tokens such as BOS, EOS, and FIM), tokenization and detokenization, BPE merge lookups, and vocabulary configuration flags (add_bos, add_eos, clean_spaces, etc.). It uses the pimpl idiom, hiding its internals behind a private `impl` struct so the header exposes only the public interface.
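The pimpl layout mentioned above can be sketched with a minimal, self-contained example. This is an illustration of the pattern, not the real `llama_vocab` declaration; `mini_vocab`, `add_token`, and the `impl` members are hypothetical names chosen for the sketch.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Public interface: only a forward-declared impl is visible, so the
// container types backing the vocabulary stay out of the header.
struct mini_vocab {
    mini_vocab();
    ~mini_vocab();
    void add_token(const std::string & text);
    std::size_t n_tokens() const;
private:
    struct impl;                 // defined out of line (normally in the .cpp)
    std::unique_ptr<impl> pimpl;
};

// Private implementation, hidden from users of the header.
struct mini_vocab::impl {
    std::vector<std::string> id_to_token;
};

mini_vocab::mini_vocab() : pimpl(new impl()) {}
mini_vocab::~mini_vocab() = default;   // defined where impl is complete

void mini_vocab::add_token(const std::string & text) {
    pimpl->id_to_token.push_back(text);
}

std::size_t mini_vocab::n_tokens() const {
    return pimpl->id_to_token.size();
}
```

Keeping the destructor out of line matters: `std::unique_ptr<impl>` requires a complete `impl` type at the point of destruction, which is why the real header declares `~llama_vocab()` and defines it in the source file.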
Usage
Use this header when performing tokenization, detokenization, or querying token properties in any component that works with text-to-token or token-to-text conversion. It is the core vocabulary interface used by the model loader, context, and sampler.
Code Reference
Source Location
- Repository: Ggml_org_Llama_cpp
- File: src/llama-vocab.h
- Lines: 1-184
Signature
enum llama_vocab_pre_type {
LLAMA_VOCAB_PRE_TYPE_DEFAULT = 0,
LLAMA_VOCAB_PRE_TYPE_LLAMA3 = 1,
LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
// ... 45+ pre-tokenization types
};
struct llama_vocab {
struct token_data {
std::string text;
float score;
llama_token_attr attr;
};
llama_vocab();
~llama_vocab();
void load(llama_model_loader & ml, const LLM_KV & kv);
enum llama_vocab_type get_type() const;
enum llama_vocab_pre_type get_pre_type() const;
uint32_t n_tokens() const;
int32_t tokenize(const char * text, int32_t text_len, llama_token * tokens, int32_t n_tokens_max, bool add_special, bool parse_special) const;
int32_t token_to_piece(llama_token token, char * buf, int32_t length, int32_t lstrip, bool special) const;
int32_t detokenize(const llama_token * tokens, int32_t n_tokens, char * text, int32_t text_len_max, bool remove_special, bool unparse_special) const;
llama_token token_bos() const;
llama_token token_eos() const;
// ... additional special token accessors
};
Import
#include "llama-vocab.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ml | llama_model_loader & | Yes | Model loader to read vocabulary data from GGUF file |
| kv | const LLM_KV & | Yes | Key-value mapping for vocabulary metadata keys |
| text | const char * | Yes | Input text for tokenization |
| text_len | int32_t | Yes | Length of the input text |
| tokens | llama_token * | Yes | Output buffer for tokenized result |
| n_tokens_max | int32_t | Yes | Maximum number of tokens to produce |
| add_special | bool | Yes | Whether to add special tokens (BOS/EOS) |
| parse_special | bool | Yes | Whether to parse special token markup in text |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenize (return) | int32_t | Number of tokens written on success; if `n_tokens_max` is too small, the negated number of tokens that would be required |
| token_to_piece (return) | int32_t | Number of bytes written to `buf`; negative if the buffer is too small |
| detokenize (return) | int32_t | Number of bytes written to `text`; negative if the buffer is too small |
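The negative-return convention above suggests a two-pass calling pattern: probe once to learn the required size, then allocate and call again. The sketch below demonstrates the pattern against a mock tokenizer (`mock_tokenize` is a stand-in that maps one byte to one token, not the real llama.cpp API).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mock following the documented contract: returns the token count on
// success, or the negated required count when n_tokens_max is too small.
static int32_t mock_tokenize(const char * text, int32_t text_len,
                             int32_t * tokens, int32_t n_tokens_max) {
    const int32_t needed = text_len;       // pretend: one token per byte
    if (needed > n_tokens_max) {
        return -needed;                    // caller must resize and retry
    }
    for (int32_t i = 0; i < needed; ++i) {
        tokens[i] = (int32_t) (unsigned char) text[i];
    }
    return needed;
}

// Pass 1: probe with an empty buffer to learn the required size.
// Pass 2: allocate exactly that much and tokenize for real.
static std::vector<int32_t> tokenize_two_pass(const std::string & text) {
    int32_t n = mock_tokenize(text.c_str(), (int32_t) text.size(), nullptr, 0);
    std::vector<int32_t> tokens(n < 0 ? -n : n);
    n = mock_tokenize(text.c_str(), (int32_t) text.size(),
                      tokens.data(), (int32_t) tokens.size());
    tokens.resize(n);
    return tokens;
}
```

The same pattern applies to `token_to_piece` and `detokenize`, whose negative returns likewise signal an undersized output buffer.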
Usage Examples
// Load vocabulary from model
llama_vocab vocab;
vocab.load(ml, kv);
// Tokenize input text using the buffer-based API documented above;
// a negative return means the buffer was too small
const char * text = "Hello, world!";
std::vector<llama_token> tokens(64);
int32_t n = vocab.tokenize(text, (int32_t) strlen(text), tokens.data(), (int32_t) tokens.size(), /*add_special=*/true, /*parse_special=*/false);
tokens.resize(n);
// Convert the first token back to a text piece
char buf[128];
int32_t len = vocab.token_to_piece(tokens[0], buf, sizeof(buf), /*lstrip=*/0, /*special=*/false);
std::string piece(buf, len);
// Check special tokens
llama_token bos = vocab.token_bos();
llama_token eos = vocab.token_eos();
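The BPE merge lookups mentioned in the Description can be illustrated with a minimal, self-contained merge loop. This is a hypothetical sketch of the general BPE algorithm, not llama.cpp's implementation: a rank table assigns each mergeable symbol pair a priority, and the lowest-rank adjacent pair is merged repeatedly until no known pair remains.

```cpp
#include <climits>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Repeatedly merge the adjacent symbol pair with the lowest (best) rank.
// `ranks` maps a (left, right) pair to its merge priority.
static std::vector<std::string> bpe_merge(
        std::vector<std::string> symbols,
        const std::map<std::pair<std::string, std::string>, int> & ranks) {
    while (true) {
        int    best_rank = INT_MAX;
        size_t best_i    = 0;
        // Find the best-ranked mergeable pair among adjacent symbols.
        for (size_t i = 0; i + 1 < symbols.size(); ++i) {
            auto it = ranks.find({symbols[i], symbols[i + 1]});
            if (it != ranks.end() && it->second < best_rank) {
                best_rank = it->second;
                best_i    = i;
            }
        }
        if (best_rank == INT_MAX) {
            break; // no known pair left to merge
        }
        symbols[best_i] += symbols[best_i + 1];
        symbols.erase(symbols.begin() + best_i + 1);
    }
    return symbols;
}
```

With ranks `{("l","o"): 0, ("lo","w"): 1}`, the input `{"l","o","w"}` merges first to `{"lo","w"}` and then to `{"low"}`, mirroring how a BPE vocabulary's merge table drives tokenization.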