Implementation:Ollama Ollama Llama Vocab Types
| Knowledge Sources | |
|---|---|
| Domains | LLM Inference, Tokenization |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Header declaring the llama_vocab struct, the llama_vocab_pre_type pre-tokenization enum, and the tokenization/detokenization API.
Description
Defines the llama_vocab_pre_type enum with entries for the supported pre-tokenization patterns (LLaMA 3, DeepSeek, Falcon, GPT-2, Qwen2, ChatGLM, Tekken, and many more). Declares llama_vocab with a nested token_data struct (text, score, attributes) and methods for loading from GGUF, querying type and metadata, counting tokens, retrieving special token IDs (BOS, EOS, EOT, SEP, PAD, etc.), checking token attributes, tokenizing and detokenizing, and accessing the chat template. Implementation details are hidden behind the pimpl pattern.
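As an illustration of the pimpl pattern mentioned above, the following is a minimal sketch and not the actual llama_vocab code. The names `sketch_vocab`, `add_token`, `get_token_data`, and `id_to_token` are hypothetical and chosen only for this example. It shows how a header can expose a small public surface while the data members live in a private `impl` struct defined in the source file.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// --- What the header would expose (hypothetical type, not llama_vocab) ---
struct sketch_vocab {
    struct token_data {
        std::string text;
        float       score;
    };

    sketch_vocab();
    ~sketch_vocab(); // must be defined where `impl` is a complete type

    void     add_token(const std::string & text, float score);
    uint32_t n_tokens() const;
    const token_data & get_token_data(uint32_t id) const;

private:
    struct impl;                 // forward declaration only
    std::unique_ptr<impl> pimpl; // the only data member visible to callers
};

// --- What the source file would contain ---
struct sketch_vocab::impl {
    std::vector<sketch_vocab::token_data> id_to_token;
};

sketch_vocab::sketch_vocab() : pimpl(std::make_unique<impl>()) {}
sketch_vocab::~sketch_vocab() = default;

void sketch_vocab::add_token(const std::string & text, float score) {
    pimpl->id_to_token.push_back({text, score});
}

uint32_t sketch_vocab::n_tokens() const {
    return static_cast<uint32_t>(pimpl->id_to_token.size());
}

const sketch_vocab::token_data & sketch_vocab::get_token_data(uint32_t id) const {
    assert(id < n_tokens()); // caller must pass a valid token ID
    return pimpl->id_to_token[id];
}
```

With this layout, changes to the private containers only require recompiling the source file that defines `impl`, not every translation unit that includes the header.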
Usage
Include this header for all vocabulary and tokenization operations across the llama.cpp codebase.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/src/llama-vocab.h
- Lines: 1-180
Signature
enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_DEFAULT      = 0,
    LLAMA_VOCAB_PRE_TYPE_LLAMA3       = 1,
    LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
    LLAMA_VOCAB_PRE_TYPE_GPT2         = 7,
    LLAMA_VOCAB_PRE_TYPE_QWEN2        = 11,
    // ... 40+ pre-tokenization types
};

struct llama_vocab {
    struct token_data {
        std::string      text;
        float            score;
        llama_token_attr attr;
    };

    void load(llama_model_loader & ml, const LLM_KV & kv);

    uint32_t n_tokens() const;

    llama_token token_bos() const;
    llama_token token_eos() const;

    int32_t tokenize(const char * text, int32_t text_len,
                     llama_token * tokens, int32_t n_tokens_max,
                     bool add_special, bool parse_special) const;

    int32_t detokenize(const llama_token * tokens, int32_t n_tokens,
                       char * text, int32_t text_len_max,
                       bool remove_special, bool unparse_special) const;
};
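The buffer-style tokenize signature above leaves the caller responsible for sizing the output array. The sketch below shows one common way to handle that with a two-pass call, using a hypothetical `mock_tokenize` stand-in (byte-level, one token per byte) instead of the real llama_vocab so that it compiles on its own. The negative return value for a too-small buffer is an assumption modeled on the convention documented for the public llama_tokenize function; verify it against the header before relying on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using mock_token = int32_t; // stand-in for llama_token

// Hypothetical byte-level tokenizer with the same buffer-style shape as
// llama_vocab::tokenize. Assumption: if the output buffer is too small, it
// returns the negated number of tokens that would be required.
int32_t mock_tokenize(const char * text, int32_t text_len,
                      mock_token * tokens, int32_t n_tokens_max) {
    if (text_len > n_tokens_max) {
        return -text_len;
    }
    for (int32_t i = 0; i < text_len; ++i) {
        tokens[i] = static_cast<unsigned char>(text[i]);
    }
    return text_len;
}

// Two-pass wrapper: call with a guessed size, then retry with the exact
// size reported by the negative return value.
std::vector<mock_token> tokenize_to_vector(const std::string & text) {
    std::vector<mock_token> out(4); // deliberately small first guess
    int32_t n = mock_tokenize(text.data(), static_cast<int32_t>(text.size()),
                              out.data(), static_cast<int32_t>(out.size()));
    if (n < 0) {
        out.resize(static_cast<std::size_t>(-n));
        n = mock_tokenize(text.data(), static_cast<int32_t>(text.size()),
                          out.data(), static_cast<int32_t>(out.size()));
        assert(n >= 0); // second call has a large enough buffer
    }
    out.resize(static_cast<std::size_t>(n)); // trim to the tokens actually written
    return out;
}
```

A wrapper like this keeps the raw pointer and length arguments in one place, so the rest of the calling code can work with a std::vector.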
Import
#include "llama-vocab.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| ml | llama_model_loader & | Yes | Model loader for loading vocab from GGUF |
| kv | const LLM_KV & | Yes | Key-value namespace resolver used to look up vocabulary metadata keys in the GGUF file |
Outputs
| Name | Type | Description |
|---|---|---|
| n_tokens() | uint32_t | Total number of tokens in vocabulary |
| token_data | struct | Text, score, and attributes per token |
Usage Examples
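#include "llama-vocab.h"

// `model` is an already-loaded llama_model; its vocabulary is a member.
const auto & vocab = model.vocab;

uint32_t    n_vocab = vocab.n_tokens();  // vocabulary size
llama_token bos     = vocab.token_bos(); // beginning-of-sequence token ID
llama_token eos     = vocab.token_eos(); // end-of-sequence token ID

// Any valid ID in [0, n_vocab) can be queried; 0 is used here as an example.
llama_token token_id = 0;
bool is_ctl = vocab.is_control(token_id);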
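Attribute checks such as is_control are typically implemented as bit tests on the per-token attribute field. The following is a hedged sketch of that idea only: the enum `sketch_token_attr`, its flag values, and the `sketch_attr_vocab` type are hypothetical and do not reproduce the real llama_token_attr definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical attribute flags; the values are illustrative only.
enum sketch_token_attr : uint32_t {
    SKETCH_ATTR_UNDEFINED = 0,
    SKETCH_ATTR_NORMAL    = 1u << 0,
    SKETCH_ATTR_CONTROL   = 1u << 1,
    SKETCH_ATTR_BYTE      = 1u << 2,
};

struct sketch_token_data {
    std::string text;
    float       score;
    uint32_t    attr; // bitwise OR of sketch_token_attr flags
};

struct sketch_attr_vocab {
    std::vector<sketch_token_data> tokens;

    // True if the token with the given ID has the given flag set.
    bool has_attr(int32_t id, sketch_token_attr flag) const {
        assert(id >= 0 && static_cast<std::size_t>(id) < tokens.size());
        return (tokens[static_cast<std::size_t>(id)].attr & flag) != 0;
    }

    bool is_control(int32_t id) const { return has_attr(id, SKETCH_ATTR_CONTROL); }
    bool is_normal (int32_t id) const { return has_attr(id, SKETCH_ATTR_NORMAL);  }
    bool is_byte   (int32_t id) const { return has_attr(id, SKETCH_ATTR_BYTE);    }
};
```

Storing attributes as a bitmask lets one token carry several flags at once while keeping each query a single AND operation.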