Implementation:Ollama Ollama Llama Vocab Types

Knowledge Sources	Ollama
Domains	LLM Inference, Tokenization
Last Updated	2025-02-15 00:00 GMT

Overview

Header that declares the llama_vocab struct, the pre-tokenization type enum, and the public tokenization/detokenization API.

Description

Defines the llama_vocab_pre_type enum, with one entry per supported pre-tokenization pattern (LLaMA 3, DeepSeek, Falcon, GPT-2, Qwen2, ChatGLM, Tekken, and many more). Declares llama_vocab, which contains the nested token_data struct (text, score, attributes) and provides methods for loading the vocabulary from GGUF, querying its type and metadata, counting tokens, retrieving special token IDs (BOS, EOS, EOT, SEP, pad, etc.), checking token attributes, tokenizing and detokenizing, and accessing the chat template. The struct uses the pimpl pattern, so implementation details stay out of the header.
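The pimpl pattern mentioned above can be illustrated with a minimal sketch. Everything here (vocab_sketch, add_token, token_text, the entry struct) is a hypothetical stand-in chosen for illustration and is not the actual llama_vocab layout; only the general shape, a forward-declared impl held through a std::unique_ptr, reflects the pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a pimpl-based vocabulary type (not the real llama_vocab).
class vocab_sketch {
public:
    vocab_sketch();
    ~vocab_sketch();  // must be defined where impl is a complete type

    void                add_token(const std::string & text, float score);
    uint32_t            n_tokens() const;
    const std::string & token_text(uint32_t id) const;

private:
    struct impl;                  // forward declaration only; layout hidden from users
    std::unique_ptr<impl> pimpl;  // owning pointer to the hidden implementation
};

// ---- everything below would normally live in the .cpp file ----

struct vocab_sketch::impl {
    struct entry {
        std::string text;
        float       score;
    };
    std::vector<entry> entries;
};

vocab_sketch::vocab_sketch() : pimpl(std::make_unique<impl>()) {}
vocab_sketch::~vocab_sketch() = default;

void vocab_sketch::add_token(const std::string & text, float score) {
    pimpl->entries.push_back({text, score});
}

uint32_t vocab_sketch::n_tokens() const {
    return static_cast<uint32_t>(pimpl->entries.size());
}

const std::string & vocab_sketch::token_text(uint32_t id) const {
    assert(id < pimpl->entries.size());  // caller must pass a valid token ID
    return pimpl->entries[id].text;
}
```

With this split, code that includes the header sees only the public methods, and changes to the impl members do not force dependents to recompile.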

Usage

Include this header for all vocabulary and tokenization operations across the llama.cpp codebase.

Code Reference

Source Location

Repository: Ollama
File: llama/llama.cpp/src/llama-vocab.h
Lines: 1-180

Signature

enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_DEFAULT = 0,
    LLAMA_VOCAB_PRE_TYPE_LLAMA3  = 1,
    LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
    LLAMA_VOCAB_PRE_TYPE_GPT2    = 7,
    LLAMA_VOCAB_PRE_TYPE_QWEN2   = 11,
    // ... 40+ pre-tokenization types
};

struct llama_vocab {
    struct token_data {
        std::string      text;
        float            score;
        llama_token_attr attr;
    };

    void load(llama_model_loader & ml, const LLM_KV & kv);
    uint32_t n_tokens() const;
    llama_token token_bos() const;
    llama_token token_eos() const;
    int32_t tokenize(const char * text, int32_t text_len,
        llama_token * tokens, int32_t n_tokens_max,
        bool add_special, bool parse_special) const;
    int32_t detokenize(const llama_token * tokens, int32_t n_tokens,
        char * text, int32_t text_len_max,
        bool remove_special, bool unparse_special) const;
};
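The buffer-based tokenize() signature above lends itself to a two-pass calling pattern: try with a guessed buffer, then retry with the exact size if the first call reports that the buffer was too small. The sketch below assumes the convention documented for the public llama_tokenize function, where a negative return value is the negated number of tokens required; whether llama_vocab::tokenize follows it in every version is an assumption here. toy_tokenize, tokenize_all, token_id, and tokenize_fn are hypothetical names, and the toy tokenizer only mimics the return-value contract, not real tokenization.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sstream>
#include <string>
#include <vector>

using token_id = int32_t;  // stand-in for llama_token

// Same parameter shape as the buffer-based tokenize() shown in the signature.
using tokenize_fn = std::function<int32_t(const char *, int32_t, token_id *, int32_t, bool, bool)>;

// Toy tokenizer (hypothetical): one token per whitespace-separated word, ID = word length.
// ASSUMPTION: returns -(required token count) when the output buffer is too small.
int32_t toy_tokenize(const char * text, int32_t text_len,
                     token_id * tokens, int32_t n_tokens_max,
                     bool /*add_special*/, bool /*parse_special*/) {
    const std::string input(text, static_cast<size_t>(text_len));
    std::istringstream in(input);
    std::vector<token_id> out;
    std::string word;
    while (in >> word) {
        out.push_back(static_cast<token_id>(word.size()));
    }
    const int32_t n = static_cast<int32_t>(out.size());
    if (n > n_tokens_max) {
        return -n;  // signal the size the caller needs
    }
    for (int32_t i = 0; i < n; ++i) {
        tokens[i] = out[i];
    }
    return n;
}

// Two-pass helper: call once with a small guess, resize and retry if needed.
std::vector<token_id> tokenize_all(const tokenize_fn & fn, const std::string & text,
                                   bool add_special, bool parse_special) {
    std::vector<token_id> buf(2);  // deliberately small initial guess
    int32_t n = fn(text.data(), static_cast<int32_t>(text.size()),
                   buf.data(), static_cast<int32_t>(buf.size()),
                   add_special, parse_special);
    if (n < 0) {
        const int32_t needed = -n;
        buf.resize(static_cast<size_t>(needed));
        n = fn(text.data(), static_cast<int32_t>(text.size()),
               buf.data(), static_cast<int32_t>(buf.size()),
               add_special, parse_special);
        assert(n == needed);  // second pass should fill exactly the reported size
    }
    buf.resize(static_cast<size_t>(n));  // trim to the number of tokens written
    return buf;
}
```

Passing the tokenizer as a callable keeps the helper independent of the real header, so the same retry logic could wrap a lambda that forwards to an actual vocabulary object.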

Import

#include "llama-vocab.h"

I/O Contract

Inputs

Name	Type	Required	Description
ml	llama_model_loader &	Yes	Model loader for loading vocab from GGUF
kv	const LLM_KV &	Yes	Key-value namespace resolver

Outputs

Name	Type	Description
n_tokens()	uint32_t	Total number of tokens in vocabulary
token_data	struct	Text, score, and attributes per token

Usage Examples

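#include "llama-vocab.h"

// Assumes `model` is a loaded llama_model and `token_id` is a valid llama_token.
const auto & vocab = model.vocab;

uint32_t    n_vocab = vocab.n_tokens();            // vocabulary size
llama_token bos     = vocab.token_bos();           // beginning-of-sequence token ID
llama_token eos     = vocab.token_eos();           // end-of-sequence token ID
bool        is_ctl  = vocab.is_control(token_id);  // true if the token is a control token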
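A check such as is_control() can be thought of as a bit test against the attr field of token_data. The following is a minimal sketch under that assumption. The flag names and values (ATTR_NORMAL, ATTR_CONTROL, ATTR_BYTE) and the token_data_sketch type are hypothetical and do not reproduce the real llama_token_attr constants, which are defined in llama.h.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical attribute flags for illustration; real values live in llama.h.
enum token_attr_sketch : uint32_t {
    ATTR_UNDEFINED = 0,
    ATTR_NORMAL    = 1u << 0,
    ATTR_CONTROL   = 1u << 1,
    ATTR_BYTE      = 1u << 2,
};

// Mirrors the three fields of token_data shown in the signature.
struct token_data_sketch {
    std::string       text;
    float             score;
    token_attr_sketch attr;
};

// True if any bit of `flag` is set on the token's attributes.
inline bool has_attr(const token_data_sketch & td, token_attr_sketch flag) {
    return (td.attr & flag) != 0;
}

// Looks up a token by ID and tests its control bit.
inline bool is_control(const std::vector<token_data_sketch> & table, int32_t id) {
    assert(id >= 0 && static_cast<size_t>(id) < table.size());
    return has_attr(table[static_cast<size_t>(id)], ATTR_CONTROL);
}
```

Storing attributes as bit flags lets one token carry several attributes at once while keeping each check a single mask operation.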

Related Pages

Principle:Ollama_Ollama_Tokenization
