Implementation:Ollama Ollama Llama Vocab Types

From Leeroopedia
Knowledge Sources
Domains: LLM Inference, Tokenization
Last Updated: 2025-02-15 00:00 GMT

Overview

Header declaring the llama_vocab class, pre-tokenization type enum, and the public tokenization/detokenization API.

Description

Defines the llama_vocab_pre_type enum, with one entry per supported pre-tokenization pattern (LLaMA 3, DeepSeek, Falcon, GPT-2, Qwen2, ChatGLM, Tekken, and many more). Declares llama_vocab, whose nested token_data struct holds each token's text, score, and attributes. Its methods cover loading the vocabulary from GGUF, querying type and metadata, counting tokens, retrieving special token IDs (BOS, EOS, EOT, SEP, pad, etc.), checking token attributes, tokenizing and detokenizing, and accessing the chat template. The class uses the pimpl (pointer-to-implementation) pattern, which keeps implementation details out of the header.
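
The pimpl pattern named in the description can be illustrated with a small stand-alone sketch. This is not the actual llama_vocab code: toy_vocab, add_token, and token_text are hypothetical names chosen for the example, and only n_tokens mirrors a method listed on this page. The point is that the header-visible class holds nothing but a pointer to a forward-declared impl, while the storage lives in the implementation file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// --- what would live in the header (hypothetical stand-in, not llama_vocab) ---
class toy_vocab {
public:
    toy_vocab();
    ~toy_vocab();   // declared here, defined where impl is a complete type

    void             add_token(const std::string & text, float score);
    uint32_t         n_tokens() const;
    const std::string & token_text(int32_t id) const;

private:
    struct impl;                  // forward declaration only
    std::unique_ptr<impl> pimpl;  // the single data member exposed to users
};

// --- what would live in the .cpp file ---
struct toy_vocab::impl {
    struct token_data {
        std::string text;
        float       score;
    };
    std::vector<token_data> id_to_token;
};

toy_vocab::toy_vocab() : pimpl(std::make_unique<impl>()) {}
toy_vocab::~toy_vocab() = default;

void toy_vocab::add_token(const std::string & text, float score) {
    pimpl->id_to_token.push_back({text, score});
}

uint32_t toy_vocab::n_tokens() const {
    return static_cast<uint32_t>(pimpl->id_to_token.size());
}

const std::string & toy_vocab::token_text(int32_t id) const {
    // at() throws std::out_of_range for an invalid id
    return pimpl->id_to_token.at(static_cast<std::size_t>(id)).text;
}
```

With this layout, changing the members of impl does not change the class definition that users of the header see, so their code does not need to be recompiled for layout reasons.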

Usage

Include this header for all vocabulary and tokenization operations across the llama.cpp codebase.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/src/llama-vocab.h
  • Lines: 1-180

Signature

enum llama_vocab_pre_type {
    LLAMA_VOCAB_PRE_TYPE_DEFAULT = 0,
    LLAMA_VOCAB_PRE_TYPE_LLAMA3  = 1,
    LLAMA_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
    LLAMA_VOCAB_PRE_TYPE_GPT2    = 7,
    LLAMA_VOCAB_PRE_TYPE_QWEN2   = 11,
    // ... 40+ pre-tokenization types
};

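struct llama_vocab {
    // Per-token record stored in the vocabulary.
    struct token_data {
        std::string      text;
        float            score;
        llama_token_attr attr;
    };

    // Load the vocabulary from GGUF metadata.
    void load(llama_model_loader & ml, const LLM_KV & kv);

    uint32_t n_tokens() const;      // total number of tokens
    llama_token token_bos() const;  // beginning-of-sequence token ID
    llama_token token_eos() const;  // end-of-sequence token ID

    // Write up to n_tokens_max tokens for `text` into `tokens`.
    int32_t tokenize(const char * text, int32_t text_len,
        llama_token * tokens, int32_t n_tokens_max,
        bool add_special, bool parse_special) const;

    // Write up to text_len_max bytes for `tokens` into `text`.
    int32_t detokenize(const llama_token * tokens, int32_t n_tokens,
        char * text, int32_t text_len_max,
        bool remove_special, bool unparse_special) const;

    // ... remaining members omitted
};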
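
Because tokenize writes into a caller-provided buffer of n_tokens_max entries, callers need a strategy for sizing that buffer. The sketch below assumes the convention documented for the public llama_tokenize function in llama.h, where a negative return value gives the negated number of tokens that would have been produced; whether llama_vocab::tokenize behaves identically is an assumption here. toy_tokenize is a hypothetical stand-in (a whitespace splitter whose "token ID" is simply the word length), and tokenize_all is a hypothetical helper showing the call-then-resize pattern.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

using toy_token = int32_t;  // stand-in for llama_token

// Hypothetical tokenizer: splits on spaces, emits each word's length as its ID.
// Returns the number of tokens written, or the negated required count
// when n_tokens_max is too small (assumed convention, see lead-in).
int32_t toy_tokenize(const char * text, int32_t text_len,
                     toy_token * tokens, int32_t n_tokens_max) {
    std::vector<toy_token> out;
    int32_t i = 0;
    while (i < text_len) {
        while (i < text_len && text[i] == ' ') { i++; }  // skip spaces
        const int32_t start = i;
        while (i < text_len && text[i] != ' ') { i++; }  // consume one word
        if (i > start) { out.push_back(i - start); }
    }
    const int32_t n = static_cast<int32_t>(out.size());
    if (n > n_tokens_max) { return -n; }                  // buffer too small
    if (n > 0) { std::copy(out.begin(), out.end(), tokens); }
    return n;
}

// Call once with a guessed capacity; on a negative result, resize and retry.
std::vector<toy_token> tokenize_all(const std::string & text,
                                    int32_t initial_capacity = 2) {
    std::vector<toy_token> result(static_cast<std::size_t>(initial_capacity));
    const int32_t len = static_cast<int32_t>(text.size());
    const int32_t n = toy_tokenize(text.data(), len, result.data(),
                                   static_cast<int32_t>(result.size()));
    if (n < 0) {
        result.resize(static_cast<std::size_t>(-n));
        const int32_t check = toy_tokenize(text.data(), len, result.data(),
                                           static_cast<int32_t>(result.size()));
        assert(check == -n);  // second call must fit exactly
        (void) check;
    } else {
        result.resize(static_cast<std::size_t>(n));
    }
    return result;
}
```

The same call-then-resize approach would apply to detokenize and its text_len_max parameter, under the same assumption about the return value.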

Import

#include "llama-vocab.h"

I/O Contract

Inputs

Name | Type                 | Required | Description
ml   | llama_model_loader & | Yes      | Model loader for loading vocab from GGUF
kv   | const LLM_KV &       | Yes      | Key-value namespace resolver
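
One step a loader of this kind plausibly performs is turning a pre-tokenizer name stored in the model metadata into a llama_vocab_pre_type value. The sketch below is a rough illustration under stated assumptions: the enum values are copied from the Signature section, but the toy_ prefixed names, the pre_type_from_name helper, and the name strings in the table are hypothetical and not taken from this page. Falling back to the default type for an unknown name is a choice made for the sketch; a real loader might reject unknown names instead.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Values mirror the excerpt in the Signature section; names are stand-ins.
enum toy_vocab_pre_type {
    TOY_VOCAB_PRE_TYPE_DEFAULT      = 0,
    TOY_VOCAB_PRE_TYPE_LLAMA3       = 1,
    TOY_VOCAB_PRE_TYPE_DEEPSEEK_LLM = 2,
    TOY_VOCAB_PRE_TYPE_GPT2         = 7,
    TOY_VOCAB_PRE_TYPE_QWEN2        = 11,
};

// Hypothetical helper: look up a pre-tokenizer name, defaulting when unknown.
toy_vocab_pre_type pre_type_from_name(const std::string & name) {
    // The name strings below are illustrative assumptions.
    static const std::unordered_map<std::string, toy_vocab_pre_type> table = {
        {"default",      TOY_VOCAB_PRE_TYPE_DEFAULT},
        {"llama3",       TOY_VOCAB_PRE_TYPE_LLAMA3},
        {"deepseek-llm", TOY_VOCAB_PRE_TYPE_DEEPSEEK_LLM},
        {"gpt-2",        TOY_VOCAB_PRE_TYPE_GPT2},
        {"qwen2",        TOY_VOCAB_PRE_TYPE_QWEN2},
    };
    const auto it = table.find(name);
    return it == table.end() ? TOY_VOCAB_PRE_TYPE_DEFAULT : it->second;
}
```

Keeping the mapping in one table makes it easy to see which names are recognized and to add a new pre-tokenization type alongside its enum entry.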

Outputs

Name       | Type     | Description
n_tokens() | uint32_t | Total number of tokens in vocabulary
token_data | struct   | Text, score, and attributes per token

Usage Examples

#include "llama-vocab.h"

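// Assumes `model` is an already-loaded llama_model and `token_id`
// is a valid llama_token for this vocabulary.
const auto & vocab = model.vocab;

uint32_t    n_vocab = vocab.n_tokens();            // vocabulary size
llama_token bos     = vocab.token_bos();           // beginning-of-sequence ID
llama_token eos     = vocab.token_eos();           // end-of-sequence ID
bool        is_ctl  = vocab.is_control(token_id);  // control-token check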
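
An attribute check such as is_control can be pictured as a bit test against the attr field of token_data. The following is a minimal sketch under the assumption that attributes are stored as bit flags; the toy_ names and the specific flag values are hypothetical, and the real values are whatever llama_token_attr defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical flag values for illustration only.
enum toy_token_attr : uint32_t {
    TOY_ATTR_UNDEFINED = 0,
    TOY_ATTR_NORMAL    = 1u << 0,
    TOY_ATTR_CONTROL   = 1u << 1,
    TOY_ATTR_BYTE      = 1u << 2,
};

struct toy_token_data {
    std::string text;
    float       score;
    uint32_t    attr;   // bitwise OR of toy_token_attr flags
};

struct toy_vocab_table {
    std::vector<toy_token_data> id_to_token;

    // True when the token with this id carries the given flag.
    bool has_attr(int32_t id, toy_token_attr flag) const {
        return (id_to_token.at(static_cast<std::size_t>(id)).attr & flag) != 0;
    }
    bool is_control(int32_t id) const { return has_attr(id, TOY_ATTR_CONTROL); }
    bool is_byte(int32_t id)    const { return has_attr(id, TOY_ATTR_BYTE); }
};

// Build a four-token table to exercise the checks.
toy_vocab_table make_toy_table() {
    toy_vocab_table v;
    v.id_to_token.push_back({"<s>",    0.0f, TOY_ATTR_CONTROL});
    v.id_to_token.push_back({"</s>",   0.0f, TOY_ATTR_CONTROL});
    v.id_to_token.push_back({"<0x0A>", 0.0f, TOY_ATTR_BYTE});
    v.id_to_token.push_back({"hello", -1.0f, TOY_ATTR_NORMAL});
    return v;
}
```

Storing attributes as flags lets a single token carry several properties at once while each query stays a single mask operation.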
