Implementation: llama_tokenize (ggml-org/llama.cpp)

From Leeroopedia
Knowledge Sources: ggml-org/llama.cpp
Domains: Text Tokenization, BPE Encoding, Token ID Conversion
Last Updated: 2026-02-14

Overview

Description

llama_tokenize converts a text string into a sequence of integer token IDs using the vocabulary and tokenization rules embedded in the model. The function supports BPE, SentencePiece, and WordPiece tokenization schemes depending on the model's vocabulary type. It handles special token insertion (BOS/EOS) and can optionally parse special token text representations in the input string.

The function follows a two-pass pattern: calling with a zero-size output buffer returns the negated count of required tokens, allowing the caller to allocate the exact buffer size needed before performing the actual tokenization.

Usage

#include "llama.h"

const llama_vocab * vocab = llama_model_get_vocab(model);
const char * prompt = "Hello my name is";

// Pass 1: determine token count
const int n_prompt = -llama_tokenize(vocab, prompt, strlen(prompt), NULL, 0, true, true);

// Pass 2: perform tokenization
std::vector<llama_token> tokens(n_prompt);
if (llama_tokenize(vocab, prompt, strlen(prompt), tokens.data(), tokens.size(), true, true) < 0) {
    fprintf(stderr, "Tokenization failed\n");
    return 1;
}

Code Reference

Source Location

File Line(s) Type
include/llama.h 1102-1109 Declaration
src/llama-vocab.cpp 3908-3917 Implementation

Signature

LLAMA_API int32_t llama_tokenize(
    const struct llama_vocab * vocab,
                  const char * text,
                     int32_t   text_len,
                 llama_token * tokens,
                     int32_t   n_tokens_max,
                        bool   add_special,
                        bool   parse_special);

Import

#include "llama.h"

I/O Contract

Inputs

Parameter Type Description
vocab const struct llama_vocab * Vocabulary handle obtained from llama_model_get_vocab(model). Contains the tokenizer rules and vocabulary table.
text const char * Input text string to tokenize. Does not need to be null-terminated; length is specified by text_len.
text_len int32_t Length of the input text in bytes.
tokens llama_token * Output buffer for token IDs. Can be NULL when querying the required token count (set n_tokens_max to 0).
n_tokens_max int32_t Maximum number of tokens that the output buffer can hold. Set to 0 with tokens = NULL to query the required count.
add_special bool If true, automatically add BOS and EOS tokens if the model is configured to include them.
parse_special bool If true, parse text representations of special tokens (e.g. "<|endoftext|>") as their corresponding token IDs. If false, such text is tokenized as regular plaintext. Does not insert a leading space.

Outputs

Return Type Description
token count int32_t On success: the number of tokens written to the output buffer (no more than n_tokens_max).
negative count int32_t On insufficient buffer: a negative number whose absolute value is the number of tokens that would have been produced. Use this to allocate the correct buffer size.
INT32_MIN int32_t On overflow: the tokenization result size exceeds the int32_t limit.

Usage Examples

Two-Pass Tokenization (from examples/simple/simple.cpp)

const llama_vocab * vocab = llama_model_get_vocab(model);

std::string prompt = "Hello my name is";

// First pass: find the number of tokens in the prompt
const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, true);

// Second pass: allocate space and tokenize
std::vector<llama_token> prompt_tokens(n_prompt);
if (llama_tokenize(vocab, prompt.c_str(), prompt.size(),
                   prompt_tokens.data(), prompt_tokens.size(),
                   true, true) < 0) {
    fprintf(stderr, "error: failed to tokenize the prompt\n");
    return 1;
}

Tokenization Without Special Tokens

const char * text = "Hello world";

// Do not add BOS/EOS, do not parse special tokens
const int n = -llama_tokenize(vocab, text, strlen(text), NULL, 0, false, false);
llama_token tokens[128];
if (n > 128 || llama_tokenize(vocab, text, strlen(text), tokens, 128, false, false) < 0) {
    fprintf(stderr, "Tokenization failed\n");
}

Converting Tokens Back to Text

After generation, use llama_token_to_piece to convert individual tokens back to text:

for (auto id : prompt_tokens) {
    char buf[128];
    int n = llama_token_to_piece(vocab, id, buf, sizeof(buf), 0, true);
    if (n < 0) {
        fprintf(stderr, "error: failed to convert token to piece\n");
        return 1;
    }
    std::string s(buf, n);
    printf("%s", s.c_str());
}
