| Knowledge Sources | Domains | Last Updated |
| --- | --- | --- |
| ggml-org/llama.cpp | Text Tokenization, BPE Encoding, Token ID Conversion | 2026-02-14 |
Overview
Description
llama_tokenize converts a text string into a sequence of integer token IDs using the vocabulary and tokenization rules embedded in the model. The function supports BPE, SentencePiece, and WordPiece tokenization schemes depending on the model's vocabulary type. It handles special token insertion (BOS/EOS) and can optionally parse special token text representations in the input string.
The function follows a two-pass pattern: calling with a zero-size output buffer returns the negated count of required tokens, allowing the caller to allocate the exact buffer size needed before performing the actual tokenization.
Usage
```cpp
#include "llama.h"

#include <cstdio>
#include <cstring>
#include <vector>

const llama_vocab * vocab = llama_model_get_vocab(model);
const char * prompt = "Hello my name is";

// Pass 1: determine the token count (negated return value)
const int n_prompt = -llama_tokenize(vocab, prompt, strlen(prompt), NULL, 0, true, true);

// Pass 2: perform the actual tokenization
std::vector<llama_token> tokens(n_prompt);
if (llama_tokenize(vocab, prompt, strlen(prompt), tokens.data(), tokens.size(), true, true) < 0) {
    fprintf(stderr, "Tokenization failed\n");
    return 1;
}
```
Code Reference
Source Location
| File | Line(s) | Type |
| --- | --- | --- |
| include/llama.h | 1102-1109 | Declaration |
| src/llama-vocab.cpp | 3908-3917 | Implementation |
Signature
```c
LLAMA_API int32_t llama_tokenize(
    const struct llama_vocab * vocab,
    const char * text,
    int32_t text_len,
    llama_token * tokens,
    int32_t n_tokens_max,
    bool add_special,
    bool parse_special);
```
Import
```c
#include "llama.h"
```
I/O Contract
Inputs
| Parameter | Type | Description |
| --- | --- | --- |
| vocab | const struct llama_vocab * | Vocabulary handle obtained from llama_model_get_vocab(model). Contains the tokenizer rules and vocabulary table. |
| text | const char * | Input text string to tokenize. Does not need to be null-terminated; the length is given by text_len. |
| text_len | int32_t | Length of the input text in bytes. |
| tokens | llama_token * | Output buffer for token IDs. Can be NULL when querying the required token count (set n_tokens_max to 0). |
| n_tokens_max | int32_t | Maximum number of tokens the output buffer can hold. Set to 0 with tokens = NULL to query the required count. |
| add_special | bool | If true, automatically add BOS and EOS tokens when the model is configured to include them. |
| parse_special | bool | If true, parse special token text representations in the input (e.g. "<\|endoftext\|>") as their corresponding token IDs. If false, such text is tokenized as regular plaintext. Does not insert a leading space. |
Outputs
| Return | Type | Description |
| --- | --- | --- |
| token count | int32_t | On success: the number of tokens written to the output buffer (no more than n_tokens_max). |
| negative count | int32_t | On insufficient buffer: a negative number whose absolute value is the number of tokens that would have been produced. Use this to allocate the correct buffer size. |
| INT32_MIN | int32_t | On overflow: the tokenization result size exceeds the int32_t limit. |
Usage Examples
Two-Pass Tokenization (from examples/simple/simple.cpp)
```cpp
const llama_vocab * vocab = llama_model_get_vocab(model);

std::string prompt = "Hello my name is";

// First pass: find the number of tokens in the prompt
const int n_prompt = -llama_tokenize(vocab, prompt.c_str(), prompt.size(), NULL, 0, true, true);

// Second pass: allocate space and tokenize
std::vector<llama_token> prompt_tokens(n_prompt);
if (llama_tokenize(vocab, prompt.c_str(), prompt.size(),
                   prompt_tokens.data(), prompt_tokens.size(),
                   true, true) < 0) {
    fprintf(stderr, "error: failed to tokenize the prompt\n");
    return 1;
}
```
Tokenization Without Special Tokens
```cpp
const char * text = "Hello world";

// Do not add BOS/EOS, do not parse special tokens
const int n = -llama_tokenize(vocab, text, strlen(text), NULL, 0, false, false);

std::vector<llama_token> tokens(n);
llama_tokenize(vocab, text, strlen(text), tokens.data(), n, false, false);
```
Converting Tokens Back to Text
After generation, use llama_token_to_piece to convert individual tokens back to text:
```cpp
for (auto id : prompt_tokens) {
    char buf[128];
    int n = llama_token_to_piece(vocab, id, buf, sizeof(buf), 0, true);
    if (n < 0) {
        fprintf(stderr, "error: failed to convert token to piece\n");
        return 1;
    }
    std::string s(buf, n);
    printf("%s", s.c_str());
}
```
Related Pages