Implementation: ExLlamaV2Tokenizer (turboderp/exllamav2)
| Knowledge Sources | |
|---|---|
| Domains | NLP, Tokenization, Text_Processing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for text tokenization and detokenization using HuggingFace-compatible tokenizer files, provided by exllamav2.
Description
ExLlamaV2Tokenizer wraps the HuggingFace tokenizers library to provide encoding (text to token IDs) and decoding (token IDs to text) functionality. It reads tokenizer configuration from the model directory, supporting both SentencePiece (tokenizer.model) and JSON-based (tokenizer.json) formats.
The class automatically detects and configures:
- BOS/EOS/PAD token IDs from the tokenizer configuration and special_tokens_map.json
- Added vocabulary tokens for chat and instruction-tuned models
- Token-to-piece mappings for efficient decoding and token healing
- Extended token pieces for models with large added vocabularies
The tokenizer provides both single-string and batched encoding/decoding, with options for handling special tokens, adding BOS markers, and padding to uniform length.
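The padding-to-uniform-length behavior can be illustrated with a small standalone sketch. `pad_batch` is a hypothetical helper written for this page, not exllamav2's code, and whether the library pads on the left or right is left configurable here rather than asserted:

```python
def pad_batch(seqs: list[list[int]], pad_id: int, left: bool = True) -> list[list[int]]:
    """Pad variable-length token ID lists to a uniform length.

    Hypothetical helper illustrating the idea; not exllamav2's implementation.
    The padding side (left vs. right) is a parameter here for illustration.
    """
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_id] * (max_len - len(s))
        out.append(pad + s if left else s + pad)
    return out

print(pad_batch([[1, 2], [3, 4, 5]], pad_id=0))  # [[0, 1, 2], [3, 4, 5]]
```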
Usage
Use ExLlamaV2Tokenizer after initializing the config. It is required by all generators and is used to:
- Encode prompts into token ID tensors for model input
- Decode generated token IDs back to text
- Query special token IDs for stop conditions
- Measure prompt length for context window management
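The context-window bookkeeping in the last bullet can be sketched as follows. `clamp_new_tokens` is a hypothetical helper, not part of exllamav2; in practice the prompt length would come from `tokenizer.encode(prompt).shape[-1]`:

```python
def clamp_new_tokens(prompt_len: int, max_seq_len: int, max_new_tokens: int) -> int:
    """Limit generation so prompt + output fits in the context window.

    Hypothetical helper for illustration; not part of the library.
    """
    available = max_seq_len - prompt_len  # room left in the context window
    return max(0, min(max_new_tokens, available))

print(clamp_new_tokens(prompt_len=4000, max_seq_len=4096, max_new_tokens=256))  # 96
```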
Code Reference
Source Location
- Repository: exllamav2
- File: exllamav2/tokenizer/tokenizer.py
- Lines: L78-298
Signature
class ExLlamaV2Tokenizer:
    def __init__(
        self,
        config: ExLlamaV2Config,
        lazy_init: bool = True,
        force_json: bool = False,
        force_spm: bool = False,
    ):
        ...
Import
from exllamav2 import ExLlamaV2Tokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | ExLlamaV2Config | Yes | Prepared config with model_dir pointing to tokenizer files (tokenizer.json, tokenizer.model, tokenizer_config.json) |
| lazy_init | bool | No (default True) | Defer building internal data structures until first use |
| force_json | bool | No (default False) | Force use of tokenizer.json even if tokenizer.model exists |
| force_spm | bool | No (default False) | Force use of SentencePiece tokenizer.model |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenizer instance | ExLlamaV2Tokenizer | Tokenizer with encoding/decoding methods |
| tokenizer.bos_token_id | int | Beginning-of-sequence token ID |
| tokenizer.eos_token_id | int | End-of-sequence token ID |
| tokenizer.pad_token_id | int | Padding token ID |
| tokenizer.vocab_size | int | Total vocabulary size including added tokens |
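As a minimal illustration of using `eos_token_id` as a stop condition, the sketch below cuts a generated sequence at the first EOS token. This is a hypothetical helper; exllamav2's generators handle stop conditions internally:

```python
def truncate_at_eos(ids: list[int], eos_token_id: int) -> list[int]:
    """Cut a generated token sequence at the first EOS token (exclusive).

    Hypothetical helper for illustration; generators do this internally.
    """
    if eos_token_id in ids:
        return ids[: ids.index(eos_token_id)]
    return ids

print(truncate_at_eos([15043, 29892, 2], eos_token_id=2))  # [15043, 29892]
```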
Key Methods
encode
def encode(
    self,
    text: str | list[str],
    add_bos: bool = False,
    add_eos: bool = False,
    encode_special_tokens: bool = False,
    return_offsets: bool = False,
) -> torch.Tensor:
    """Encode text to token ID tensor. Shape: (1, seq_len) or (batch, seq_len)."""
    ...
decode
def decode(
    self,
    ids: torch.Tensor,
    decode_special_tokens: bool = False,
) -> str | list[str]:
    """Decode token ID tensor to text string(s)."""
    ...
Usage Examples
Basic Tokenization
from exllamav2 import ExLlamaV2Config, ExLlamaV2Tokenizer
config = ExLlamaV2Config("/path/to/model")
config.prepare()
tokenizer = ExLlamaV2Tokenizer(config)
# Encode text to token IDs
ids = tokenizer.encode("Hello, world!")
print(ids)  # e.g. tensor([[15043, 29892, 3186, 29991]]); exact IDs depend on the model's vocabulary
# Decode back to text
text = tokenizer.decode(ids)
print(text) # "Hello, world!"
Encoding with Special Tokens
# Add BOS token at the beginning
ids = tokenizer.encode("Hello", add_bos=True)
# Encode special token strings like <|im_start|>
ids = tokenizer.encode("<|im_start|>user\nHello<|im_end|>", encode_special_tokens=True)
# Access special token IDs
print(f"BOS: {tokenizer.bos_token_id}")
print(f"EOS: {tokenizer.eos_token_id}")
Batch Encoding
# Encode multiple strings (padded to same length)
prompts = ["Hello", "How are you?", "Goodbye"]
ids = tokenizer.encode(prompts, add_bos=True)
print(ids.shape) # (3, max_len)
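Rows in a padded batch contain pad tokens, so per-row post-processing often strips them before inspecting or decoding an individual row. `strip_pads` below is a hypothetical helper sketching that cleanup; `decode` itself accepts the whole batch tensor and returns a list of strings:

```python
def strip_pads(row: list[int], pad_id: int) -> list[int]:
    """Remove padding tokens from one row of a padded batch.

    Hypothetical helper for illustration; not part of exllamav2.
    """
    return [t for t in row if t != pad_id]

print(strip_pads([0, 0, 15043], pad_id=0))  # [15043]
```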