Implementation: ExLlamaV2Tokenizer (turboderp/exllamav2)
| Knowledge Sources | |
|---|---|
| Domains | NLP, Tokenization, Text_Processing |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
Concrete tool for text tokenization and detokenization using HuggingFace-compatible tokenizer files, provided by exllamav2.
Description
ExLlamaV2Tokenizer wraps the HuggingFace tokenizers library to provide encoding (text to token IDs) and decoding (token IDs to text) functionality. It reads tokenizer configuration from the model directory, supporting both SentencePiece (tokenizer.model) and JSON-based (tokenizer.json) formats.
The class automatically detects and configures:
- BOS/EOS/PAD token IDs from the tokenizer configuration and special_tokens_map.json
- Added vocabulary tokens for chat and instruction-tuned models
- Token-to-piece mappings for efficient decoding and token healing
- Extended token pieces for models with large added vocabularies
The tokenizer provides both single-string and batched encoding/decoding, with options for handling special tokens, adding BOS markers, and padding to uniform length.
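The padding-to-uniform-length behavior can be illustrated with a small standalone sketch. `pad_batch` is a hypothetical helper written for this page, not exllamav2's code, and whether the library pads on the left or right is left configurable here rather than asserted:

```python
def pad_batch(seqs: list[list[int]], pad_id: int, left: bool = True) -> list[list[int]]:
    """Pad variable-length token ID lists to a uniform length.

    Hypothetical helper illustrating the idea; not exllamav2's implementation.
    The padding side (left vs. right) is a parameter here for illustration.
    """
    max_len = max(len(s) for s in seqs)
    out = []
    for s in seqs:
        pad = [pad_id] * (max_len - len(s))
        out.append(pad + s if left else s + pad)
    return out

print(pad_batch([[1, 2], [3, 4, 5]], pad_id=0))  # [[0, 1, 2], [3, 4, 5]]
```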
Usage
Use ExLlamaV2Tokenizer after initializing the config. It is required by all generators and is used to:
- Encode prompts into token ID tensors for model input
- Decode generated token IDs back to text
- Query special token IDs for stop conditions
- Measure prompt length for context window management
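The context-window bookkeeping in the last bullet can be sketched as follows. `clamp_new_tokens` is a hypothetical helper, not part of exllamav2; in practice the prompt length would come from `tokenizer.encode(prompt).shape[-1]`:

```python
def clamp_new_tokens(prompt_len: int, max_seq_len: int, max_new_tokens: int) -> int:
    """Limit generation so prompt + output fits in the context window.

    Hypothetical helper for illustration; not part of the library.
    """
    available = max_seq_len - prompt_len  # room left in the context window
    return max(0, min(max_new_tokens, available))

print(clamp_new_tokens(prompt_len=4000, max_seq_len=4096, max_new_tokens=256))  # 96
```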
Code Reference
Source Location
- Repository: exllamav2
- File: exllamav2/tokenizer/tokenizer.py
- Lines: L78-298
Signature
class ExLlamaV2Tokenizer:
    def __init__(
        self,
        config: ExLlamaV2Config,
        lazy_init: bool = True,
        force_json: bool = False,
        force_spm: bool = False,
    ):
        ...
Import
from exllamav2 import ExLlamaV2Tokenizer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | ExLlamaV2Config | Yes | Prepared config with model_dir pointing to tokenizer files (tokenizer.json, tokenizer.model, tokenizer_config.json) |
| lazy_init | bool | No (default True) | Defer building internal data structures until first use |
| force_json | bool | No (default False) | Force use of tokenizer.json even if tokenizer.model exists |
| force_spm | bool | No (default False) | Force use of SentencePiece tokenizer.model |
Outputs
| Name | Type | Description |
|---|---|---|
| tokenizer instance | ExLlamaV2Tokenizer | Tokenizer with encoding/decoding methods |
| tokenizer.bos_token_id | int | Beginning-of-sequence token ID |
| tokenizer.eos_token_id | int | End-of-sequence token ID |
| tokenizer.pad_token_id | int | Padding token ID |
| tokenizer.vocab_size | int | Total vocabulary size including added tokens |
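As a minimal illustration of using `eos_token_id` as a stop condition, the sketch below cuts a generated sequence at the first EOS token. This is a hypothetical helper; exllamav2's generators handle stop conditions internally:

```python
def truncate_at_eos(ids: list[int], eos_token_id: int) -> list[int]:
    """Cut a generated token sequence at the first EOS token (exclusive).

    Hypothetical helper for illustration; generators do this internally.
    """
    if eos_token_id in ids:
        return ids[: ids.index(eos_token_id)]
    return ids

print(truncate_at_eos([15043, 29892, 2], eos_token_id=2))  # [15043, 29892]
```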
Key Methods
encode
def encode(
    self,
    text: str | list[str],
    add_bos: bool = False,
    add_eos: bool = False,
    encode_special_tokens: bool = False,
    return_offsets: bool = False,
) -> torch.Tensor:
    """Encode text to token ID tensor. Shape: (1, seq_len) or (batch, seq_len)."""
    ...
decode
def decode(
    self,
    ids: torch.Tensor,
    decode_special_tokens: bool = False,
) -> str | list[str]:
    """Decode token ID tensor to text string(s)."""
    ...
Usage Examples
Basic Tokenization
from exllamav2 import ExLlamaV2Config, ExLlamaV2Tokenizer
config = ExLlamaV2Config("/path/to/model")
config.prepare()
tokenizer = ExLlamaV2Tokenizer(config)
# Encode text to token IDs
ids = tokenizer.encode("Hello, world!")
print(ids)  # e.g. tensor([[15043, 29892, 3186, 29991]]); exact IDs depend on the model's vocabulary
# Decode back to text
text = tokenizer.decode(ids)
print(text) # "Hello, world!"
Encoding with Special Tokens
# Add BOS token at the beginning
ids = tokenizer.encode("Hello", add_bos=True)
# Encode special token strings like <|im_start|>
ids = tokenizer.encode("<|im_start|>user\nHello<|im_end|>", encode_special_tokens=True)
# Access special token IDs
print(f"BOS: {tokenizer.bos_token_id}")
print(f"EOS: {tokenizer.eos_token_id}")
Batch Encoding
# Encode multiple strings (padded to same length)
prompts = ["Hello", "How are you?", "Goodbye"]
ids = tokenizer.encode(prompts, add_bos=True)
print(ids.shape) # (3, max_len)
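Rows in a padded batch contain pad tokens, so per-row post-processing often strips them before inspecting or decoding an individual row. `strip_pads` below is a hypothetical helper sketching that cleanup; `decode` itself accepts the whole batch tensor and returns a list of strings:

```python
def strip_pads(row: list[int], pad_id: int) -> list[int]:
    """Remove padding tokens from one row of a padded batch.

    Hypothetical helper for illustration; not part of exllamav2.
    """
    return [t for t in row if t != pad_id]

print(strip_pads([0, 0, 15043], pad_id=0))  # [15043]
```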