
Implementation:Turboderp org Exllamav2 ExLlamaV2Tokenizer

From Leeroopedia
Knowledge Sources
Domains NLP, Tokenization, Text_Processing
Last Updated 2026-02-15 00:00 GMT

Overview

Concrete tool for text tokenization and detokenization using HuggingFace-compatible tokenizer files, provided by exllamav2.

Description

ExLlamaV2Tokenizer wraps the HuggingFace tokenizers library to provide encoding (text to token IDs) and decoding (token IDs to text) functionality. It reads tokenizer configuration from the model directory, supporting both SentencePiece (tokenizer.model) and JSON-based (tokenizer.json) formats.
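As a rough sketch of this file selection (the actual precedence logic in tokenizer.py may differ; pick_tokenizer_file is a hypothetical helper, not part of the exllamav2 API):

```python
from pathlib import Path

def pick_tokenizer_file(model_dir, force_json=False, force_spm=False):
    """Choose between the SentencePiece and JSON tokenizer files in a
    model directory. Illustrative only; exllamav2's real precedence
    rules may differ from this sketch."""
    spm = Path(model_dir) / "tokenizer.model"
    json_file = Path(model_dir) / "tokenizer.json"
    if force_spm and spm.exists():
        return spm
    if force_json and json_file.exists():
        return json_file
    # Sketch default: prefer the SentencePiece model when present
    if spm.exists() and not force_json:
        return spm
    if json_file.exists():
        return json_file
    raise FileNotFoundError("no tokenizer.model or tokenizer.json found")
```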

The class automatically detects and configures:

  • BOS/EOS/PAD token IDs from the tokenizer configuration and special_tokens_map.json
  • Added vocabulary tokens for chat and instruction-tuned models
  • Token-to-piece mappings for efficient decoding and token healing
  • Extended token pieces for models with large added vocabularies
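The token-to-piece mapping amounts to inverting the vocabulary so that decoding becomes an indexed lookup, and the same table supports token healing (matching token pieces against a partial prefix). A minimal illustrative sketch, not the library's actual data structures:

```python
def build_id_to_piece(vocab: dict[str, int]) -> list[str]:
    """Invert a piece->id vocabulary into an id-indexed list so that
    decoding a token is a plain list lookup."""
    pieces = [""] * (max(vocab.values()) + 1)
    for piece, idx in vocab.items():
        pieces[idx] = piece
    return pieces

def healing_candidates(pieces: list[str], prefix: str) -> list[int]:
    """Token healing: find all token IDs whose piece extends the given
    partial prefix, i.e. candidates to replace a truncated last token."""
    return [i for i, p in enumerate(pieces) if p.startswith(prefix)]
```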

The tokenizer provides both single-string and batched encoding/decoding, with options for handling special tokens, adding BOS markers, and padding to uniform length.
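Conceptually, batched encoding pads shorter sequences with the pad token before stacking them into a (batch, seq_len) tensor. A plain-Python sketch (shown with left padding, which is common for causal generation; the real implementation operates on tensors and its padding side may differ):

```python
def pad_batch(seqs: list[list[int]], pad_id: int) -> list[list[int]]:
    """Left-pad token ID sequences to a uniform length so they can be
    stacked into one (batch, seq_len) array. Sketch only."""
    max_len = max(len(s) for s in seqs)
    return [[pad_id] * (max_len - len(s)) + s for s in seqs]
```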

Usage

Use ExLlamaV2Tokenizer after initializing the config. It is required by all generators and is used to:

  • Encode prompts into token ID tensors for model input
  • Decode generated token IDs back to text
  • Query special token IDs for stop conditions
  • Measure prompt length for context window management
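For the last point, a hedged sketch of context-window management built on token counts (fit_prompt is a hypothetical helper, not an exllamav2 function):

```python
def fit_prompt(ids: list[int], max_seq_len: int, gen_budget: int) -> list[int]:
    """Truncate prompt tokens from the left so that prompt length plus
    the planned number of generated tokens fits the context window."""
    keep = max_seq_len - gen_budget
    if keep <= 0:
        raise ValueError("generation budget exceeds context window")
    # Keep the most recent tokens; drop the oldest if over budget
    return ids[-keep:]
```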

Code Reference

Source Location

  • Repository: exllamav2
  • File: exllamav2/tokenizer/tokenizer.py
  • Lines: L78-298

Signature

class ExLlamaV2Tokenizer:

    def __init__(
        self,
        config: ExLlamaV2Config,
        lazy_init: bool = True,
        force_json: bool = False,
        force_spm: bool = False,
    ):
        ...

Import

from exllamav2 import ExLlamaV2Tokenizer

I/O Contract

Inputs

  • config (ExLlamaV2Config, required): Prepared config with model_dir pointing to tokenizer files (tokenizer.json, tokenizer.model, tokenizer_config.json)
  • lazy_init (bool, default True): Defer building internal data structures until first use
  • force_json (bool, default False): Force use of tokenizer.json even if tokenizer.model exists
  • force_spm (bool, default False): Force use of the SentencePiece tokenizer.model

Outputs

  • tokenizer (ExLlamaV2Tokenizer): Tokenizer instance with encoding/decoding methods
  • tokenizer.bos_token_id (int): Beginning-of-sequence token ID
  • tokenizer.eos_token_id (int): End-of-sequence token ID
  • tokenizer.pad_token_id (int): Padding token ID
  • tokenizer.vocab_size (int): Total vocabulary size, including added tokens
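These special token IDs are typically consumed by a generation loop's stop condition, along these lines (illustrative only, not generator code from the library):

```python
def should_stop(generated: list[int], eos_token_id: int, max_new: int) -> bool:
    """Stop when the model emits EOS or the new-token budget is spent,
    the typical use of tokenizer.eos_token_id in a generation loop."""
    if not generated:
        return False
    return generated[-1] == eos_token_id or len(generated) >= max_new
```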

Key Methods

encode

def encode(
    self,
    text: str | list[str],
    add_bos: bool = False,
    add_eos: bool = False,
    encode_special_tokens: bool = False,
    return_offsets: bool = False,
) -> torch.Tensor:
    """Encode text to token ID tensor. Shape: (1, seq_len) or (batch, seq_len)."""
    ...

decode

def decode(
    self,
    ids: torch.Tensor,
    decode_special_tokens: bool = False,
) -> str | list[str]:
    """Decode token ID tensor to text string(s)."""
    ...

Usage Examples

Basic Tokenization

from exllamav2 import ExLlamaV2Config, ExLlamaV2Tokenizer

config = ExLlamaV2Config("/path/to/model")
config.prepare()

tokenizer = ExLlamaV2Tokenizer(config)

# Encode text to token IDs
ids = tokenizer.encode("Hello, world!")
print(ids)  # e.g. tensor([[15043, 29892, 3186, 29991]]) -- IDs are model-specific

# Decode back to text
text = tokenizer.decode(ids)
print(text)  # "Hello, world!"

Encoding with Special Tokens

# Add BOS token at the beginning
ids = tokenizer.encode("Hello", add_bos=True)

# Encode special token strings like <|im_start|>
ids = tokenizer.encode("<|im_start|>user\nHello<|im_end|>", encode_special_tokens=True)

# Access special token IDs
print(f"BOS: {tokenizer.bos_token_id}")
print(f"EOS: {tokenizer.eos_token_id}")

Batch Encoding

# Encode multiple strings (padded to same length)
prompts = ["Hello", "How are you?", "Goodbye"]
ids = tokenizer.encode(prompts, add_bos=True)
print(ids.shape)  # torch.Size([3, max_len])

Related Pages

Implements Principle

Page Connections
