

Implementation:Turboderp org Exllamav2 ExLlamaV2TokenizerHF

From Leeroopedia
Knowledge Sources
Domains Tokenization
Last Updated 2026-02-15 00:00 GMT

Overview

ExLlamaV2TokenizerHF is a concrete tokenizer implementation that wraps the HuggingFace Tokenizers library to provide ExLlamaV2-compatible tokenization, with automatic detection of BPE space and newline characters.

Description

This class extends ExLlamaV2TokenizerBase by wrapping a HuggingFace Tokenizer instance loaded from a tokenizer.json file. It implements all abstract methods defined in the base class.

During initialization, the class loads the tokenizer from the JSON file and inspects the underlying model type. If the model is a BPE tokenizer (e.g., GPT-style), it auto-detects the internal representations of space and newline characters using deduce_char_map() from the base class. For BPE models, spaces are typically represented as a special character like "Ġ" (U+0120) and newlines as "Ċ" (U+010A). Non-BPE models use standard space and newline characters.
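
The U+0120 and U+010A code points come from the byte-to-character table used by GPT-2-style byte-level BPE, which remaps every byte to a printable character. The sketch below rebuilds that table to show where the two characters originate. It is not the deduce_char_map() implementation, whose internals are not covered on this page, and the function name byte_to_unicode_table is hypothetical.

```python
def byte_to_unicode_table() -> dict[int, str]:
    """Rebuild the GPT-2-style byte -> printable character table (illustrative)."""
    # Bytes that are already printable map to themselves.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    # Every remaining byte (control characters, space, ...) is assigned
    # the next free code point starting at U+0100.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


table = byte_to_unicode_table()
space_char = table[ord(" ")]     # U+0120
newline_char = table[ord("\n")]  # U+010A
print(f"U+{ord(space_char):04X}", f"U+{ord(newline_char):04X}")
```

Space is byte 32 and newline is byte 10, both outside the printable ranges, so they land at 256 + 32 and 256 + 10 respectively.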

The enumerate_tokens() method handles a subtlety of HuggingFace tokenizers: some tokenizers cannot decode individual token IDs in isolation (they produce different results than when decoded as part of a sequence). The method detects this by encoding a test string (" t") and checking whether the decoded single-token result matches the expected output. If not, it uses a prefix-based decoding strategy where each token is decoded as a pair with a space token prefix, then the prefix portion is stripped. The decoded vocabulary is cached in self.vocab to avoid repeated computation.
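
The detection and prefix-stripping steps described above can be sketched with a toy tokenizer. Everything here is hypothetical: PIECES, toy_encode, toy_decode, and build_vocab are stand-ins invented for illustration, and the toy decoder simply drops one leading space to mimic a tokenizer that cannot decode single tokens faithfully. It is a sketch of the idea, not the library's code.

```python
# Hypothetical toy vocabulary; "▁" marks a space inside a piece.
PIECES = ["▁", "▁t", "he", "▁the", "!"]


def toy_encode(text: str) -> list[int]:
    """Greedy longest-match encoding over the toy vocabulary."""
    remaining = text.replace(" ", "▁")
    ids = []
    while remaining:
        match = max((p for p in PIECES if remaining.startswith(p)), key=len)
        ids.append(PIECES.index(match))
        remaining = remaining[len(match):]
    return ids


def toy_decode(ids: list[int]) -> str:
    """Join pieces, then drop one leading space (the lossy behavior)."""
    text = "".join(PIECES[i] for i in ids).replace("▁", " ")
    return text[1:] if text.startswith(" ") else text


def build_vocab() -> list[str]:
    # Step 1: check whether " t" survives a single-token round trip.
    test_ids = toy_encode(" t")
    isolated_ok = len(test_ids) == 1 and toy_decode(test_ids) == " t"
    if isolated_ok:
        return [toy_decode([i]) for i in range(len(PIECES))]
    # Step 2: otherwise decode each token after a space-token prefix
    # and strip the part of the output that belongs to the prefix.
    prefix_id = toy_encode(" ")[0]
    prefix_len = len(toy_decode([prefix_id]))
    return [toy_decode([prefix_id, i])[prefix_len:] for i in range(len(PIECES))]


vocab = build_vocab()
print(vocab)
```

In this toy, decoding token 1 alone yields "t" rather than " t", so the check fails and the prefixed path is taken, which recovers the leading space.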

The special-token ID accessors pad_id(), bos_id(), and eos_id() return None, since HuggingFace tokenizers handle special tokens at a higher level. The exception is unk_id(), which resolves through unk_token() when the underlying model defines one.

Usage

Use ExLlamaV2TokenizerHF when loading models that ship with a HuggingFace-format tokenizer.json file. This is the default tokenizer backend for most modern LLMs. It is automatically selected by the ExLlamaV2Tokenizer wrapper class during model initialization.

Code Reference

Source Location

Signature

class ExLlamaV2TokenizerHF(ExLlamaV2TokenizerBase):

    space_char_: str
    newline_char_: str
    vocab: list[str] | None

    def __init__(self, tokenizer_json: str) -> None: ...

    # Special token accessors
    def unk_id(self) -> int | None: ...
    def pad_id(self) -> int | None: ...
    def bos_id(self) -> int | None: ...
    def eos_id(self) -> int | None: ...
    def unk_token(self) -> str | None: ...
    def pad_token(self) -> str | None: ...
    def bos_token(self) -> str | None: ...
    def eos_token(self) -> str | None: ...

    # Character mapping
    def space_char(self) -> str: ...
    def newline_char(self) -> str: ...

    # Core tokenization
    def enumerate_tokens(self): ...
    def vocab_size(self) -> int: ...
    def id_to_piece(self, idx: int) -> str: ...
    def piece_to_id(self, text: str) -> int: ...
    def decode(self, ids: list[int]) -> str: ...
    def encode(self, text: list | str) -> list: ...

Import

from exllamav2.tokenizer.hf import ExLlamaV2TokenizerHF

I/O Contract

__init__()

Parameter | Type | Description
tokenizer_json | str | File path to the HuggingFace tokenizer.json file

encode()

Parameter | Type | Description
text | str or list | Text string (or list of strings) to encode

Return | Type | Description
ids | list[int] | List of token IDs (special tokens are not added; uses add_special_tokens=False)

decode()

Parameter | Type | Description
ids | List[int] | List of token IDs to decode

Return | Type | Description
text | str | Decoded text string

enumerate_tokens()

Return | Type | Description
iterator | enumerate | Yields (index, decoded_piece) tuples for the entire vocabulary; the decoded vocabulary is cached after the first call
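
The caching behavior can be illustrated with a minimal lazy-cache sketch. The class name CachedVocab, its decode_one callback, and the decode_calls counter are hypothetical and exist only for this illustration; the sketch assumes the real class follows the same pattern of filling self.vocab once and returning a fresh enumerate over it on each call.

```python
from typing import Callable


class CachedVocab:
    """Illustrative lazy cache: decode every token once, then reuse the list."""

    def __init__(self, decode_one: Callable[[int], str], size: int) -> None:
        self._decode_one = decode_one
        self._size = size
        self.vocab: list[str] | None = None
        self.decode_calls = 0  # bookkeeping for the demonstration only

    def enumerate_tokens(self):
        if self.vocab is None:
            # First call: decode the whole vocabulary and keep it.
            self.vocab = []
            for i in range(self._size):
                self.decode_calls += 1
                self.vocab.append(self._decode_one(i))
        # Every call returns a new iterator over the same cached list.
        return enumerate(self.vocab)


cache = CachedVocab(lambda i: f"tok{i}", size=3)
first = list(cache.enumerate_tokens())
second = list(cache.enumerate_tokens())
print(first, cache.decode_calls)
```

Because the method returns a new enumerate object each time, callers can iterate the vocabulary repeatedly without exhausting a shared iterator.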

id_to_piece() / piece_to_id()

Method | Parameter | Return | Description
id_to_piece | idx: int | str | Returns the raw token string for a given ID (None-safe: returns "" when idx is None)
piece_to_id | text: str | int | Returns the token ID for a given piece string

Usage Examples

from exllamav2.tokenizer.hf import ExLlamaV2TokenizerHF

# Load from a HuggingFace tokenizer.json file
tokenizer = ExLlamaV2TokenizerHF("/path/to/model/tokenizer.json")

# Encode text to token IDs
ids = tokenizer.encode("Hello, world!")
print(ids)  # e.g., [15496, 11, 995, 0]

# Decode token IDs back to text
text = tokenizer.decode(ids)
print(text)  # "Hello, world!"

# Get vocabulary size
print(tokenizer.vocab_size())  # e.g., 32000

# Convert between pieces and IDs
piece = tokenizer.id_to_piece(15496)
print(piece)  # e.g., "Hello"

token_id = tokenizer.piece_to_id("Hello")
print(token_id)  # e.g., 15496

# Enumerate the full vocabulary (cached after first call)
for idx, piece in tokenizer.enumerate_tokens():
    if idx < 5:
        print(f"Token {idx}: {repr(piece)}")

# Check internal character representations
print(repr(tokenizer.space_char()))    # e.g., 'Ġ' for BPE models
print(repr(tokenizer.newline_char()))  # e.g., 'Ċ' for BPE models
