Implementation:Turboderp org Exllamav2 ExLlamaV2TokenizerHF

Knowledge Sources	Turboderp_org_Exllamav2
Domains	Tokenization
Last Updated	2026-02-15 00:00 GMT

Overview

ExLlamaV2TokenizerHF is a concrete tokenizer implementation that wraps the HuggingFace Tokenizers library to provide ExLlamaV2-compatible tokenization, with automatic detection of BPE space and newline characters.

Description

This class extends ExLlamaV2TokenizerBase by wrapping a HuggingFace Tokenizer instance loaded from a tokenizer.json file. It implements all abstract methods defined in the base class.

During initialization, the class loads the tokenizer from the JSON file and inspects the underlying model type. If the model is a BPE tokenizer (e.g., GPT-style), it auto-detects the internal representations of space and newline characters using deduce_char_map() from the base class. For BPE models, spaces are typically represented as a special character like "Ġ" (U+0120) and newlines as "Ċ" (U+010A). Non-BPE models use standard space and newline characters.
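
To illustrate where these characters come from, the sketch below rebuilds the byte-to-character table used by GPT-2-style byte-level BPE, in which the space byte maps to U+0120 and the newline byte to U+010A. It is not the implementation of deduce_char_map(), whose internals are not shown on this page; the name bytes_to_unicode is borrowed from the GPT-2 reference encoder and is an assumption here, not part of ExLlamaV2.

```python
def bytes_to_unicode() -> dict[int, str]:
    """Map every byte value 0..255 to a printable character (GPT-2 scheme)."""
    # Bytes that are already printable keep their own code point.
    printable = (
        list(range(0x21, 0x7F))     # "!" .. "~"
        + list(range(0xA1, 0xAD))   # U+00A1 .. U+00AC
        + list(range(0xAE, 0x100))  # U+00AE .. U+00FF
    )
    byte_values = printable[:]
    code_points = printable[:]
    shift = 0
    for b in range(256):
        if b not in printable:
            # Remaining bytes (controls, space, ...) are moved to 256 and up.
            byte_values.append(b)
            code_points.append(256 + shift)
            shift += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


table = bytes_to_unicode()
space_char = table[ord(" ")]
newline_char = table[ord("\n")]
print(f"U+{ord(space_char):04X}", f"U+{ord(newline_char):04X}")  # U+0120 U+010A
```

Space is byte 32 and newline is byte 10; neither is in the printable ranges, so they land at 256 + 32 and 256 + 10 respectively.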

The enumerate_tokens() method handles a subtlety of HuggingFace tokenizers: some tokenizers cannot decode individual token IDs in isolation (they produce different results than when decoded as part of a sequence). The method detects this by encoding a test string (" t") and checking whether the decoded single-token result matches the expected output. If not, it uses a prefix-based decoding strategy where each token is decoded as a pair with a space token prefix, then the prefix portion is stripped. The decoded vocabulary is cached in self.vocab to avoid repeated computation.
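
The detection and prefix strategy described above can be sketched as follows. ToyTokenizer is a hypothetical stand-in whose decoder drops one leading space from any decoded sequence; it is not the HuggingFace Tokenizer API, and enumerate_pieces is an illustrative name rather than the library's method. Caching is left out for brevity.

```python
class ToyTokenizer:
    """Hypothetical tokenizer whose decode() strips one leading space."""

    def __init__(self, pieces: list[str]) -> None:
        self.pieces = pieces

    def vocab_size(self) -> int:
        return len(self.pieces)

    def encode(self, text: str) -> list[int]:
        # Greedy longest-match encoding; enough for this toy vocabulary.
        ids, pos = [], 0
        while pos < len(text):
            best = max((p for p in self.pieces if text.startswith(p, pos)), key=len)
            ids.append(self.pieces.index(best))
            pos += len(best)
        return ids

    def decode(self, ids: list[int]) -> str:
        text = "".join(self.pieces[i] for i in ids)
        return text[1:] if text.startswith(" ") else text


def enumerate_pieces(tok: ToyTokenizer) -> list[str]:
    # Probe: does " t" survive a single-token encode/decode round trip?
    test_ids = tok.encode(" t")
    test_piece = tok.decode([test_ids[0]])
    if len(test_ids) == 1 and test_piece == " t":
        # Isolated decoding is trustworthy, so decode each ID on its own.
        return [tok.decode([i]) for i in range(tok.vocab_size())]
    # Otherwise decode each ID behind a space-token prefix, then strip
    # whatever the prefix alone decodes to.
    prefix_id = tok.encode(" ")[0]
    prefix_len = len(tok.decode([prefix_id]))
    return [tok.decode([prefix_id, i])[prefix_len:] for i in range(tok.vocab_size())]


tok = ToyTokenizer([" ", "t", " t", "he", " he"])
naive = [tok.decode([i]) for i in range(tok.vocab_size())]
robust = enumerate_pieces(tok)
print(naive)   # ['', 't', 't', 'he', 'he']
print(robust)  # [' ', 't', ' t', 'he', ' he']
```

In this toy case the naive per-token decode loses every leading space, while the prefixed decode recovers the pieces exactly.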

Most special-token accessors (pad_id, bos_id, eos_id and their *_token counterparts) return None, since HuggingFace tokenizers handle special tokens at a higher level. The exception is unk_id(), which resolves through unk_token() when the underlying model defines one.
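
Because these accessors can return None, calling code generally needs a fallback. The sketch below mirrors the accessor pattern described above using a hypothetical stand-in class (SpecialTokenSketch) and a hypothetical helper (first_known); neither is part of ExLlamaV2, and the fallback ID is assumed to come from elsewhere, such as model configuration.

```python
from typing import Optional


class SpecialTokenSketch:
    """Hypothetical stand-in; not the real ExLlamaV2TokenizerHF."""

    def __init__(self, vocab: dict[str, int], unk_token: Optional[str] = None) -> None:
        self._vocab = vocab
        self._unk_token = unk_token

    def unk_token(self) -> Optional[str]:
        return self._unk_token

    def piece_to_id(self, text: str) -> Optional[int]:
        return self._vocab.get(text)

    def unk_id(self) -> Optional[int]:
        # Resolve the ID only if the model defines an unknown token.
        token = self.unk_token()
        return None if token is None else self.piece_to_id(token)

    def bos_id(self) -> Optional[int]:
        # Not known at this level.
        return None


def first_known(*candidates: Optional[int]) -> Optional[int]:
    """Return the first candidate that is not None (0 is a valid token ID)."""
    return next((c for c in candidates if c is not None), None)


with_unk = SpecialTokenSketch({"<unk>": 0, "a": 1}, unk_token="<unk>")
without_unk = SpecialTokenSketch({"a": 1})
config_bos_id = 1  # assumed value supplied from outside the tokenizer
resolved_bos = first_known(with_unk.bos_id(), config_bos_id)
```

Comparing against None rather than relying on truthiness matters here, since token ID 0 is valid but falsy.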

Usage

Use ExLlamaV2TokenizerHF when loading models that ship with a HuggingFace-format tokenizer.json file. This is the default tokenizer backend for most modern LLMs. It is automatically selected by the ExLlamaV2Tokenizer wrapper class during model initialization.

Code Reference

Source Location

Repository: Turboderp_org_Exllamav2
File: exllamav2/tokenizer/hf.py
Lines: 1-86

Signature

class ExLlamaV2TokenizerHF(ExLlamaV2TokenizerBase):

    space_char_: str
    newline_char_: str
    vocab: list[str] | None

    def __init__(self, tokenizer_json: str) -> None: ...

    # Special token accessors
    def unk_id(self) -> int or None: ...
    def pad_id(self) -> int or None: ...
    def bos_id(self) -> int or None: ...
    def eos_id(self) -> int or None: ...
    def unk_token(self) -> str or None: ...
    def pad_token(self) -> str or None: ...
    def bos_token(self) -> str or None: ...
    def eos_token(self) -> str or None: ...

    # Character mapping
    def space_char(self) -> str: ...
    def newline_char(self) -> str: ...

    # Core tokenization
    def enumerate_tokens(self): ...
    def vocab_size(self) -> int: ...
    def id_to_piece(self, idx: int) -> str: ...
    def piece_to_id(self, text: str) -> int: ...
    def decode(self, ids: List[int]) -> str: ...
    def encode(self, text: list or str) -> list: ...

Import

from exllamav2.tokenizer.hf import ExLlamaV2TokenizerHF

I/O Contract

__init__()

Parameter	Type	Description
tokenizer_json	`str`	File path to the HuggingFace tokenizer.json file

encode()

Parameter	Type	Description
text	`list or str`	Text string (or list) to encode

Return	Type	Description
ids	`list[int]`	List of token IDs (special tokens not added; uses `add_special_tokens=False`)

decode()

Parameter	Type	Description
ids	`List[int]`	List of token IDs to decode

Return	Type	Description
text	`str`	Decoded text string

enumerate_tokens()

Return	Type	Description
iterator	`enumerate`	Yields `(index, decoded_piece)` tuples for the entire vocabulary; result is cached after first call

id_to_piece() / piece_to_id()

Method	Parameter	Return	Description
id_to_piece	`idx: int`	`str`	Returns the raw token string for a given ID (None-safe, returns "" for None)
piece_to_id	`text: str`	`int`	Returns the token ID for a given piece string

Usage Examples

from exllamav2.tokenizer.hf import ExLlamaV2TokenizerHF

# Load from a HuggingFace tokenizer.json file
tokenizer = ExLlamaV2TokenizerHF("/path/to/model/tokenizer.json")

# Encode text to token IDs
ids = tokenizer.encode("Hello, world!")
print(ids)  # e.g., [15496, 11, 995, 0]

# Decode token IDs back to text
text = tokenizer.decode(ids)
print(text)  # "Hello, world!"

# Get vocabulary size
print(tokenizer.vocab_size())  # e.g., 32000

# Convert between pieces and IDs
piece = tokenizer.id_to_piece(15496)
print(piece)  # e.g., "Hello"

token_id = tokenizer.piece_to_id("Hello")
print(token_id)  # e.g., 15496

# Enumerate the full vocabulary (cached after first call)
for idx, piece in tokenizer.enumerate_tokens():
    if idx < 5:
        print(f"Token {idx}: {repr(piece)}")

# Check internal character representations
print(repr(tokenizer.space_char()))    # e.g., 'Ġ' for BPE models
print(repr(tokenizer.newline_char()))  # e.g., 'Ċ' for BPE models

Related Pages

Turboderp_org_Exllamav2_ExLlamaV2TokenizerBase - Abstract base class that defines the tokenizer interface
Turboderp_org_Exllamav2_WebSocket_Actions - Server actions that use the tokenizer for encoding/decoding inference requests
