Principle:Romsto Speculative Decoding Input Tokenization

Knowledge Sources	SentencePiece: A simple and language independent subword tokenizer; HuggingFace Tokenizer
Domains	NLP, Preprocessing, Text_Processing
Last Updated	2026-02-14 04:30 GMT

Overview

The process of converting raw text into a sequence of integer token IDs suitable for model input, including applying chat templates for instruction-tuned models.

Description

Input Tokenization converts human-readable text into the numerical representation that transformer models consume. For instruction-tuned models (like Llama-3.2-Instruct), this involves two stages:

Chat template application: Wrapping the user's raw text in the model's expected conversation format (system/user/assistant role markers, special tokens). This is handled by the tokenizer's apply_chat_template method.
Subword tokenization: Converting the formatted text string into a sequence of integer token IDs using the model's vocabulary (typically byte-level BPE, or a BPE or Unigram model trained with SentencePiece).
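The subword stage can be sketched with a toy byte-pair-encoding (BPE) encoder. The merge table and vocabulary below are hypothetical and tiny, and words are split on whitespace for simplicity. Production tokenizers such as Llama 3's byte-level BPE work on bytes with very large learned merge tables, so this only illustrates the merge loop.

```python
from typing import Dict, List, Tuple

# Hypothetical ordered merge rules: earlier rules have higher priority.
MERGES: List[Tuple[str, str]] = [("l", "o"), ("lo", "w"), ("e", "r"), ("low", "er")]
RANKS: Dict[Tuple[str, str], int] = {pair: i for i, pair in enumerate(MERGES)}

# Hypothetical vocabulary covering the base characters and merge results used below.
VOCAB: Dict[str, int] = {
    "e": 0, "l": 1, "n": 2, "o": 3, "r": 4, "s": 5, "t": 6, "w": 7,
    "lo": 8, "low": 9, "er": 10, "lower": 11,
}

def bpe_encode_word(word: str) -> List[str]:
    """Start from single characters and repeatedly apply the
    highest-priority merge present, until no rule applies."""
    symbols = list(word)
    while len(symbols) > 1:
        # (rank, position) for every adjacent pair; unknown pairs rank as infinity.
        candidates = [(RANKS.get((a, b), float("inf")), i)
                      for i, (a, b) in enumerate(zip(symbols, symbols[1:]))]
        best_rank, best_i = min(candidates)  # ties resolve to the leftmost pair
        if best_rank == float("inf"):
            break  # no applicable merge left
        merged = symbols[best_i] + symbols[best_i + 1]
        symbols = symbols[:best_i] + [merged] + symbols[best_i + 2:]
    return symbols

def encode(text: str) -> List[int]:
    """Map each whitespace-separated word to subword IDs (KeyError if a symbol is out of vocabulary)."""
    ids: List[int] = []
    for word in text.split():
        ids.extend(VOCAB[s] for s in bpe_encode_word(word))
    return ids
```

Here "lower" collapses into a single token, while "lowest" becomes `low` + `e` + `s` + `t` because no rule merges the remaining pairs.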

Correct tokenization is essential because:

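The chat template formats the input the way conversations were formatted during instruction tuning, so the model reads it as a user turn rather than as free text to continue
The add_generation_prompt=True flag appends the assistant turn header, signalling that the next tokens should be the assistant's reply
The token IDs must come from the same tokenizer (vocabulary and merge rules) the model was trained with; in draft-model speculative decoding, the drafter and target must also share that vocabulary, because the target verifies the drafter's proposals by token ID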
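To show what the template adds and what add_generation_prompt changes, here is a hypothetical stand-in for apply_chat_template(..., tokenize=False) using simplified Llama-3-style markers. The real Llama-3.2-Instruct template is a Jinja template stored with the tokenizer and also inserts a default system block, so in practice the tokenizer's own method should always be used.

```python
from typing import Dict, List

# Simplified Llama-3-style special markers (illustration only).
BOS = "<|begin_of_text|>"
EOT = "<|eot_id|>"

def header(role: str) -> str:
    """Role header that opens a conversation turn."""
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n"

def render_chat(messages: List[Dict[str, str]], add_generation_prompt: bool) -> str:
    """Hypothetical template renderer: BOS, then header + content + end-of-turn per message."""
    text = BOS
    for msg in messages:
        text += header(msg["role"]) + msg["content"] + EOT
    if add_generation_prompt:
        # Open an empty assistant turn so generation continues as the assistant.
        text += header("assistant")
    return text

conversation = [{"role": "user", "content": "Hi"}]
print(repr(render_chat(conversation, add_generation_prompt=True)))
```

Without the trailing assistant header, the prompt ends at the user's end-of-turn marker, and the model may start a new header or turn instead of answering directly.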

Usage

Apply this step before calling any generation function (speculative_generate, ngram_assisted_speculative_generate, autoregressive_generate). Apply the chat template when using instruction-tuned models. Skip the chat template (set chat=False in the CLI) when using base models or when providing pre-formatted prompts.
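In standard draft-model speculative decoding, the target model verifies drafter-proposed token IDs directly, so one tokenization must be valid for both models. A conservative pre-flight check might compare the two vocabularies before generation. The function below is hypothetical; it relies only on a get_vocab() method returning a token-to-ID dict, which Hugging Face tokenizers provide.

```python
from typing import Dict, Protocol

class HasVocab(Protocol):
    def get_vocab(self) -> Dict[str, int]: ...

def check_shared_vocab(draft_tokenizer: HasVocab, target_tokenizer: HasVocab) -> None:
    """Hypothetical check: raise ValueError unless both tokenizers map every
    token string to the same ID."""
    draft_vocab = draft_tokenizer.get_vocab()
    target_vocab = target_tokenizer.get_vocab()
    if draft_vocab == target_vocab:
        return
    only_draft = draft_vocab.keys() - target_vocab.keys()
    only_target = target_vocab.keys() - draft_vocab.keys()
    # Tokens present in both vocabularies but assigned different IDs.
    remapped = {t for t in draft_vocab.keys() & target_vocab.keys()
                if draft_vocab[t] != target_vocab[t]}
    raise ValueError(
        f"vocabularies differ: {len(only_draft)} draft-only, "
        f"{len(only_target)} target-only, {len(remapped)} remapped tokens"
    )
```

Strict equality is deliberately conservative; some setups may tolerate a drafter whose vocabulary is a strict prefix of the target's, but that is a separate design decision.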

Theoretical Basis

The tokenization pipeline:

# Abstract tokenization pipeline (Hugging Face transformers tokenizer API)
def prepare_input(raw_prompt, tokenizer, use_chat_template=True):
    if use_chat_template:
        # Wrap in conversation format with role markers
        conversation = [{"role": "user", "content": raw_prompt}]
        text = tokenizer.apply_chat_template(
            conversation,
            add_generation_prompt=True,  # append assistant prefix
            tokenize=False               # return string, not IDs
        )
    else:
        text = raw_prompt

    # Convert to token IDs. The chat template already inserts special tokens
    # such as BOS, so only let the tokenizer add its own when no template was used.
    token_ids = tokenizer(
        text,
        add_special_tokens=not use_chat_template,
        return_tensors="pt",
    ).input_ids[0].tolist()
    return token_ids
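Hugging Face's chat-templating guide notes that templates usually already contain the special tokens they need, so a string rendered with tokenize=False should be tokenized with add_special_tokens=False; otherwise a BOS token can appear twice. The toy stubs below are hypothetical (not the transformers API) and only reproduce that effect.

```python
from typing import List

# Hypothetical stand-ins: a template that writes BOS itself, and a tokenizer
# that also prepends BOS by default, as Llama-style tokenizers do.
BOS_TEXT, BOS_ID = "<|begin_of_text|>", 0

def toy_template(prompt: str) -> str:
    return BOS_TEXT + prompt

def toy_tokenize(text: str, add_special_tokens: bool = True) -> List[int]:
    ids: List[int] = []
    if add_special_tokens:
        ids.append(BOS_ID)  # the tokenizer's own BOS
    if text.startswith(BOS_TEXT):
        ids.append(BOS_ID)  # BOS written literally by the template
        text = text[len(BOS_TEXT):]
    ids.extend(ord(c) for c in text)  # fake IDs: one per character
    return ids

print(toy_tokenize(toy_template("Hi")))                            # duplicated BOS
print(toy_tokenize(toy_template("Hi"), add_special_tokens=False))  # single BOS
```

With a real tokenizer, checking whether the first two IDs of the encoded prompt both equal tokenizer.bos_token_id is a quick way to spot the duplication.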

The reverse operation, decoding, converts token IDs back to human-readable text; skip_special_tokens=True drops template markers such as role headers and end-of-turn tokens:

# Abstract decoding
output_text = tokenizer.decode(token_ids, skip_special_tokens=True)
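The effect of skip_special_tokens can be illustrated with a toy decoder over a hypothetical vocabulary. Real decoders also undo byte-level encoding and clean up whitespace, which this sketch ignores.

```python
from typing import Dict, List, Set

# Hypothetical toy vocabulary: IDs 0-2 are special tokens, the rest are text pieces.
ID_TO_TOKEN: Dict[int, str] = {
    0: "<|begin_of_text|>", 1: "<|eot_id|>", 2: "<|start_header_id|>",
    3: "Hello", 4: ",", 5: " world",
}
SPECIAL_IDS: Set[int] = {0, 1, 2}

def toy_decode(token_ids: List[int], skip_special_tokens: bool = False) -> str:
    """Concatenate token strings, optionally dropping special tokens."""
    pieces = [ID_TO_TOKEN[i] for i in token_ids
              if not (skip_special_tokens and i in SPECIAL_IDS)]
    return "".join(pieces)

print(toy_decode([3, 4, 5, 1]))                            # keeps <|eot_id|>
print(toy_decode([3, 4, 5, 1], skip_special_tokens=True))  # plain text only
```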

Related Pages

Implemented By

Implementation:Romsto_Speculative_Decoding_Tokenizer_Apply_Chat_Template
