Implementation:Pytorch Serve Llama2 Tokenizer

Overview

Tokenizer is a lightweight wrapper around SentencePiece for Llama2 text tokenization. It provides encode() and decode() methods with optional BOS/EOS token injection, and exposes vocabulary metadata as instance attributes (n_words, bos_id, eos_id, pad_id). The class is adapted from Meta's official Llama repository and is used in the TorchServe tensor-parallel Llama serving pipeline.

Field	Value
Implementation Name	Llama2_Tokenizer
Type	Utility Class
Workflow	LLM_Text_Generation
Domains	LLM_Serving, Tokenization
Knowledge Sources	Pytorch_Serve
Last Updated	2026-02-13 18:52 GMT

Description

The Tokenizer class wraps a SentencePiece model to provide encoding and decoding for Llama2 text generation. It is a minimal, focused utility (44 lines) that loads a .model file at construction time and validates that vocab_size() equals get_piece_size(). The encode method optionally prepends BOS and appends EOS tokens to the integer ID sequence.

Key Responsibilities

Model Loading: Initializes SentencePieceProcessor from a model file path
Vocabulary Metadata: Exposes n_words (vocab size), bos_id, eos_id, and pad_id as instance attributes
Encoding: Converts string to list of integer token IDs with optional BOS/EOS wrapping
Decoding: Converts list of integer token IDs back to string
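
The responsibilities above can be sketched without a real model file. The following is a minimal illustration, not the module's code: `StubProcessor` is a hypothetical stand-in for `SentencePieceProcessor` with a five-piece, whitespace-split vocabulary, and `SketchTokenizer` is an assumed name for a wrapper that follows the same structure.

```python
from typing import List


class StubProcessor:
    """Hypothetical stand-in for SentencePieceProcessor (assumption, not the real API surface)."""

    # Made-up vocabulary: IDs 0-2 play the role of <unk>, BOS, and EOS.
    PIECES = ["<unk>", "<s>", "</s>", "hello", "world"]

    def vocab_size(self) -> int:
        return len(self.PIECES)

    def get_piece_size(self) -> int:
        return len(self.PIECES)

    def bos_id(self) -> int:
        return 1

    def eos_id(self) -> int:
        return 2

    def pad_id(self) -> int:
        return -1

    def encode(self, s: str) -> List[int]:
        # Whitespace split; unknown words map to ID 0.
        return [self.PIECES.index(w) if w in self.PIECES else 0 for w in s.split()]

    def decode(self, t: List[int]) -> str:
        # This stub chooses to drop BOS/EOS when rebuilding the text.
        return " ".join(self.PIECES[i] for i in t if i not in (1, 2))


class SketchTokenizer:
    """Mirrors the wrapper's four responsibilities on top of any processor-like object."""

    def __init__(self, sp_model) -> None:
        self.sp_model = sp_model
        self.n_words: int = sp_model.vocab_size()
        self.bos_id: int = sp_model.bos_id()
        self.eos_id: int = sp_model.eos_id()
        self.pad_id: int = sp_model.pad_id()
        assert sp_model.vocab_size() == sp_model.get_piece_size()

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        t = self.sp_model.encode(s)
        if bos:
            t = [self.bos_id] + t
        if eos:
            t = t + [self.eos_id]
        return t

    def decode(self, t: List[int]) -> str:
        return self.sp_model.decode(t)


tok = SketchTokenizer(StubProcessor())
ids = tok.encode("hello world", bos=True, eos=True)
print(ids)              # [1, 3, 4, 2]
print(tok.decode(ids))  # hello world
```

The token IDs printed here come from the stub vocabulary only and say nothing about the IDs a real Llama2 model file would produce.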

Usage

from llama2_tokenizer import Tokenizer

tokenizer = Tokenizer(model_path="/path/to/tokenizer.model")

# Encode with BOS and EOS
token_ids = tokenizer.encode("Hello, world!", bos=True, eos=True)
# Result: [1, ..., 2]  (1 = BOS, 2 = EOS)

# Decode back to string
text = tokenizer.decode(token_ids)

# Access vocabulary metadata
print(tokenizer.n_words)   # e.g. 32000
print(tokenizer.bos_id)    # 1
print(tokenizer.eos_id)    # 2
print(tokenizer.pad_id)    # -1
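
A `pad_id` of -1 is the value SentencePiece reports when the model defines no padding token, so it cannot be used directly as an embedding index. When batching prompts of different lengths, one option is to fill with `pad_id` and carry a separate mask. The helper below is a hypothetical sketch on plain lists; `pad_batch` is not part of the module.

```python
from typing import List, Tuple


def pad_batch(
    batch: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[bool]]]:
    """Right-pad every sequence to the longest length and return (padded, mask)."""
    max_len = max((len(seq) for seq in batch), default=0)
    padded = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]
    # True marks a real token, False marks a padding slot.
    mask = [[True] * len(seq) + [False] * (max_len - len(seq)) for seq in batch]
    return padded, mask


padded, mask = pad_batch([[1, 15043, 2], [1, 2]], pad_id=-1)
print(padded)  # [[1, 15043, 2], [1, 2, -1]]
print(mask)    # [[True, True, True], [True, True, False]]
```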

Code Reference

Source Location

File	Lines	Description
`examples/large_models/tp_llama/llama2_tokenizer.py`	L1-44	Full module (44 lines)
`examples/large_models/tp_llama/llama2_tokenizer.py`	L14-44	`Tokenizer` class definition
`examples/large_models/tp_llama/llama2_tokenizer.py`	L15-33	`__init__(model_path)` -- load SentencePiece model and extract metadata
`examples/large_models/tp_llama/llama2_tokenizer.py`	L35-41	`encode(s, bos, eos)` -- string to token ID list
`examples/large_models/tp_llama/llama2_tokenizer.py`	L43-44	`decode(t)` -- token ID list to string

Signature

class Tokenizer:

    def __init__(self, model_path: str):
        """
        Load SentencePiece model and initialize vocabulary metadata.

        Loads the .model file via SentencePieceProcessor, then
        stores the vocabulary size as n_words along with bos_id,
        eos_id, and pad_id. Asserts that
        vocab_size() == get_piece_size().

        Args:
            model_path (str): Path to the SentencePiece .model file.
        """
        ...

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        """
        Encode a string into a list of token IDs.

        Optionally prepends BOS token ID and appends EOS token ID.

        Args:
            s (str): Input string to encode.
            bos (bool): Whether to prepend the BOS token.
            eos (bool): Whether to append the EOS token.

        Returns:
            List[int]: List of integer token IDs.
        """
        ...

    def decode(self, t: List[int]) -> str:
        """
        Decode a list of token IDs back to a string.

        Args:
            t (List[int]): List of integer token IDs.

        Returns:
            str: Decoded text string.
        """
        ...

Import

# Module imports
from logging import getLogger
from typing import List

# Runtime import inside __init__:
from sentencepiece import SentencePieceProcessor

I/O Contract

Method	Input	Output	Notes
`__init__(model_path)`	`str` -- path to `.model` file	None (sets `self.sp_model`, `self.n_words`, `self.bos_id`, `self.eos_id`, `self.pad_id`)	Asserts `vocab_size() == get_piece_size()`
`encode(s, bos, eos)`	`str`, `bool`, `bool`	`List[int]` -- token IDs	BOS prepended if `bos=True`; EOS appended if `eos=True`
`decode(t)`	`List[int]` -- token IDs	`str` -- decoded text	Delegates to `sp_model.decode()`
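
The contract in the table can be expressed as a small check that works against any object with the same interface. This is a sketch under assumptions: `check_contract` and `EchoTokenizer` are hypothetical names, and the echo class maps characters to code points purely so the check can run without a model file.

```python
from typing import List


def check_contract(tokenizer, sample: str = "Hello") -> List[int]:
    """Assert the encode/decode contract and return the BOS/EOS-wrapped IDs."""
    plain = tokenizer.encode(sample, bos=False, eos=False)
    wrapped = tokenizer.encode(sample, bos=True, eos=True)
    # encode() returns integer IDs.
    assert all(isinstance(i, int) for i in wrapped)
    # BOS is prepended and EOS appended around the plain encoding.
    assert wrapped == [tokenizer.bos_id] + plain + [tokenizer.eos_id]
    # decode() returns a string.
    assert isinstance(tokenizer.decode(plain), str)
    return wrapped


class EchoTokenizer:
    """Hypothetical stand-in: one token per character, ID = Unicode code point."""

    bos_id = 1
    eos_id = 2

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        t = [ord(c) for c in s]
        if bos:
            t = [self.bos_id] + t
        if eos:
            t = t + [self.eos_id]
        return t

    def decode(self, t: List[int]) -> str:
        return "".join(chr(i) for i in t)


result = check_contract(EchoTokenizer())
print(result)  # [1, 72, 101, 108, 108, 111, 2]
```

Passing a real `Tokenizer` instance to `check_contract` should exercise the same assertions, provided a SentencePiece model file is available.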

Usage Examples

Example 1: Initialization and Metadata

# From llama2_tokenizer.py L15-33
class Tokenizer:
    def __init__(self, model_path: str):
        from sentencepiece import SentencePieceProcessor

        self.sp_model = SentencePieceProcessor(model_file=model_path)

        # BOS / EOS token IDs
        self.n_words: int = self.sp_model.vocab_size()
        self.bos_id: int = self.sp_model.bos_id()
        self.eos_id: int = self.sp_model.eos_id()
        self.pad_id: int = self.sp_model.pad_id()
        assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()
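
Because the constructor loads the model file immediately, a wrong path surfaces only when SentencePiece tries to open it. A caller may prefer to validate the path first; the sketch below shows one way to do that. `require_model_file` is a hypothetical helper, not something the module provides.

```python
import os
import tempfile


def require_model_file(model_path: str) -> str:
    """Return model_path unchanged if it points at an existing file, else raise."""
    if not os.path.isfile(model_path):
        raise FileNotFoundError(f"SentencePiece model not found: {model_path}")
    return model_path


# An existing file passes through unchanged.
with tempfile.NamedTemporaryFile(suffix=".model") as handle:
    print(require_model_file(handle.name) == handle.name)  # True

# A missing file fails before any tokenizer is constructed.
try:
    require_model_file("/nonexistent/dir/tokenizer.model")
except FileNotFoundError as err:
    print(err)
```

With the helper in place, construction could read `Tokenizer(model_path=require_model_file("/path/to/tokenizer.model"))`.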

Example 2: Encoding with BOS/EOS

# From llama2_tokenizer.py L35-41
def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
    t = self.sp_model.encode(s)
    if bos:
        t = [self.bos_id] + t
    if eos:
        t = t + [self.eos_id]
    return t

# Usage:
tokenizer = Tokenizer("tokenizer.model")
ids_with_special = tokenizer.encode("Hello", bos=True, eos=True)
# [1, 15043, 2]

ids_without_special = tokenizer.encode("Hello", bos=False, eos=False)
# [15043]

Example 3: Decoding

# From llama2_tokenizer.py L43-44
def decode(self, t: List[int]) -> str:
    return self.sp_model.decode(t)

# Usage:
text = tokenizer.decode([15043])
# "Hello"

Related Pages

Principle:Pytorch_Serve_LLM_Text_Generation -- principle for LLM text generation serving pipelines
Implementation:Pytorch_Serve_Llama2_Checkpoint_Converter -- companion checkpoint converter for the TP Llama pipeline
