Implementation:Pytorch Serve Llama2 Tokenizer

From Leeroopedia

Overview

Tokenizer is a lightweight wrapper around SentencePiece for Llama2 text tokenization. It provides encode() and decode() methods with optional BOS/EOS token injection, and exposes vocabulary metadata as instance attributes (n_words, bos_id, eos_id, pad_id). The class is adapted from Meta's official Llama repository and is used in the TorchServe tensor-parallel Llama serving pipeline.

Field | Value
Implementation Name | Llama2_Tokenizer
Type | Utility Class
Workflow | LLM_Text_Generation
Domains | LLM_Serving, Tokenization
Knowledge Sources | Pytorch_Serve
Last Updated | 2026-02-13 18:52 GMT

Description

The Tokenizer class wraps a SentencePiece model to provide encoding and decoding for Llama2 text generation. It is a minimal, focused utility (44 lines) that loads a .model file at construction time, stores the vocabulary size as n_words, and asserts that vocab_size() equals get_piece_size(). The encode method optionally prepends the BOS token ID and appends the EOS token ID to the integer ID sequence; decode passes the ID list straight to the underlying SentencePiece model.

Key Responsibilities

  • Model Loading: Initializes SentencePieceProcessor from a model file path
  • Vocabulary Metadata: Exposes n_words (vocab size), bos_id, eos_id, and pad_id as instance attributes
  • Encoding: Converts string to list of integer token IDs with optional BOS/EOS wrapping
  • Decoding: Converts list of integer token IDs back to string
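The responsibilities above can be sketched without a real tokenizer.model file. The following is a minimal illustration, not the TorchServe module itself: FakeProcessor is a hypothetical stand-in that implements only the handful of methods the wrapper calls, with a whitespace-split vocabulary and reserved IDs chosen to mirror the BOS = 1, EOS = 2, pad = -1 values shown on this page. SketchTokenizer follows the wrapper logic described here but takes an already-built processor instead of a model path.

```python
from typing import List


class FakeProcessor:
    """Hypothetical stand-in for SentencePieceProcessor (illustration only)."""

    def __init__(self, vocab: List[str]):
        # IDs 0, 1, 2 are reserved for <unk>, BOS, EOS in this sketch.
        self._pieces = ["<unk>", "<s>", "</s>"] + vocab

    def vocab_size(self) -> int:
        return len(self._pieces)

    def get_piece_size(self) -> int:
        return len(self._pieces)

    def bos_id(self) -> int:
        return 1

    def eos_id(self) -> int:
        return 2

    def pad_id(self) -> int:
        return -1

    def encode(self, s: str) -> List[int]:
        # Whitespace split; words outside the vocabulary map to ID 0.
        return [self._pieces.index(w) if w in self._pieces else 0 for w in s.split()]

    def decode(self, t: List[int]) -> str:
        # Reserved IDs (0-2) are skipped when rebuilding the text.
        return " ".join(self._pieces[i] for i in t if i > 2)


class SketchTokenizer:
    """Mirrors the wrapper logic described above, minus the file loading."""

    def __init__(self, sp_model):
        self.sp_model = sp_model
        self.n_words: int = sp_model.vocab_size()
        self.bos_id: int = sp_model.bos_id()
        self.eos_id: int = sp_model.eos_id()
        self.pad_id: int = sp_model.pad_id()
        assert sp_model.vocab_size() == sp_model.get_piece_size()

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        t = self.sp_model.encode(s)
        if bos:
            t = [self.bos_id] + t
        if eos:
            t = t + [self.eos_id]
        return t

    def decode(self, t: List[int]) -> str:
        return self.sp_model.decode(t)


tok = SketchTokenizer(FakeProcessor(["hello", "world"]))
ids = tok.encode("hello world", bos=True, eos=True)
print(ids)              # [1, 3, 4, 2]
print(tok.decode(ids))  # hello world
```

Injecting the processor keeps the sketch runnable anywhere; the real class constructs SentencePieceProcessor from model_path instead.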

Usage

from llama2_tokenizer import Tokenizer

tokenizer = Tokenizer(model_path="/path/to/tokenizer.model")

# Encode with BOS and EOS
token_ids = tokenizer.encode("Hello, world!", bos=True, eos=True)
# Result: [1, ..., 2]  (1 = BOS, 2 = EOS)

# Decode back to string
text = tokenizer.decode(token_ids)

# Access vocabulary metadata
print(tokenizer.n_words)   # e.g. 32000
print(tokenizer.bos_id)    # 1
print(tokenizer.eos_id)    # 2
print(tokenizer.pad_id)    # -1

Code Reference

Source Location

File | Lines | Description
examples/large_models/tp_llama/llama2_tokenizer.py | L1-44 | Full module (44 lines)
examples/large_models/tp_llama/llama2_tokenizer.py | L14-44 | Tokenizer class definition
examples/large_models/tp_llama/llama2_tokenizer.py | L15-33 | __init__(model_path): load SentencePiece model and extract metadata
examples/large_models/tp_llama/llama2_tokenizer.py | L35-41 | encode(s, bos, eos): string to token ID list
examples/large_models/tp_llama/llama2_tokenizer.py | L43-44 | decode(t): token ID list to string

Signature

class Tokenizer:

    def __init__(self, model_path: str):
        """
        Load SentencePiece model and initialize vocabulary metadata.

        Loads the .model file via SentencePieceProcessor, then
        stores vocab_size() as n_words along with bos_id, eos_id,
        and pad_id. Asserts that vocab_size() == get_piece_size().

        Args:
            model_path (str): Path to the SentencePiece .model file.
        """
        ...

    def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
        """
        Encode a string into a list of token IDs.

        Optionally prepends BOS token ID and appends EOS token ID.

        Args:
            s (str): Input string to encode.
            bos (bool): Whether to prepend the BOS token.
            eos (bool): Whether to append the EOS token.

        Returns:
            List[int]: List of integer token IDs.
        """
        ...

    def decode(self, t: List[int]) -> str:
        """
        Decode a list of token IDs back to a string.

        Args:
            t (List[int]): List of integer token IDs.

        Returns:
            str: Decoded text string.
        """
        ...

Import

# Module imports
from logging import getLogger
from typing import List

# Runtime import inside __init__:
from sentencepiece import SentencePieceProcessor

I/O Contract

Method | Input | Output | Notes
__init__(model_path) | str: path to .model file | None (sets self.sp_model, self.n_words, self.bos_id, self.eos_id, self.pad_id) | Asserts vocab_size() == get_piece_size()
encode(s, bos, eos) | str, bool, bool | List[int]: token IDs | BOS prepended if bos=True; EOS appended if eos=True
decode(t) | List[int]: token IDs | str: decoded text | Delegates to sp_model.decode()
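Because encode returns plain Python lists of varying length, a caller that batches several prompts has to pad them itself. The sketch below is one way to do that and is not taken from the TorchServe handler: pad_batch is a hypothetical helper, and the token IDs are placeholder values. It uses -1 as the pad value to match the pad_id shown in the Usage section; a negative ID cannot serve as an embedding index, so downstream code would presumably need to mask or replace padded positions before the model sees them.

```python
from typing import List


def pad_batch(batch: List[List[int]], pad_value: int) -> List[List[int]]:
    """Right-pad every sequence to the length of the longest one (hypothetical helper)."""
    max_len = max((len(seq) for seq in batch), default=0)
    return [seq + [pad_value] * (max_len - len(seq)) for seq in batch]


# Placeholder IDs shaped like encode(..., bos=True, eos=False) output.
batch = [[1, 15043], [1, 15043, 3186, 29991]]
padded = pad_batch(batch, pad_value=-1)
print(padded)  # [[1, 15043, -1, -1], [1, 15043, 3186, 29991]]
```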

Usage Examples

Example 1: Initialization and Metadata

# From llama2_tokenizer.py L15-33
class Tokenizer:
    def __init__(self, model_path: str):
        from sentencepiece import SentencePieceProcessor

        self.sp_model = SentencePieceProcessor(model_file=model_path)

        # BOS / EOS token IDs
        self.n_words: int = self.sp_model.vocab_size()
        self.bos_id: int = self.sp_model.bos_id()
        self.eos_id: int = self.sp_model.eos_id()
        self.pad_id: int = self.sp_model.pad_id()
        assert self.sp_model.vocab_size() == self.sp_model.get_piece_size()

Example 2: Encoding with BOS/EOS

# From llama2_tokenizer.py L35-41
def encode(self, s: str, bos: bool, eos: bool) -> List[int]:
    t = self.sp_model.encode(s)
    if bos:
        t = [self.bos_id] + t
    if eos:
        t = t + [self.eos_id]
    return t

# Usage:
tokenizer = Tokenizer("tokenizer.model")
ids_with_special = tokenizer.encode("Hello", bos=True, eos=True)
# [1, 15043, 2]

ids_without_special = tokenizer.encode("Hello", bos=False, eos=False)
# [15043]

Example 3: Decoding

# From llama2_tokenizer.py L43-44
def decode(self, t: List[int]) -> str:
    return self.sp_model.decode(t)

# Usage:
text = tokenizer.decode([15043])
# "Hello"
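Since decode passes its input straight to the SentencePiece model, any trimming of generated IDs is left to the caller. The sketch below shows one common post-processing step, assuming the caller wants to drop a leading BOS and cut the sequence at the first EOS before decoding. trim_generated is a hypothetical helper that is not part of llama2_tokenizer.py, and the IDs are placeholders.

```python
from typing import List


def trim_generated(ids: List[int], bos_id: int, eos_id: int) -> List[int]:
    """Drop a leading BOS and everything from the first EOS onward (hypothetical helper)."""
    if ids and ids[0] == bos_id:
        ids = ids[1:]
    if eos_id in ids:
        ids = ids[: ids.index(eos_id)]
    return ids


print(trim_generated([1, 15043, 2, 7, 7], bos_id=1, eos_id=2))  # [15043]
print(trim_generated([15043], bos_id=1, eos_id=2))              # [15043]
```

The trimmed list can then be handed to tokenizer.decode() as in the example above.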
