Principle:FMInference FlexLLMGen Tokenizer Loading
Metadata
| Field | Value |
|---|---|
| Sources | HuggingFace Transformers (https://huggingface.co/docs/transformers), FlexLLMGen repository (https://github.com/FMInference/FlexLLMGen) |
| Domains | NLP, Text_Processing |
| Last updated | 2026-02-09 00:00 GMT |
Overview
A preparation step that loads a pre-trained tokenizer from the HuggingFace Hub, configured for left-padding, so that prompts can be batched for decoder-only model inference.
Description
Before text can be fed to an OPT model, each prompt must be tokenized into integer token IDs. FlexLLMGen loads the tokenizer with HuggingFace's AutoTokenizer and sets padding_side="left", which decoder-only models require because new tokens are generated on the right. Automatic insertion of the BOS token is disabled (add_bos_token=False) to match OPT's expected input format. The tokenizer handles three tasks: padding prompts to a uniform length, encoding text to token IDs, and decoding generated IDs back to text.
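A minimal sketch of the loading step described above, assuming the `transformers` package is installed and the checkpoint can be downloaded. The helper name `load_opt_tokenizer` and the default checkpoint `facebook/opt-1.3b` are illustrative choices, not taken from the FlexLLMGen source. Whether the `add_bos_token` attribute takes effect may depend on the tokenizer class and library version.

```python
def load_opt_tokenizer(model_name: str = "facebook/opt-1.3b"):
    """Load a left-padding tokenizer for an OPT checkpoint (hypothetical helper).

    The first call downloads tokenizer files from the HuggingFace Hub.
    """
    # Imported lazily so that defining this helper does not require transformers.
    from transformers import AutoTokenizer

    # padding_side="left" puts pad tokens before the prompt.
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")

    # Disable automatic BOS insertion, mirroring the configuration described above.
    # Assumption: the loaded tokenizer class honours this attribute.
    tokenizer.add_bos_token = False
    return tokenizer
```

With the tokenizer loaded, `tokenizer(prompts, padding=True).input_ids` produces a left-padded batch of token IDs, and `tokenizer.batch_decode(output_ids, skip_special_tokens=True)` converts generated IDs back to text.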
Usage
Load the tokenizer before any text processing step. Use the same model name as the inference model to ensure vocabulary consistency.
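One inexpensive sanity check for vocabulary consistency is to confirm that every token ID falls inside the model's embedding table. The sketch below is a hypothetical helper (`check_ids_fit_embedding` is not part of FlexLLMGen or `transformers`). It is a necessary check only: IDs can be in range and still come from a mismatched vocabulary.

```python
from typing import Iterable, Sequence


def check_ids_fit_embedding(
    batch_ids: Iterable[Sequence[int]], num_embeddings: int
) -> None:
    """Raise ValueError if any token ID would index outside the embedding table."""
    for row, ids in enumerate(batch_ids):
        for col, token_id in enumerate(ids):
            if not 0 <= token_id < num_embeddings:
                raise ValueError(
                    f"token ID {token_id} at batch row {row}, position {col} "
                    f"is outside the embedding table of size {num_embeddings}"
                )
```

Here `num_embeddings` stands for the number of rows in the model's input embedding matrix; how that value is obtained depends on the model implementation in use.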
Theoretical Basis
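Decoder-only models (GPT, OPT) generate tokens left-to-right, appending each new token after the last position of the sequence. Left-padding places the pad tokens before the prompt, so every sequence in a batch ends at the same position and generated tokens directly follow real prompt tokens; with right-padding, pad tokens would sit between a shorter prompt and its continuation. This alignment allows efficient batched generation. The tokenizer vocabulary must match the model's embedding layer, since each token ID is used as an index into that layer.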
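The alignment argument can be illustrated without any library. The sketch below left-pads toy token ID lists and builds the matching attention mask. The function name `left_pad`, the toy IDs, and `pad_id=1` are illustrative assumptions, not values read from a real tokenizer.

```python
from typing import List, Sequence, Tuple


def left_pad(
    sequences: Sequence[Sequence[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad a non-empty batch to its longest sequence.

    Returns (padded_ids, attention_mask); the mask is 0 on pads and 1 on real tokens.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Pads go in front, so the last position always holds a real token.
        padded.append([pad_id] * n_pad + list(seq))
        masks.append([0] * n_pad + [1] * len(seq))
    return padded, masks


# Toy batch: two "prompts" of different lengths.
ids, mask = left_pad([[5, 6, 7], [8]], pad_id=1)
print(ids)   # [[5, 6, 7], [1, 1, 8]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```

Because both rows end in a real token, the next generated token can be appended at the same column for every sequence in the batch.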