Principle:FMInference FlexLLMGen Tokenizer Loading

Metadata

Field	Value
Sources	HuggingFace Transformers (https://huggingface.co/docs/transformers), FlexLLMGen repository (https://github.com/FMInference/FlexLLMGen)
Domains	NLP, Text_Processing
Last updated	2026-02-09 00:00 GMT

Overview

A tokenization preparation step that loads a pre-trained tokenizer from the HuggingFace Hub and configures it for left padding, as needed for batched inference with decoder-only models.

Description

Before text can be fed to an OPT model, each prompt must be converted into a sequence of integer token IDs. FlexLLMGen loads the tokenizer with HuggingFace's AutoTokenizer and sets padding_side="left", which batched generation with decoder-only models requires because new tokens are appended on the right. Automatic insertion of the BOS token is disabled (add_bos_token=False) to match OPT's expected input format. The tokenizer then handles three tasks: padding prompts to a uniform length, encoding text to token IDs, and decoding generated IDs back to text.
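The loading step described above can be sketched as follows. This is a minimal illustration, not FlexLLMGen's actual code: the helper name load_tokenizer and the default checkpoint "facebook/opt-1.3b" are assumptions chosen for the example, and setting add_bos_token as an attribute after loading may behave differently depending on the tokenizer class and the installed transformers version.

```python
def load_tokenizer(model_name: str = "facebook/opt-1.3b"):
    """Load a left-padded tokenizer for batched decoder-only inference.

    `load_tokenizer` and the default checkpoint name are illustrative
    assumptions, not part of FlexLLMGen's API.
    """
    # Imported inside the function so the module can be imported even
    # when transformers is not installed.
    from transformers import AutoTokenizer

    # padding_side="left" puts pad tokens before the prompt.
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")

    # Disable automatic BOS insertion, as described in the text above.
    # Assumption: the loaded tokenizer class honors this attribute.
    tokenizer.add_bos_token = False
    return tokenizer


# Example usage (requires network access to the HuggingFace Hub):
#   tokenizer = load_tokenizer()
#   batch = tokenizer(["Paris is the capital city of"], padding="max_length",
#                     max_length=32)
#   text = tokenizer.batch_decode(batch.input_ids, skip_special_tokens=True)
```

The usage lines are left as comments because they download tokenizer files on first use.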

Usage

Load the tokenizer once, before any step that processes text. Pass the same model name that is used for the inference model, so that the token IDs the tokenizer produces refer to the vocabulary the model was trained with.
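One way to catch a tokenizer/model mismatch early is to check that every token ID is a valid index into the model's embedding table. The sketch below is a hypothetical helper (ids_fit_embedding is not a FlexLLMGen or Transformers function) and uses toy numbers rather than real OPT sizes.

```python
def ids_fit_embedding(token_ids, num_embeddings):
    """Return True if every token ID indexes a valid embedding row.

    token_ids: a batch given as a list of lists of ints.
    num_embeddings: number of rows in the model's embedding table.
    """
    return all(0 <= tok < num_embeddings for row in token_ids for tok in row)


# Toy example: an embedding table with 4 rows accepts IDs 0..3 only.
ok = ids_fit_embedding([[1, 2], [3]], num_embeddings=4)
bad = ids_fit_embedding([[4]], num_embeddings=4)
```

A passing check does not prove that the vocabularies are identical, only that no ID falls outside the table; using the same model name for both remains the primary safeguard.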

Theoretical Basis

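Decoder-only models (GPT, OPT) generate tokens left-to-right, appending each new token to the right end of the sequence. Left padding places the pad tokens before the prompt, so every sequence in a batch ends at the same position and the next token can be generated for all sequences at the same index. With right padding, shorter prompts would be followed by pad tokens, and generation would continue from padding rather than from the prompt. The tokenizer vocabulary must also match the model's embedding layer, because each token ID is used as an index into that layer.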
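The effect of the padding side can be illustrated without any library. The following is a toy sketch: pad_batch is a hypothetical helper written for this page, and the token IDs and pad ID are made-up values, not output of a real tokenizer.

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad variable-length ID lists to the same length.

    Returns (padded_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in sequences)
    padded, mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        if side == "left":
            padded.append([pad_id] * n_pad + list(seq))
            mask.append([0] * n_pad + [1] * len(seq))
        else:
            padded.append(list(seq) + [pad_id] * n_pad)
            mask.append([1] * len(seq) + [0] * n_pad)
    return padded, mask


batch = [[5, 6, 7], [8, 9]]  # two prompts of different length (toy IDs)
left_ids, left_mask = pad_batch(batch, pad_id=1, side="left")
right_ids, right_mask = pad_batch(batch, pad_id=1, side="right")

# Left padding:  [[5, 6, 7], [1, 8, 9]] -> every row ends with a real token.
# Right padding: [[5, 6, 7], [8, 9, 1]] -> the second row ends with padding.
```

With left padding, the last column holds the final prompt token of every row, so the next generated token can be appended at the same index for the whole batch.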

Related Pages

Implementation:FMInference_FlexLLMGen_AutoTokenizer_Usage
