Principle:FMInference FlexLLMGen Tokenizer Loading

From Leeroopedia
Revision as of 18:05, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/FMInference_FlexLLMGen_Tokenizer_Loading.md)


Metadata

Field Value
Sources HuggingFace Transformers (https://huggingface.co/docs/transformers), FlexLLMGen repository (https://github.com/FMInference/FlexLLMGen)
Domains NLP, Text_Processing
Last updated 2026-02-09 00:00 GMT

Overview

A preparation step that loads a pre-trained tokenizer from the HuggingFace Hub, configured with left padding for decoder-only model inference.

Description

Before text can be fed to an OPT model, each prompt must be tokenized into integer token IDs. FlexLLMGen uses HuggingFace's AutoTokenizer with padding_side="left", which batched generation with decoder-only models needs because new tokens are appended on the right. The BOS token is disabled (add_bos_token=False) to match OPT's expected input format. The tokenizer is responsible for three things: padding prompts to a uniform length, encoding text into token IDs, and decoding generated IDs back into text.
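The steps described above can be sketched roughly as follows. This is a minimal illustration, not FlexLLMGen's actual code: the helper names load_opt_tokenizer, encode_prompts, and decode_outputs are hypothetical, the default model name facebook/opt-1.3b is an assumed example, and setting add_bos_token as an attribute after loading is an assumption whose effect may depend on the tokenizer class and library version.

```python
def load_opt_tokenizer(model_name: str = "facebook/opt-1.3b"):
    """Load a tokenizer set up for batched decoder-only inference.

    The default model name is only an example; pass the name of the
    model you actually run inference with.
    """
    # Imported lazily so that defining this helper does not itself
    # require the transformers package to be installed.
    from transformers import AutoTokenizer

    # Pad on the left so that every prompt in a batch ends at the same index.
    tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
    # Assumption: the BOS flag is set as an attribute after loading.
    tokenizer.add_bos_token = False
    return tokenizer


def encode_prompts(tokenizer, prompts, prompt_len):
    """Pad each prompt to prompt_len tokens and return lists of token IDs."""
    return tokenizer(prompts, padding="max_length", max_length=prompt_len).input_ids


def decode_outputs(tokenizer, output_ids):
    """Turn generated token IDs back into text, dropping special tokens."""
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

A typical call sequence would be load_opt_tokenizer(), then encode_prompts() on the raw prompts, then decode_outputs() on whatever IDs the model generates. Loading requires network access or a local cache of the model files.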

Usage

Load the tokenizer before any text processing step. Use the same model name as the inference model to ensure vocabulary consistency.

Theoretical Basis

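Decoder-only models (GPT, OPT) generate tokens left to right, appending each new token at the end of the sequence. Left padding places the pad tokens before the prompt, so every sequence in a batch ends at the same position and the next token for each sequence is generated at the same index, which allows efficient batched generation. The tokenizer vocabulary must match the model's embedding layer, because each token ID is used as an index into the embedding table.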
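To make the padding argument concrete, here is a small pure-Python sketch that does not use any tokenizer library. The function names left_pad_batch and ids_fit_embedding, the token IDs, and the pad ID are all placeholder values chosen for illustration.

```python
from typing import List, Tuple


def left_pad_batch(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token ID sequences to a common length.

    Returns the padded sequences and a matching attention mask
    (0 for padding positions, 1 for real tokens).
    """
    if not batch:
        return [], []
    max_len = max(len(seq) for seq in batch)
    padded, mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Padding goes in front, so the last real token sits at index -1.
        padded.append([pad_id] * n_pad + seq)
        mask.append([0] * n_pad + [1] * len(seq))
    return padded, mask


def ids_fit_embedding(batch: List[List[int]], num_embeddings: int) -> bool:
    """Check that every token ID is a valid row index into the embedding table."""
    return all(0 <= tok < num_embeddings for seq in batch for tok in seq)


batch = [[5, 6, 7], [8, 9]]          # two prompts of different lengths
padded, mask = left_pad_batch(batch, pad_id=1)
print(padded)  # [[5, 6, 7], [1, 8, 9]]
print(mask)    # [[1, 1, 1], [0, 1, 1]]
```

After left padding, the final position of every row holds a real token, so one generation step can append the next token to all rows at once. With right padding, the shorter prompt would end in a pad token and its continuation would start from the wrong position.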
