Implementation:FMInference FlexLLMGen AutoTokenizer Usage
Metadata
| Field | Value |
|---|---|
| Sources | FlexLLMGen (https://github.com/FMInference/FlexLLMGen), HuggingFace Transformers documentation (https://huggingface.co/docs/transformers) |
| Domains | NLP, Text_Processing |
| Last updated | 2026-02-09 00:00 GMT |
Overview
Wrapper documentation for HuggingFace AutoTokenizer as configured and used by FlexLLMGen for OPT model inference.
Description
FlexLLMGen loads HuggingFace's AutoTokenizer with padding_side="left" and then sets add_bos_token=False for OPT decoder-only models. Left padding keeps the last real token of every prompt at the end of its row, which is where generation continues. The tokenizer is used for three tasks: (1) encoding prompts to input_ids with padding="max_length", (2) obtaining stop token IDs (e.g., newline), and (3) decoding output_ids back to text with skip_special_tokens=True.
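To make the effect of padding_side="left" combined with padding="max_length" concrete, here is a minimal pure-Python sketch of what left padding produces. It does not call the tokenizer: `left_pad` is a hypothetical helper, the token IDs are made up, and the pad ID of 1 is an assumption (check tokenizer.pad_token_id for the real value).

```python
from typing import List, Tuple


def left_pad(ids: List[int], max_length: int, pad_id: int = 1) -> Tuple[List[int], List[int]]:
    """Pad a token-ID list on the left; return (input_ids, attention_mask)."""
    if len(ids) > max_length:
        # Truncation is out of scope for this sketch.
        raise ValueError("sequence is longer than max_length")
    n_pad = max_length - len(ids)
    input_ids = [pad_id] * n_pad + ids
    # 0 marks padding positions, 1 marks real tokens.
    attention_mask = [0] * n_pad + [1] * len(ids)
    return input_ids, attention_mask


# Made-up token IDs for two prompts of different lengths.
batch = [[100, 200, 300], [400, 500]]
padded = [left_pad(ids, max_length=5) for ids in batch]

for input_ids, mask in padded:
    print(input_ids, mask)
```

Every row ends with a real token, so the final position of each row is a valid place to append newly generated tokens for the whole batch at once.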
External Reference
Code Reference
- Source: flexllmgen/apps/completion.py, Lines: 45-60 (usage pattern)
- FlexLLMGen-specific configuration:
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", padding_side="left")
tokenizer.add_bos_token = False
- Import:
from transformers import AutoTokenizer
I/O Contract
from_pretrained() Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| name | str | Yes | HuggingFace model name or local path (the first positional argument, pretrained_model_name_or_path) |
| padding_side | str | No | "left" for decoder models |
__call__() Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompts | List[str] | Yes | Text prompts |
| padding | str | No | "max_length" |
| max_length | int | No | Sequence length |
batch_decode() Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| output_ids | np.ndarray | Yes | Token IDs from generation |
| skip_special_tokens | bool | No | Strip special tokens from the decoded text. The HuggingFace default is False; FlexLLMGen passes True |
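The effect of skip_special_tokens on padded output can be sketched without the real tokenizer. The toy vocabulary, the special-ID set, and the `toy_decode` / `toy_batch_decode` helpers below are all hypothetical stand-ins, not the HuggingFace implementation; the sketch only illustrates that special IDs are filtered out before the text is assembled.

```python
from typing import Dict, List, Set

# Hypothetical toy vocabulary; real OPT token IDs differ.
VOCAB: Dict[int, str] = {1: "<pad>", 2: "</s>", 10: "Hello", 11: " world", 12: "\n"}
SPECIAL_IDS: Set[int] = {1, 2}


def toy_decode(ids: List[int], skip_special_tokens: bool = False) -> str:
    """Map IDs back to text, optionally dropping special tokens first."""
    if skip_special_tokens:
        ids = [i for i in ids if i not in SPECIAL_IDS]
    return "".join(VOCAB[i] for i in ids)


def toy_batch_decode(batch: List[List[int]], skip_special_tokens: bool = False) -> List[str]:
    """Decode each row of a batch independently."""
    return [toy_decode(row, skip_special_tokens) for row in batch]


# One left-padded row and one row that ends with end-of-sequence tokens.
output_ids = [[1, 1, 10, 11, 12], [10, 11, 12, 2, 2]]
raw = toy_batch_decode(output_ids)
clean = toy_batch_decode(output_ids, skip_special_tokens=True)
```

Without the flag, the padding and end-of-sequence markers appear literally in the decoded strings, which is why the usage pattern on this page passes skip_special_tokens=True explicitly.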
Outputs
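- from_pretrained returns a concrete tokenizer instance (a PreTrainedTokenizer or PreTrainedTokenizerFast subclass selected for the model), not an AutoTokenizer object
- __call__ returns BatchEncoding with input_ids and attention_mask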
- batch_decode returns List[str]
Usage Examples
from transformers import AutoTokenizer
# Load with FlexLLMGen configuration
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", padding_side="left")
tokenizer.add_bos_token = False
# Tokenize prompts
prompts = ["Question: What is AI?\nAnswer:"]
inputs = tokenizer(prompts, padding="max_length", max_length=128)
# inputs.input_ids: List[List[int]] padded from left
# Get stop token (newline); with add_bos_token=False no BOS token is prepended, so index 0 is the newline token itself
stop = tokenizer("\n").input_ids[0]
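# Generate and decode outputs
# NOTE: `model` is not defined in this snippet; it is assumed to be a FlexLLMGen model object constructed elsewhere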
output_ids = model.generate(inputs.input_ids, max_new_tokens=32, stop=stop)
outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)