

Implementation:FMInference FlexLLMGen AutoTokenizer Usage

From Leeroopedia
Revision as of 14:55, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/FMInference_FlexLLMGen_AutoTokenizer_Usage.md)


Metadata

Field Value
Sources FlexLLMGen (https://github.com/FMInference/FlexLLMGen), HuggingFace Transformers documentation (https://huggingface.co/docs/transformers)
Domains NLP, Text_Processing
Last updated 2026-02-09 00:00 GMT

Overview

Wrapper documentation for HuggingFace AutoTokenizer as configured and used by FlexLLMGen for OPT model inference.

Description

This page documents how FlexLLMGen uses HuggingFace's AutoTokenizer. FlexLLMGen loads the tokenizer with padding_side="left" and then sets add_bos_token = False for OPT decoder-only models. Left padding keeps the last real token of every prompt at the end of its row, which is the position generation continues from. The tokenizer is used to: (1) encode prompts to input_ids with padding="max_length", (2) obtain stop token IDs (e.g., newline), and (3) decode output_ids back to text with skip_special_tokens=True.
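The effect of padding_side="left" combined with padding="max_length" can be sketched without loading a real tokenizer. The helper below is a hypothetical, pure-Python illustration and not FlexLLMGen or HuggingFace code: the name pad_left, the toy token IDs, and the pad ID value of 1 are assumptions made for the example (check tokenizer.pad_token_id for the real value).

```python
from typing import List, Tuple

PAD_ID = 1  # assumption for illustration; read the real value from tokenizer.pad_token_id


def pad_left(batch: List[List[int]], max_length: int,
             pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad every sequence to max_length and build a matching attention mask.

    Sequences longer than max_length are left unchanged (no truncation).
    """
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = max(0, max_length - len(ids))
        input_ids.append([pad_id] * n_pad + list(ids))
        attention_mask.append([0] * n_pad + [1] * len(ids))
    return input_ids, attention_mask


# Toy token IDs, not real OPT vocabulary entries.
batch = [[100, 200, 300], [400, 500]]
padded, mask = pad_left(batch, max_length=5)
print(padded)  # [[1, 1, 100, 200, 300], [1, 1, 1, 400, 500]]
print(mask)    # [[0, 0, 1, 1, 1], [0, 0, 0, 1, 1]]
```

Every row ends with a real token, so a decoder-only model can append new tokens directly after each prompt regardless of its original length.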

External Reference

Code Reference

  • Source: flexllmgen/apps/completion.py, Lines: 45-60 (usage pattern)
  • FlexLLMGen-specific configuration:
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", padding_side="left")
tokenizer.add_bos_token = False
  • Import:
from transformers import AutoTokenizer

I/O Contract

from_pretrained() Inputs

Name Type Required Description
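pretrained_model_name_or_path str Yes HuggingFace model name or local path (e.g., "facebook/opt-30b")
padding_side str No Side on which padding tokens are added; FlexLLMGen uses "left" for decoder-only models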

__call__() Inputs

Name Type Required Description
prompts List[str] Yes Text prompts, passed as the first positional argument
padding str No Padding strategy; FlexLLMGen uses "max_length"
max_length int No Length that each sequence is padded to

batch_decode() Inputs

Name Type Required Description
output_ids np.ndarray Yes Token IDs from generation
skip_special_tokens bool No Strip special tokens such as padding (HuggingFace default is False; FlexLLMGen passes True)
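Because prompts are left-padded, the generated rows still begin with pad tokens, which is why skip_special_tokens=True matters at decode time. The following is a toy sketch of that behavior only: TOY_VOCAB, SPECIAL_IDS, and toy_batch_decode are hypothetical names invented for this illustration, and the real tokenizer's decoding involves more steps than a dictionary lookup.

```python
from typing import Dict, List, Set

# Hypothetical miniature vocabulary; the IDs do not correspond to the OPT vocabulary.
TOY_VOCAB: Dict[int, str] = {1: "<pad>", 2: "</s>", 10: "Hello", 11: " world"}
SPECIAL_IDS: Set[int] = {1, 2}


def toy_batch_decode(batch: List[List[int]],
                     skip_special_tokens: bool = False) -> List[str]:
    """Map each row of token IDs to text, optionally dropping special tokens."""
    texts = []
    for ids in batch:
        kept = [i for i in ids if not (skip_special_tokens and i in SPECIAL_IDS)]
        texts.append("".join(TOY_VOCAB[i] for i in kept))
    return texts


left_padded_output = [[1, 1, 10, 11, 2]]
raw = toy_batch_decode(left_padded_output)
clean = toy_batch_decode(left_padded_output, skip_special_tokens=True)
print(raw)    # ['<pad><pad>Hello world</s>']
print(clean)  # ['Hello world']
```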

Outputs

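  • from_pretrained returns a tokenizer instance (a PreTrainedTokenizer or PreTrainedTokenizerFast subclass); AutoTokenizer itself is only a factory
  • __call__ returns BatchEncoding with input_ids and attention_mask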
  • batch_decode returns List[str]

Usage Examples

from transformers import AutoTokenizer

# Load with FlexLLMGen configuration
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b", padding_side="left")
tokenizer.add_bos_token = False

# Tokenize prompts
prompts = ["Question: What is AI?\nAnswer:"]
inputs = tokenizer(prompts, padding="max_length", max_length=128)
# inputs.input_ids: List[List[int]], each row padded on the left to max_length

# Get stop token ID (newline); with add_bos_token = False, the first ID is the newline token rather than BOS
stop = tokenizer("\n").input_ids[0]

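# Generate, then decode outputs
# `model` is assumed to be an already-constructed FlexLLMGen model (e.g., OptLM); its setup is not shown here
output_ids = model.generate(inputs.input_ids, max_new_tokens=32, stop=stop)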
outputs = tokenizer.batch_decode(output_ids, skip_special_tokens=True)
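How a stop token ID bounds the generated text can be sketched as a post-processing step. This page does not describe how model.generate handles positions after the stop token, so the helper below is a hypothetical illustration rather than FlexLLMGen's behavior; trim_at_stop, the stop ID 99, and the other token IDs are made up for the example.

```python
from typing import List


def trim_at_stop(output_ids: List[int], prompt_len: int, stop_id: int) -> List[int]:
    """Keep the prompt plus generated tokens up to (not including) the first stop token."""
    prompt, generated = output_ids[:prompt_len], output_ids[prompt_len:]
    if stop_id in generated:
        generated = generated[:generated.index(stop_id)]
    return prompt + generated


STOP_ID = 99  # illustrative value; in practice use tokenizer("\n").input_ids[0]

# Five prompt positions (two of them left padding), then five generated tokens.
row = [1, 1, 10, 11, 12, 20, 21, STOP_ID, 30, 31]
trimmed = trim_at_stop(row, prompt_len=5, stop_id=STOP_ID)
print(trimmed)  # [1, 1, 10, 11, 12, 20, 21]
```

Only the generated part is searched for the stop ID, so a newline that already appears inside the prompt does not cut the result short.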
