Implementation:Microsoft LoRA Run Generation
Overview
The run_generation.py script performs conditional text generation using auto-regressive language models (GPT-2, CTRL, OpenAI-GPT, XLNet, Transformer-XL, XLM) with configurable sampling parameters.
Description
This script provides a command-line interface for generating text with six supported auto-regressive model families. It handles model-specific input preprocessing, tokenization, generation, and output formatting.
Supported Models (via MODEL_CLASSES dict):
| Model Type Key | Model Class | Tokenizer Class |
|---|---|---|
| gpt2 | GPT2LMHeadModel | GPT2Tokenizer |
| ctrl | CTRLLMHeadModel | CTRLTokenizer |
| openai-gpt | OpenAIGPTLMHeadModel | OpenAIGPTTokenizer |
| xlnet | XLNetLMHeadModel | XLNetTokenizer |
| transfo-xl | TransfoXLLMHeadModel | TransfoXLTokenizer |
| xlm | XLMWithLMHeadModel | XLMTokenizer |
Model-specific Preprocessing (via PREPROCESSING_FUNCTIONS dict):
- CTRL: Warns if temperature > 0.7 (CTRL works better with low temperature) and validates that the prompt starts with a control code.
- XLM: Sets the language ID on the model config when language embeddings are available.
- XLNet / Transformer-XL: Prepends a padding text (a historical narrative) to help the model with short prompts, following the approach by Aman Rusia.
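The per-model dispatch described above can be pictured as a dictionary lookup keyed by model type. The following is a minimal sketch under stated assumptions, not the script's code: `PADDING_TEXT`, `prepend_padding`, `PREPROCESSORS`, and `preprocess` are hypothetical names, the padding string is a short placeholder rather than the narrative the script uses, and the simplified function signatures omit the `args`, model, and tokenizer parameters shown in the Signature section.

```python
from typing import Callable, Dict

# Placeholder only; the script's actual padding text is a much longer passage.
PADDING_TEXT = "Placeholder passage that gives the model extra context. "


def prepend_padding(prompt_text: str, prefix: str = "") -> str:
    """Prepend an explicit prefix if given, otherwise the default padding text."""
    return (prefix if prefix else PADDING_TEXT) + prompt_text


# Hypothetical registry: model types without an entry get no special handling.
PREPROCESSORS: Dict[str, Callable[..., str]] = {
    "xlnet": prepend_padding,
    "transfo-xl": prepend_padding,
}


def preprocess(model_type: str, prompt_text: str, prefix: str = "") -> str:
    """Apply the model-specific hook if one is registered for this model type."""
    hook = PREPROCESSORS.get(model_type)
    if hook is None:
        # Assumed fallback: only the user-supplied prefix is prepended.
        return prefix + prompt_text
    return hook(prompt_text, prefix)
```

Keeping the hooks in a dictionary means a new model family only needs one new entry, and model types without special requirements fall through to the default path.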
Generation Pipeline:
- Parse arguments, set random seed, load model and tokenizer.
- Apply model-specific preprocessing to the prompt.
- Encode the prompt and call model.generate() with temperature, top-k, top-p, repetition penalty, and number of return sequences.
- Decode generated tokens, optionally truncate at a stop token, and prepend the original prompt.
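The final step of the pipeline (truncate at a stop token, then prepend the prompt) can be sketched in plain Python. This is an illustration only: `finalize` is a hypothetical helper, and it assumes the generated tokens have already been decoded to a string with the prompt portion removed.

```python
from typing import Optional


def finalize(prompt_text: str, continuation: str, stop_token: Optional[str] = None) -> str:
    """Cut the continuation at the first stop token, then prepend the prompt."""
    if stop_token:
        cut = continuation.find(stop_token)
        # str.find returns -1 when the token is absent; leave the text intact then.
        if cut != -1:
            continuation = continuation[:cut]
    return prompt_text + continuation


print(finalize("Once", " upon a time.<|endoftext|> leftover", "<|endoftext|>"))
```

Checking for `-1` explicitly matters: slicing with `continuation[:-1]` would silently drop the last character whenever the stop token does not occur.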
MAX_LENGTH is hardcoded at 10000 to prevent infinite generation loops. The adjust_length_to_model() helper clips the requested length to the model's max_position_embeddings.
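One plausible reading of the clipping behaviour is sketched below. The function name and signature come from the Signature section; the body is an assumption, in particular the treatment of a negative `length` (interpreted here as "as long as the model allows") and of a non-positive `max_sequence_length` (interpreted as "the model reports no limit").

```python
MAX_LENGTH = 10000  # fallback cap, matching the constant in the Signature section


def adjust_length_to_model(length: int, max_sequence_length: int) -> int:
    """Clip the requested length to the model limit, falling back to MAX_LENGTH."""
    if length < 0 and max_sequence_length > 0:
        # Assumed: a negative request means "use the model's maximum".
        return max_sequence_length
    if 0 < max_sequence_length < length:
        # Never generate more tokens than the model can attend to.
        return max_sequence_length
    if length < 0:
        # No usable model limit: fall back to the hard cap to avoid unbounded output.
        return MAX_LENGTH
    return length
```

Under these assumptions, `adjust_length_to_model(5000, 1024)` yields 1024, while a request of 100 against the same limit is returned unchanged.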
Usage
Use this script when:
- Generating text samples from auto-regressive language models for experimentation.
- Comparing generation quality across different model architectures (GPT-2, CTRL, XLNet, etc.).
- Prototyping text generation pipelines with controllable sampling parameters.
Code Reference
Source Location
examples/NLU/examples/text-generation/run_generation.py (295 lines)
Signature
MAX_LENGTH = int(10000)
MODEL_CLASSES = {
"gpt2": (GPT2LMHeadModel, GPT2Tokenizer),
"ctrl": (CTRLLMHeadModel, CTRLTokenizer),
"openai-gpt": (OpenAIGPTLMHeadModel, OpenAIGPTTokenizer),
"xlnet": (XLNetLMHeadModel, XLNetTokenizer),
"transfo-xl": (TransfoXLLMHeadModel, TransfoXLTokenizer),
"xlm": (XLMWithLMHeadModel, XLMTokenizer),
}
PREPROCESSING_FUNCTIONS = {
"ctrl": prepare_ctrl_input,
"xlm": prepare_xlm_input,
"xlnet": prepare_xlnet_input,
"transfo-xl": prepare_transfoxl_input,
}
def set_seed(args) -> None: ...
def prepare_ctrl_input(args, _, tokenizer, prompt_text) -> str: ...
def prepare_xlm_input(args, model, tokenizer, prompt_text) -> str: ...
def prepare_xlnet_input(args, _, tokenizer, prompt_text) -> str: ...
def prepare_transfoxl_input(args, _, tokenizer, prompt_text) -> str: ...
def adjust_length_to_model(length: int, max_sequence_length: int) -> int: ...
def main() -> list: ...
Import / CLI Usage
python examples/text-generation/run_generation.py \
--model_type gpt2 \
--model_name_or_path gpt2 \
--prompt "The future of AI is" \
--length 100 \
--temperature 0.7 \
--k 50 \
--p 0.95 \
--num_return_sequences 3
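To build intuition for how --temperature, --k, and --p in the command above shape the next-token distribution, here is a small pure-Python sketch. It is not the library's implementation: `filtered_probs` is a hypothetical function, it works on a plain list of logits, and details such as tie-breaking or the exact order of operations inside model.generate() may differ.

```python
import math
from typing import List


def filtered_probs(logits: List[float], temperature: float = 1.0,
                   k: int = 0, p: float = 1.0) -> List[float]:
    """Return the renormalized distribution after temperature, top-k, and top-p."""
    # Temperature: values below 1.0 sharpen the distribution, above 1.0 flatten it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]  # subtract the max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most probable tokens (k == 0 disables the filter).
    if k > 0:
        order = order[:k]
    mass_k = sum(probs[i] for i in order)

    # Top-p: keep the smallest prefix whose renormalized cumulative mass reaches p.
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i] / mass_k
        if cumulative >= p:
            break

    keep = set(kept)
    mass = sum(probs[i] for i in kept)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]


logits = [2.0, 1.0, 0.0, -1.0]
print(filtered_probs(logits, temperature=0.7, k=3, p=0.95))
```

Tokens removed by either filter get probability zero, and the survivors are renormalized so that sampling draws only from the retained set.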
I/O Contract
Inputs
| Input | Type | Description |
|---|---|---|
| --model_type | str (required) | Model architecture: gpt2, ctrl, openai-gpt, xlnet, transfo-xl, xlm |
| --model_name_or_path | str (required) | Pretrained model name or local path |
| --prompt | str | Input prompt text; if empty, prompts interactively |
| --length | int | Number of tokens to generate. Default: 20 |
| --stop_token | str | Token at which to stop generation |
| --temperature | float | Sampling temperature (1.0 = no change). Default: 1.0 |
| --repetition_penalty | float | Repetition penalty (useful for CTRL at 1.2). Default: 1.0 |
| --k | int | Top-k filtering. Default: 0 (disabled) |
| --p | float | Top-p (nucleus) filtering. Default: 0.9 |
| --seed | int | Random seed. Default: 42 |
| --num_return_sequences | int | Number of sequences to generate. Default: 1 |
| --fp16 | flag | Use half-precision inference |
| --no_cuda | flag | Disable CUDA |
| --prefix | str | Text prepended to the prompt |
| --xlm_language | str | Language code for XLM model |
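The inputs above map naturally onto an argparse parser. The sketch below mirrors the flag names, types, and stated defaults from the table; `build_parser` is a hypothetical helper, and the defaults for options where the table gives none (--prompt, --stop_token, --prefix, --xlm_language) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(description="Conditional text generation")
    parser.add_argument("--model_type", type=str, required=True)
    parser.add_argument("--model_name_or_path", type=str, required=True)
    parser.add_argument("--prompt", type=str, default="")        # assumed default
    parser.add_argument("--length", type=int, default=20)
    parser.add_argument("--stop_token", type=str, default=None)  # assumed default
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--repetition_penalty", type=float, default=1.0)
    parser.add_argument("--k", type=int, default=0)
    parser.add_argument("--p", type=float, default=0.9)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--num_return_sequences", type=int, default=1)
    parser.add_argument("--fp16", action="store_true")
    parser.add_argument("--no_cuda", action="store_true")
    parser.add_argument("--prefix", type=str, default="")        # assumed default
    parser.add_argument("--xlm_language", type=str, default="")  # assumed default
    return parser


# Only the two required flags are supplied; everything else takes its default.
args = build_parser().parse_args(["--model_type", "gpt2", "--model_name_or_path", "gpt2"])
print(args.length, args.temperature, args.k, args.p, args.seed)
```

Note that with the default --k of 0 and --p of 0.9, nucleus filtering is the only truncation applied unless top-k is set explicitly.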
Outputs
| Output | Type | Description |
|---|---|---|
| Generated sequences | stdout | Printed to console with === GENERATED SEQUENCE N === headers |
| Return value | list of str | List of generated text sequences (when called programmatically) |
Usage Examples
# Generate text with GPT-2
python examples/text-generation/run_generation.py \
--model_type gpt2 \
--model_name_or_path gpt2-medium \
--prompt "Once upon a time" \
--length 200 \
--temperature 0.8 \
--k 40 \
--p 0.95 \
--seed 42
# Output:
# === GENERATED SEQUENCE 1 ===
# Once upon a time, in a land far away...
# Generate with CTRL using control code
python examples/text-generation/run_generation.py \
--model_type ctrl \
--model_name_or_path ctrl \
--prompt "Links A new study shows" \
--length 100 \
--temperature 0.5 \
--repetition_penalty 1.2
# Generate multiple sequences with XLNet
python examples/text-generation/run_generation.py \
--model_type xlnet \
--model_name_or_path xlnet-base-cased \
--prompt "The meaning of life is" \
--length 50 \
--num_return_sequences 5