
Implementation:Microsoft LoRA Run Generation

From Leeroopedia
Revision as of 15:43, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Microsoft_LoRA_Run_Generation.md)



Overview

The run_generation.py script performs conditional text generation using auto-regressive language models (GPT-2, CTRL, OpenAI-GPT, XLNet, Transformer-XL, XLM) with configurable sampling parameters.

Description

This script provides a command-line interface for generating text with six supported auto-regressive model families. It handles model-specific input preprocessing, tokenization, generation, and output formatting.

Supported Models (via MODEL_CLASSES dict):

| Model Type Key | Model Class | Tokenizer Class |
|---|---|---|
| gpt2 | GPT2LMHeadModel | GPT2Tokenizer |
| ctrl | CTRLLMHeadModel | CTRLTokenizer |
| openai-gpt | OpenAIGPTLMHeadModel | OpenAIGPTTokenizer |
| xlnet | XLNetLMHeadModel | XLNetTokenizer |
| transfo-xl | TransfoXLLMHeadModel | TransfoXLTokenizer |
| xlm | XLMWithLMHeadModel | XLMTokenizer |

Model-specific Preprocessing (via PREPROCESSING_FUNCTIONS dict):

  • CTRL: Logs a notice if temperature > 0.7 (CTRL tends to work better at lower temperatures) and a warning if the prompt does not start with a control code. The prompt itself is passed through unchanged.
  • XLM: Sets the language ID on the model config when language embeddings are available.
  • XLNet / Transformer-XL: Prepends a padding text (a historical narrative) to help the model with short prompts, following the approach by Aman Rusia.
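
The dictionary-based dispatch described above can be sketched as follows. This is a simplified illustration, not the script's code: the helper bodies are reduced, `PLACEHOLDER_PADDING` stands in for the real padding passage, the `warnings` list on `args` is a hypothetical substitute for logging, and the control-code check compares the first word rather than the first token ID.

```python
from types import SimpleNamespace

# Stand-in for the script's padding passage (assumption, not the real text)
PLACEHOLDER_PADDING = "<padding text> "


def prepare_ctrl_input(args, _model, tokenizer, prompt_text):
    # Record warnings in a list (hypothetical) instead of logging them
    if args.temperature > 0.7:
        args.warnings.append("temperature above 0.7")
    first_word = prompt_text.split(" ", 1)[0]
    if first_word not in tokenizer.control_codes:
        args.warnings.append("prompt does not start with a control code")
    return prompt_text  # the prompt is returned unchanged


def prepare_xlnet_input(args, _model, tokenizer, prompt_text):
    # Use the user-supplied prefix if given, otherwise the padding text
    prefix = args.prefix if args.prefix else PLACEHOLDER_PADDING
    return prefix + prompt_text


PREPROCESSING_FUNCTIONS = {
    "ctrl": prepare_ctrl_input,
    "xlnet": prepare_xlnet_input,
}


def preprocess(args, model, tokenizer, prompt_text):
    prepare = PREPROCESSING_FUNCTIONS.get(args.model_type)
    if prepare is None:
        # Model types without an entry (e.g. gpt2) only get the optional prefix
        return args.prefix + prompt_text
    return prepare(args, model, tokenizer, prompt_text)
```

Keeping the per-model logic in a lookup table means the main flow needs no model-specific branches; supporting a new model type only requires a new entry.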

Generation Pipeline:

  1. Parse arguments, set random seed, load model and tokenizer.
  2. Apply model-specific preprocessing to the prompt.
  3. Encode the prompt and call model.generate() with temperature, top-k, top-p, repetition penalty, and number of return sequences.
  4. Decode generated tokens, optionally truncate at a stop token, and prepend the original prompt.
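
Step 4 can be illustrated with plain strings. The sketch below assumes the generated tokens have already been decoded; `finalize_sequence` and its parameter names are hypothetical, and the explicit check for a missing stop token is a choice made for this example rather than a statement about the script's behavior.

```python
def finalize_sequence(prompt_text, decoded_text, decoded_prompt, stop_token=None):
    """Truncate at stop_token (if present) and re-attach the original prompt."""
    text = decoded_text
    if stop_token:
        cut = text.find(stop_token)
        if cut != -1:  # leave the text untouched when the stop token never appears
            text = text[:cut]
    # Drop the decoded prompt (including any prefix or padding) from the front,
    # then prepend the prompt exactly as the user supplied it.
    return prompt_text + text[len(decoded_prompt):]


result = finalize_sequence(
    prompt_text="Once upon",
    decoded_text="Once upon a time. The end",
    decoded_prompt="Once upon",
    stop_token=".",
)
print(result)  # Once upon a time
```

Slicing by the length of the decoded prompt, rather than of the raw prompt, is what lets padding text or a prefix be stripped from the final output.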

MAX_LENGTH is hardcoded to 10000 as a fallback cap so that generation cannot loop indefinitely. The adjust_length_to_model() helper clips the requested length to the model's maximum sequence length (max_position_embeddings).
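
A minimal sketch of the clipping logic, consistent with the description above. How negative lengths and a non-positive model limit are treated here is an assumption of this example; the script itself is the authority.

```python
MAX_LENGTH = 10000  # fallback cap, as stated above


def adjust_length_to_model(length: int, max_sequence_length: int) -> int:
    if length < 0 and max_sequence_length > 0:
        # No explicit length requested: use the model's limit
        return max_sequence_length
    if 0 < max_sequence_length < length:
        # Requested more than the model supports: clip to the model's limit
        return max_sequence_length
    if length < 0:
        # No usable model limit either: fall back to the hard cap
        return MAX_LENGTH
    return length


print(adjust_length_to_model(20, 1024))    # 20
print(adjust_length_to_model(5000, 1024))  # 1024
print(adjust_length_to_model(-1, 0))       # 10000
```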

Usage

Use this script when:

  • Generating text samples from auto-regressive language models for experimentation.
  • Comparing generation quality across different model architectures (GPT-2, CTRL, XLNet, etc.).
  • Prototyping text generation pipelines with controllable sampling parameters.

Code Reference

Source Location

examples/NLU/examples/text-generation/run_generation.py (295 lines)

Signature

MAX_LENGTH = int(10000)

MODEL_CLASSES = {
    "gpt2": (GPT2LMHeadModel, GPT2Tokenizer),
    "ctrl": (CTRLLMHeadModel, CTRLTokenizer),
    "openai-gpt": (OpenAIGPTLMHeadModel, OpenAIGPTTokenizer),
    "xlnet": (XLNetLMHeadModel, XLNetTokenizer),
    "transfo-xl": (TransfoXLLMHeadModel, TransfoXLTokenizer),
    "xlm": (XLMWithLMHeadModel, XLMTokenizer),
}

PREPROCESSING_FUNCTIONS = {
    "ctrl": prepare_ctrl_input,
    "xlm": prepare_xlm_input,
    "xlnet": prepare_xlnet_input,
    "transfo-xl": prepare_transfoxl_input,
}

def set_seed(args) -> None: ...
def prepare_ctrl_input(args, _, tokenizer, prompt_text) -> str: ...
def prepare_xlm_input(args, model, tokenizer, prompt_text) -> str: ...
def prepare_xlnet_input(args, _, tokenizer, prompt_text) -> str: ...
def prepare_transfoxl_input(args, _, tokenizer, prompt_text) -> str: ...
def adjust_length_to_model(length: int, max_sequence_length: int) -> int: ...
def main() -> list: ...

Import / CLI Usage

python examples/text-generation/run_generation.py \
    --model_type gpt2 \
    --model_name_or_path gpt2 \
    --prompt "The future of AI is" \
    --length 100 \
    --temperature 0.7 \
    --k 50 \
    --p 0.95 \
    --num_return_sequences 3

I/O Contract

Inputs

| Input | Type | Description |
|---|---|---|
| --model_type | str (required) | Model architecture: gpt2, ctrl, openai-gpt, xlnet, transfo-xl, xlm |
| --model_name_or_path | str (required) | Pretrained model name or local path |
| --prompt | str | Input prompt text; if empty, prompts interactively |
| --length | int | Number of tokens to generate. Default: 20 |
| --stop_token | str | Token at which to stop generation |
| --temperature | float | Sampling temperature (1.0 = no change). Default: 1.0 |
| --repetition_penalty | float | Repetition penalty (useful for CTRL at 1.2). Default: 1.0 |
| --k | int | Top-k filtering. Default: 0 (disabled) |
| --p | float | Top-p (nucleus) filtering. Default: 0.9 |
| --seed | int | Random seed. Default: 42 |
| --num_return_sequences | int | Number of sequences to generate. Default: 1 |
| --fp16 | flag | Use half-precision inference |
| --no_cuda | flag | Disable CUDA |
| --prefix | str | Text prepended to the prompt |
| --xlm_language | str | Language code for XLM model |
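
The inputs above map naturally onto an argparse parser. The sketch below reconstructs one from the table alone; `build_parser` is a hypothetical name, and the defaults for --prompt, --stop_token, --prefix, and --xlm_language are assumptions, since the table does not state them.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the run_generation.py flags")
    # Required arguments
    parser.add_argument("--model_type", type=str, required=True)
    parser.add_argument("--model_name_or_path", type=str, required=True)
    # Prompt and length (defaults for prompt/stop_token are assumed)
    parser.add_argument("--prompt", type=str, default="")
    parser.add_argument("--length", type=int, default=20)
    parser.add_argument("--stop_token", type=str, default=None)
    # Sampling parameters, with defaults taken from the table
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--repetition_penalty", type=float, default=1.0)
    parser.add_argument("--k", type=int, default=0)
    parser.add_argument("--p", type=float, default=0.9)
    parser.add_argument("--seed", type=int, default=42)
    parser.add_argument("--num_return_sequences", type=int, default=1)
    # Boolean flags
    parser.add_argument("--fp16", action="store_true")
    parser.add_argument("--no_cuda", action="store_true")
    # Model-specific options (defaults assumed)
    parser.add_argument("--prefix", type=str, default="")
    parser.add_argument("--xlm_language", type=str, default="")
    return parser


args = build_parser().parse_args(["--model_type", "gpt2", "--model_name_or_path", "gpt2"])
print(args.length, args.temperature, args.k, args.p)  # 20 1.0 0 0.9
```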

Outputs

| Output | Type | Description |
|---|---|---|
| Generated sequences | stdout | Printed to console with === GENERATED SEQUENCE N === headers |
| Return value | list of str | List of generated text sequences (when called programmatically) |
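
As a rough illustration of the output contract, the hypothetical helper below prints each sequence under a numbered header and returns the same sequences as a list. Numbering from 1 is an assumption here, chosen to match the example output shown on this page.

```python
def emit_sequences(sequences):
    """Print each sequence under a numbered header and return them as a list."""
    results = []
    for index, text in enumerate(sequences, start=1):
        print(f"=== GENERATED SEQUENCE {index} ===")
        print(text)
        results.append(text)
    return results


returned = emit_sequences(["First sample", "Second sample"])
```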

Usage Examples

# Generate text with GPT-2
python examples/text-generation/run_generation.py \
    --model_type gpt2 \
    --model_name_or_path gpt2-medium \
    --prompt "Once upon a time" \
    --length 200 \
    --temperature 0.8 \
    --k 40 \
    --p 0.95 \
    --seed 42

# Output (illustrative; the actual text depends on the model and seed):
# === GENERATED SEQUENCE 1 ===
# Once upon a time, in a land far away...

# Generate with CTRL using control code
python examples/text-generation/run_generation.py \
    --model_type ctrl \
    --model_name_or_path ctrl \
    --prompt "Links A new study shows" \
    --length 100 \
    --temperature 0.5 \
    --repetition_penalty 1.2

# Generate multiple sequences with XLNet
python examples/text-generation/run_generation.py \
    --model_type xlnet \
    --model_name_or_path xlnet-base-cased \
    --prompt "The meaning of life is" \
    --length 50 \
    --num_return_sequences 5
