Implementation:Microsoft DeepSpeedExamples Text Generation Test

From Leeroopedia


Knowledge Sources
Domains: NLP, Text Generation
Last Updated: 2026-02-07 12:00 GMT

Overview

A text generation script adapted from HuggingFace Transformers that supports conditional auto-regressive generation, with optional DeepSpeed inference integration and latency benchmarking.

Description

test-run-generation.py is a command-line script for conditional text generation using multiple auto-regressive language models from the HuggingFace Transformers library. It supports GPT-2, GPT-Neo, CTRL, OpenAI-GPT, XLNet, Transformer-XL, and XLM model families through a unified MODEL_CLASSES registry that maps model type strings to their corresponding model and tokenizer classes.
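A registry of this kind can be sketched as a plain dictionary keyed by the model type string. The following is a minimal illustration, not the script's actual code: the class names are assumptions based on Transformers naming conventions, resolve_classes is a hypothetical helper, and some of these classes (Transformer-XL in particular) may be unavailable in recent Transformers releases.

```python
import importlib

# Hypothetical registry: model type -> (model class name, tokenizer class name).
# Names are stored as strings so the table can be inspected without
# importing transformers; the exact classes are assumptions.
MODEL_CLASSES = {
    "gpt2": ("GPT2LMHeadModel", "GPT2Tokenizer"),
    "gptneo": ("GPTNeoForCausalLM", "GPT2Tokenizer"),
    "ctrl": ("CTRLLMHeadModel", "CTRLTokenizer"),
    "openai-gpt": ("OpenAIGPTLMHeadModel", "OpenAIGPTTokenizer"),
    "xlnet": ("XLNetLMHeadModel", "XLNetTokenizer"),
    "transfo-xl": ("TransfoXLLMHeadModel", "TransfoXLTokenizer"),
    "xlm": ("XLMWithLMHeadModel", "XLMTokenizer"),
}


def resolve_classes(model_type):
    """Return (model_class, tokenizer_class) for a --model_type value."""
    try:
        model_name, tokenizer_name = MODEL_CLASSES[model_type.lower()]
    except KeyError:
        raise KeyError(
            f"Unsupported model type {model_type!r}; "
            f"expected one of {sorted(MODEL_CLASSES)}"
        ) from None
    # Import lazily so the lookup above fails fast on unknown types.
    transformers = importlib.import_module("transformers")
    return getattr(transformers, model_name), getattr(transformers, tokenizer_name)
```

With Transformers installed, resolve_classes("gpt2") would return the model and tokenizer classes, on which from_pretrained() can then be called with the --model_name_or_path value.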

The script integrates DeepSpeed inference via the --ds-inference flag, which wraps the model with deepspeed.init_inference() using a GPT-2 transformer layer injection policy and optional kernel replacement. This enables optimized inference with tensor parallelism and custom CUDA kernels. FP16 inference is available via the --fp16 flag.
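The DeepSpeed path might look roughly like the sketch below. It is an illustration under assumptions rather than the script's code: ds_inference_kwargs and wrap_with_deepspeed are hypothetical helpers, the GPT-2 injection policy is omitted because its exact class is not shown on this page, and mp_size=1 (a single GPU, which newer DeepSpeed releases express through a tensor-parallel setting instead) is assumed.

```python
def ds_inference_kwargs(fp16, kernel_inject=True, mp_size=1):
    """Collect settings for deepspeed.init_inference (hypothetical helper).

    The dtype is kept as a name so this function works without torch.
    """
    return {
        "mp_size": mp_size,  # assumed: no tensor parallelism
        "dtype_name": "half" if fp16 else "float",
        "replace_with_kernel_inject": kernel_inject,
    }


def wrap_with_deepspeed(model, fp16):
    """Wrap a loaded HuggingFace model in a DeepSpeed inference engine."""
    # Imported lazily: both packages are heavy and need a suitable GPU setup.
    import deepspeed
    import torch

    kwargs = ds_inference_kwargs(fp16)
    dtype = getattr(torch, kwargs.pop("dtype_name"))  # torch.half or torch.float
    engine = deepspeed.init_inference(model, dtype=dtype, **kwargs)
    # The engine exposes the optimized model as .module.
    return engine.module
```

Keeping the settings in a separate function makes it easy to log exactly which configuration a benchmark run used.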

A key feature is the built-in latency benchmarking through the print_latency() function, which collects per-token generation latencies across all prompts, discards the first 10 measurements as warmup, and reports the average along with the P50, P90, P95, P99, and P999 percentile latencies. The script accepts either a single prompt (given with --prompt or typed interactively) or a file of prompts supplied via --sample_input.
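The statistics described above can be sketched as follows. This is a minimal illustration, not the script's print_latency(): latency_stats and print_latency_sketch are hypothetical names, latencies are assumed to be in seconds, and the index-based percentile rule is an assumption since the page does not say how percentiles are computed.

```python
WARMUP = 10  # number of leading measurements to discard, as described above

PERCENTILES = (("P50", 0.5), ("P90", 0.9), ("P95", 0.95),
               ("P99", 0.99), ("P999", 0.999))


def latency_stats(latencies, warmup=WARMUP):
    """Return average and percentile latencies after dropping warmup samples."""
    samples = sorted(latencies[warmup:])
    count = len(samples)
    if count == 0:
        return {}  # nothing left to report once warmup is removed
    stats = {"avg": sum(samples) / count}
    for label, quantile in PERCENTILES:
        # Assumed rule: take the sample at the truncated fractional rank.
        stats[label] = samples[int((count - 1) * quantile)]
    return stats


def print_latency_sketch(latencies, title=""):
    """Print the statistics in milliseconds (input assumed to be seconds)."""
    stats = latency_stats(latencies)
    if not stats:
        return
    print(f"====== latency stats {title} ======")
    for label, seconds in stats.items():
        print(f"\t{label} latency: {seconds * 1000:8.2f} ms")
```

Dropping the warmup samples matters because the first forward passes typically include one-time costs such as kernel compilation and memory allocation, which would otherwise skew the tail percentiles.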

Usage

Use this script to benchmark and test text generation with various HuggingFace language models, optionally accelerated with DeepSpeed inference. It is particularly useful for comparing baseline versus DeepSpeed-accelerated inference latency.

Code Reference

Source Location

  • Repository: Microsoft_DeepSpeedExamples
  • File: inference/huggingface/text-generation/run-generation-script/test-run-generation.py
  • Lines: 1-350

Signature

def main():
    ...

def set_seed(args):
    ...

def adjust_length_to_model(length, max_sequence_length):
    ...

def print_latency(latency_set, title=""):
    ...

def prepare_ctrl_input(args, _, tokenizer, prompt_text):
    ...

def prepare_xlm_input(args, model, tokenizer, prompt_text):
    ...

def prepare_xlnet_input(args, _, tokenizer, prompt_text):
    ...

def prepare_transfoxl_input(args, _, tokenizer, prompt_text):
    ...

Import

# This is a standalone script, run directly:
# python test-run-generation.py --model_type gpt2 --model_name_or_path gpt2

I/O Contract

Inputs

Name | Type | Required | Description
--- | --- | --- | ---
--model_type | str | Yes | Model architecture type: gpt2, gptneo, ctrl, openai-gpt, xlnet, transfo-xl, xlm
--model_name_or_path | str | Yes | Path to pretrained model or HuggingFace model name
--prompt | str | No | Text prompt for generation (interactive input if omitted)
--sample_input | str | No | Path to file containing multiple prompts (one per line)
--length | int | No | Maximum generation length (default: 20)
--temperature | float | No | Sampling temperature (default: 1.0)
--k | int | No | Top-k filtering value (default: 0)
--p | float | No | Top-p (nucleus) filtering value (default: 0.9)
--ds-inference | flag | No | Enable DeepSpeed inference optimization
--fp16 | flag | No | Enable FP16 half-precision inference
--seed | int | No | Random seed for reproducibility (default: 42)
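The inputs listed above could be declared with argparse roughly as shown here. This is a sketch assuming the defaults in the table; build_parser is a hypothetical name, the empty-string default for --prompt is an assumption, and the real script may define further options.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line inputs."""
    parser = argparse.ArgumentParser(description="Conditional text generation")
    parser.add_argument("--model_type", type=str, required=True,
                        help="gpt2, gptneo, ctrl, openai-gpt, xlnet, transfo-xl, xlm")
    parser.add_argument("--model_name_or_path", type=str, required=True,
                        help="Path to pretrained model or HuggingFace model name")
    parser.add_argument("--prompt", type=str, default="",
                        help="Prompt text (assumed default: empty, ask interactively)")
    parser.add_argument("--sample_input", type=str, default=None,
                        help="File with one prompt per line")
    parser.add_argument("--length", type=int, default=20)
    parser.add_argument("--temperature", type=float, default=1.0)
    parser.add_argument("--k", type=int, default=0)
    parser.add_argument("--p", type=float, default=0.9)
    # argparse stores this flag as args.ds_inference (hyphen becomes underscore).
    parser.add_argument("--ds-inference", action="store_true",
                        help="Enable DeepSpeed inference optimization")
    parser.add_argument("--fp16", action="store_true",
                        help="Enable FP16 half-precision inference")
    parser.add_argument("--seed", type=int, default=42)
    return parser
```

For example, build_parser().parse_args(["--model_type", "gpt2", "--model_name_or_path", "gpt2", "--ds-inference"]) yields a namespace with ds_inference set to True and every other option at its default.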

Outputs

Name | Type | Description
--- | --- | ---
generated_sequences | List[str] | List of generated text sequences printed to stdout
latency_stats | stdout | Percentile latency statistics (avg, P50, P90, P95, P99, P999)

Usage Examples

GPT-2 Generation with DeepSpeed

# Run GPT-2 generation with DeepSpeed inference and FP16
python test-run-generation.py \
    --model_type gpt2 \
    --model_name_or_path gpt2-large \
    --prompt "The future of artificial intelligence" \
    --length 100 \
    --fp16 \
    --ds-inference

# Batch generation from file
python test-run-generation.py \
    --model_type gpt2 \
    --model_name_or_path gpt2 \
    --sample_input prompts.txt \
    --length 50
