Implementation:FMInference FlexLLMGen Prompt Construction Pattern
| Field | Value |
|---|---|
| Sources | FlexLLMGen |
| Domains | Prompt_Engineering, NLP |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
This page documents the pattern for constructing few-shot text completion prompts, as used by FlexLLMGen's completion application.
Description
This is a Pattern Doc -- it documents a user-defined pattern rather than a library API. FlexLLMGen's completion.py builds its prompts as a list of Python strings, where each string is a multi-line prompt made of few-shot examples followed by the query. Prompts use newline-delimited Question/Answer pairs or Text/Extraction pairs, and each prompt ends with a bare output label (for example "Answer:") for the model to complete. The number of prompts must equal gpu_batch_size * num_gpu_batches; because the completion example sets num_gpu_batches=1, the list length there must equal gpu_batch_size.
Code Reference
- Source: flexllmgen/apps/completion.py, Lines: 11-22
- Import: No import needed -- this is a user-defined pattern
Pattern interface:
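# User-defined prompt construction pattern
# Each prompt is a multi-line string with few-shot examples.
# Adjacent string literals are concatenated by Python, so each
# comma-terminated group below is a single prompt string.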
prompts = [
# Q&A format
"Question: Where were the 2004 Olympics held?\n"
"Answer: Athens, Greece\n"
"Question: What is the longest river on the earth?\n"
"Answer:",
# Extraction format
"Extract the airport codes from this text.\n"
"Text: \"I want a flight from New York to San Francisco.\"\n"
"Airport codes: JFK, SFO.\n"
"Text: \"I want you to book a flight from Phoenix to Las Vegas.\"\n"
"Airport codes:",
]
# Constraint: len(prompts) == policy.gpu_batch_size * policy.num_gpu_batches
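The length constraint can be checked before any model work starts. The following is a minimal sketch, not part of FlexLLMGen: `check_prompt_count` is a hypothetical helper, and `gpu_batch_size` and `num_gpu_batches` are passed as plain integers standing in for the corresponding policy fields.

```python
from typing import List


def check_prompt_count(prompts: List[str], gpu_batch_size: int,
                       num_gpu_batches: int = 1) -> None:
    """Raise ValueError if the prompt list does not fill the batch exactly.

    Hypothetical helper; the two integers stand in for
    policy.gpu_batch_size and policy.num_gpu_batches.
    """
    expected = gpu_batch_size * num_gpu_batches
    if len(prompts) != expected:
        raise ValueError(
            f"expected {expected} prompts "
            f"({gpu_batch_size} x {num_gpu_batches}), got {len(prompts)}"
        )


prompts = [
    "Question: Where were the 2004 Olympics held?\n"
    "Answer: Athens, Greece\n"
    "Question: What is the longest river on the earth?\n"
    "Answer:",
    "Extract the airport codes from this text.\n"
    "Text: \"I want a flight from New York to San Francisco.\"\n"
    "Airport codes: JFK, SFO.\n"
    "Text: \"I want you to book a flight from Phoenix to Las Vegas.\"\n"
    "Airport codes:",
]

# Two prompts, so a batch size of 2 with a single GPU batch passes.
check_prompt_count(prompts, gpu_batch_size=2, num_gpu_batches=1)
```

Failing early with a clear message is a design choice of this sketch; how FlexLLMGen itself reacts to a mismatched list length is not covered here.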
I/O Contract
| Direction | Name | Description |
|---|---|---|
| Input | Task description | Conceptual description of what the model should do |
| Input | Few-shot examples | Labeled demonstrations establishing the input-output format |
| Input | Query | The actual input to process |
| Output | prompts | List[str] of prompt strings ready for tokenization; its length must equal gpu_batch_size * num_gpu_batches |
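The three inputs in the table can be assembled mechanically. The sketch below is an illustration only: `build_few_shot_prompt` and its parameters (`input_label`, `output_label`, `task_description`) are hypothetical names, not part of FlexLLMGen, and it assumes the newline-delimited layout shown in the pattern interface.

```python
from typing import List, Optional, Sequence, Tuple


def build_few_shot_prompt(
    examples: Sequence[Tuple[str, str]],
    query: str,
    input_label: str = "Question",
    output_label: str = "Answer",
    task_description: Optional[str] = None,
) -> str:
    """Join an optional task description, labeled examples, and the query.

    The prompt ends with a bare output label so the model completes it.
    """
    lines: List[str] = []
    if task_description:
        lines.append(task_description)
    for example_input, example_output in examples:
        lines.append(f"{input_label}: {example_input}")
        lines.append(f"{output_label}: {example_output}")
    lines.append(f"{input_label}: {query}")
    lines.append(f"{output_label}:")
    return "\n".join(lines)


qa_prompt = build_few_shot_prompt(
    examples=[("Where were the 2004 Olympics held?", "Athens, Greece")],
    query="What is the longest river on the earth?",
)

extraction_prompt = build_few_shot_prompt(
    examples=[('"I want a flight from New York to San Francisco."', "JFK, SFO.")],
    query='"I want you to book a flight from Phoenix to Las Vegas."',
    input_label="Text",
    output_label="Airport codes",
    task_description="Extract the airport codes from this text.",
)

# One prompt per batch slot, matching the List[str] output in the table.
prompts = [qa_prompt, extraction_prompt]
```

Both calls reproduce the two prompt strings from the pattern interface, so the helper is only a convenience over writing the literals by hand.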
Usage Examples
# Example 1: Question-Answering prompt
prompts = [
"Question: What is the capital of France?\n"
"Answer: Paris\n"
"Question: What is the capital of Japan?\n"
"Answer:",
]
# Example 2: Classification prompt
prompts = [
"Classify the sentiment: 'Great product!' -> Positive\n"
"Classify the sentiment: 'Terrible experience.' -> Negative\n"
"Classify the sentiment: 'It works fine.' ->",
]
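# Pass to tokenizer and model. `tokenizer`, `model`, and `stop` are assumed to
# be created beforehand, as in flexllmgen/apps/completion.py. Because Example 2
# rebinds `prompts`, only the classification prompt is sent here, so the batch
# holds a single prompt.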
inputs = tokenizer(prompts, padding="max_length", max_length=128)
output_ids = model.generate(inputs.input_ids, max_new_tokens=32, stop=stop)
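Once the output ids have been decoded back to text, the answer still has to be separated from the prompt. The sketch below is a hedged illustration in plain Python: `extract_completion` is a hypothetical helper, and it assumes the decoded text is the original prompt followed by the model's continuation, which may not hold exactly if the tokenizer normalizes whitespace.

```python
def extract_completion(prompt: str, decoded: str) -> str:
    """Return the first generated line that follows the prompt.

    Assumes `decoded` is the prompt followed by the model's continuation;
    if it is not, the decoded text is used unchanged.
    """
    continuation = decoded[len(prompt):] if decoded.startswith(prompt) else decoded
    # Keep only the first line, since the prompts use one line per answer.
    stripped = continuation.strip()
    return stripped.splitlines()[0] if stripped else ""


prompt = (
    "Question: What is the capital of France?\n"
    "Answer: Paris\n"
    "Question: What is the capital of Japan?\n"
    "Answer:"
)
# Hand-written stand-in for a decoded model output, not real model output.
decoded = prompt + " Tokyo\nQuestion: What is"
answer = extract_completion(prompt, decoded)
```

Cutting at the first newline mirrors the one-line-per-answer layout of the prompts; a multi-line answer format would need a different stopping rule.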