Implementation: Hugging Face Open R1 Generate Completion
Overview
A concrete tool from Open-R1 for high-concurrency, asynchronous text generation against vLLM OpenAI-compatible API servers.
Description
The generate_completion async function sends a single generation request to a vLLM OpenAI-compatible API endpoint. It is called by process_example, which handles retries (a budget of 10 attempts), prompt template formatting, and JSONL output. The main async loop in scripts/generate_reasoning.py orchestrates the pipeline: loading already-processed UUIDs for resumability, chunked dataset iteration, semaphore-bounded concurrent requests (default limit 1000), and progress tracking via tqdm.asyncio.
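The semaphore-bounded concurrency described above can be sketched with stdlib asyncio alone. This is a hedged illustration of the pattern, not the repository's exact code; `bounded_gather` and `fake_request` are hypothetical names, and the fake request stands in for the real HTTP call.

```python
import asyncio

async def bounded_gather(coros, max_concurrent: int = 1000):
    # A semaphore caps the number of in-flight coroutines, mirroring the
    # script's default concurrency bound of 1000.
    sem = asyncio.Semaphore(max_concurrent)

    async def _run(coro):
        async with sem:
            return await coro

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(_run(c) for c in coros))

async def fake_request(i: int) -> dict:
    await asyncio.sleep(0)  # stand-in for the HTTP round trip
    return {"id": i}

results = asyncio.run(
    bounded_gather([fake_request(i) for i in range(5)], max_concurrent=2)
)
```

The same shape generalizes: swap `fake_request` for a coroutine that awaits the API call, and the semaphore alone enforces the concurrency ceiling.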
Usage
Run as a standalone script for large-scale reasoning trace generation.
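Resumability on rerun works by skipping examples whose UUIDs already appear in the JSONL output. A minimal sketch of that step; the helper name and the "uuid" field name are assumptions, not the repository's exact code:

```python
import json

def load_processed_uuids(jsonl_lines) -> set[str]:
    # Collect UUIDs already present in the output so a restarted run
    # can skip rows that were completed before the interruption.
    return {json.loads(line)["uuid"] for line in jsonl_lines if line.strip()}

# In the real script the lines would come from the output JSONL file.
done = load_processed_uuids(['{"uuid": "a1"}', '{"uuid": "b2"}', ''])
```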
Code Reference
Source: repository open-r1, file scripts/generate_reasoning.py, lines 21-174
Signature:
async def generate_completion(
    session: aiohttp.ClientSession,
    prompt: str,
    args,  # argparse.Namespace
) -> dict:
    """Send a single generation request to the vLLM API.

    Returns: {"choices": [{"message": {"content": str}, "finish_reason": str}], ...}
    """
Import: not applicable; the function is internal to the script and is invoked via the CLI below.
Run as script:
python scripts/generate_reasoning.py --hf-dataset <dataset> --model <model> --api-addr localhost:39876
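The request body itself can be sketched as follows, assuming vLLM's OpenAI-compatible chat-completions schema. The helper name `build_request_payload` and the mapping of `num_generations` to the OpenAI `n` field are assumptions; the real function posts this payload with `session.post` against the `--api-addr` host.

```python
from argparse import Namespace

def build_request_payload(prompt: str, args) -> dict:
    # Sketch of the JSON body sent to the OpenAI-style chat-completions
    # route; defaults mirror the I/O contract table below.
    return {
        "model": args.model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": getattr(args, "temperature", 0.6),
        "top_p": getattr(args, "top_p", 0.95),
        "max_tokens": getattr(args, "max_tokens", 16384),
        "n": getattr(args, "num_generations", 4),  # assumed mapping
    }

payload = build_request_payload(
    "Solve: 2 + 2", Namespace(model="deepseek-ai/DeepSeek-R1")
)
```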
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| session | aiohttp.ClientSession | Yes | Async HTTP session for making API requests |
| prompt | str | Yes | Formatted prompt string to send to the vLLM API |
| args.model | str | Yes | vLLM model name to use for generation |
| args.temperature | float | No | Sampling temperature (default: 0.6) |
| args.top_p | float | No | Nucleus sampling parameter (default: 0.95) |
| args.max_tokens | int | No | Maximum number of tokens to generate (default: 16384) |
| args.num_generations | int | No | Number of completions per prompt (default: 4) |
| args.api_addr | str | Yes | Address of the vLLM server (e.g., localhost:39876) |
| args.prompt_template | str | No | Jinja2-style template for formatting prompts |
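A minimal `argparse.Namespace` satisfying the contract above; the values are the documented defaults and the model name is illustrative, since the real object is built by argparse inside scripts/generate_reasoning.py:

```python
from argparse import Namespace

# Illustrative args object matching the parameter table.
args = Namespace(
    model="deepseek-ai/DeepSeek-R1",  # required
    api_addr="localhost:39876",       # required
    temperature=0.6,
    top_p=0.95,
    max_tokens=16384,
    num_generations=4,
    prompt_template=None,
)
```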
Outputs
| Return Type | Description |
|---|---|
| dict | Response dict whose choices contain the generated text, finish_reason, and api_metadata; results are also written to a JSONL output file |
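Downstream code can unpack the documented response shape like this; `extract_generations` is a hypothetical helper, shown only to make the choices structure concrete:

```python
def extract_generations(response: dict) -> list[dict]:
    # Walk the OpenAI-style response and keep the two fields the
    # output contract documents per choice.
    return [
        {
            "content": choice["message"]["content"],
            "finish_reason": choice["finish_reason"],
        }
        for choice in response.get("choices", [])
    ]

sample = {
    "choices": [
        {"message": {"content": "reasoning trace"}, "finish_reason": "stop"}
    ]
}
generations = extract_generations(sample)
```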
Usage Examples
# Generate reasoning traces for a math dataset using a DeepSeek-R1 model
python scripts/generate_reasoning.py \
--hf-dataset AI-MO/NuminaMath-TIR \
--model deepseek-ai/DeepSeek-R1 \
--api-addr localhost:39876 \
--temperature 0.6 \
--top_p 0.95 \
--max_tokens 16384 \
--num_generations 4 \
--max_concurrent 1000 \
--chunk_size 50000 \
--output_dir ./output
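The retry budget that process_example applies around each request can be sketched as below. This is a hypothetical helper, not the repository's exact code; a real loop would catch narrower exceptions and back off between attempts.

```python
import asyncio

async def with_retry_budget(make_request, budget: int = 10):
    # Up to `budget` attempts per example, mirroring the retry budget
    # of 10 described above; the last error is re-raised on exhaustion.
    last_exc = None
    for _ in range(budget):
        try:
            return await make_request()
        except Exception as exc:  # real code would narrow this
            last_exc = exc
            await asyncio.sleep(0)  # placeholder for a backoff delay
    raise last_exc

attempts = {"n": 0}

async def flaky_request():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient API error")
    return {"choices": []}

result = asyncio.run(with_retry_budget(flaky_request))
```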