Implementation: Hugging Face Open R1 Generate Completion
Overview
A concrete tool from Open-R1 for high-concurrency, asynchronous text generation against vLLM OpenAI-compatible API servers.
Description
The generate_completion async function sends a single generation request to a vLLM OpenAI-compatible API endpoint. It is called by process_example, which handles retries (a budget of 10 attempts), prompt template formatting, and JSONL output. The main async loop in scripts/generate_reasoning.py orchestrates the pipeline: loading already-processed UUIDs for resumability, chunked dataset iteration, semaphore-bounded concurrent requests (default limit 1000), and progress tracking via tqdm.asyncio.
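The semaphore-bounded concurrency described above can be sketched with stdlib asyncio alone. This is a hedged illustration of the pattern, not the repository's exact code; `bounded_gather` and `fake_request` are hypothetical names, and the fake request stands in for the real HTTP call.

```python
import asyncio

async def bounded_gather(coros, max_concurrent: int = 1000):
    # A semaphore caps the number of in-flight coroutines, mirroring the
    # script's default concurrency bound of 1000.
    sem = asyncio.Semaphore(max_concurrent)

    async def _run(coro):
        async with sem:
            return await coro

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(_run(c) for c in coros))

async def fake_request(i: int) -> dict:
    await asyncio.sleep(0)  # stand-in for the HTTP round trip
    return {"id": i}

results = asyncio.run(
    bounded_gather([fake_request(i) for i in range(5)], max_concurrent=2)
)
```

The same shape generalizes: swap `fake_request` for a coroutine that awaits the API call, and the semaphore alone enforces the concurrency ceiling.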
Usage
Run as a standalone script for large-scale reasoning trace generation.
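Resumability on rerun works by skipping examples whose UUIDs already appear in the JSONL output. A minimal sketch of that step; the helper name and the "uuid" field name are assumptions, not the repository's exact code:

```python
import json

def load_processed_uuids(jsonl_lines) -> set[str]:
    # Collect UUIDs already present in the output so a restarted run
    # can skip rows that were completed before the interruption.
    return {json.loads(line)["uuid"] for line in jsonl_lines if line.strip()}

# In the real script the lines would come from the output JSONL file.
done = load_processed_uuids(['{"uuid": "a1"}', '{"uuid": "b2"}', ''])
```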
Code Reference
Source: repository open-r1, file scripts/generate_reasoning.py, lines 21-174
Signature:
async def generate_completion(
    session: aiohttp.ClientSession,
    prompt: str,
    args,  # argparse.Namespace
) -> dict:
    """Send a single generation request to the vLLM API.

    Returns: {"choices": [{"message": {"content": str}, "finish_reason": str}], ...}
    """
Import: not applicable; the function is internal to the script and is invoked via the CLI below.
Run as script:
python scripts/generate_reasoning.py --hf-dataset <dataset> --model <model> --api-addr localhost:39876
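The request body itself can be sketched as follows, assuming vLLM's OpenAI-compatible chat-completions schema. The helper name `build_request_payload` and the mapping of `num_generations` to the OpenAI `n` field are assumptions; the real function posts this payload with `session.post` against the `--api-addr` host.

```python
from argparse import Namespace

def build_request_payload(prompt: str, args) -> dict:
    # Sketch of the JSON body sent to the OpenAI-style chat-completions
    # route; defaults mirror the I/O contract table below.
    return {
        "model": args.model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": getattr(args, "temperature", 0.6),
        "top_p": getattr(args, "top_p", 0.95),
        "max_tokens": getattr(args, "max_tokens", 16384),
        "n": getattr(args, "num_generations", 4),  # assumed mapping
    }

payload = build_request_payload(
    "Solve: 2 + 2", Namespace(model="deepseek-ai/DeepSeek-R1")
)
```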
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| session | aiohttp.ClientSession | Yes | Async HTTP session for making API requests |
| prompt | str | Yes | Formatted prompt string to send to the vLLM API |
| args.model | str | Yes | vLLM model name to use for generation |
| args.temperature | float | No | Sampling temperature (default: 0.6) |
| args.top_p | float | No | Nucleus sampling parameter (default: 0.95) |
| args.max_tokens | int | No | Maximum number of tokens to generate (default: 16384) |
| args.num_generations | int | No | Number of completions per prompt (default: 4) |
| args.api_addr | str | Yes | Address of the vLLM server (e.g., localhost:39876) |
| args.prompt_template | str | No | Jinja2-style template for formatting prompts |
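A minimal `argparse.Namespace` satisfying the contract above; the values are the documented defaults and the model name is illustrative, since the real object is built by argparse inside scripts/generate_reasoning.py:

```python
from argparse import Namespace

# Illustrative args object matching the parameter table.
args = Namespace(
    model="deepseek-ai/DeepSeek-R1",  # required
    api_addr="localhost:39876",       # required
    temperature=0.6,
    top_p=0.95,
    max_tokens=16384,
    num_generations=4,
    prompt_template=None,
)
```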
Outputs
| Return Type | Description |
|---|---|
| dict | Response dict whose choices contain the generated text, finish_reason, and api_metadata; results are also written to a JSONL output file |
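Downstream code can unpack the documented response shape like this; `extract_generations` is a hypothetical helper, shown only to make the choices structure concrete:

```python
def extract_generations(response: dict) -> list[dict]:
    # Walk the OpenAI-style response and keep the two fields the
    # output contract documents per choice.
    return [
        {
            "content": choice["message"]["content"],
            "finish_reason": choice["finish_reason"],
        }
        for choice in response.get("choices", [])
    ]

sample = {
    "choices": [
        {"message": {"content": "reasoning trace"}, "finish_reason": "stop"}
    ]
}
generations = extract_generations(sample)
```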
Usage Examples
# Generate reasoning traces for a math dataset using a DeepSeek-R1 model
python scripts/generate_reasoning.py \
--hf-dataset AI-MO/NuminaMath-TIR \
--model deepseek-ai/DeepSeek-R1 \
--api-addr localhost:39876 \
--temperature 0.6 \
--top_p 0.95 \
--max_tokens 16384 \
--num_generations 4 \
--max_concurrent 1000 \
--chunk_size 50000 \
--output_dir ./output
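The retry budget that process_example applies around each request can be sketched as below. This is a hypothetical helper, not the repository's exact code; a real loop would catch narrower exceptions and back off between attempts.

```python
import asyncio

async def with_retry_budget(make_request, budget: int = 10):
    # Up to `budget` attempts per example, mirroring the retry budget
    # of 10 described above; the last error is re-raised on exhaustion.
    last_exc = None
    for _ in range(budget):
        try:
            return await make_request()
        except Exception as exc:  # real code would narrow this
            last_exc = exc
            await asyncio.sleep(0)  # placeholder for a backoff delay
    raise last_exc

attempts = {"n": 0}

async def flaky_request():
    # Fails twice, then succeeds, to exercise the retry path.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transient API error")
    return {"choices": []}

result = asyncio.run(with_retry_budget(flaky_request))
```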