
Implementation:Huggingface Open r1 Generate Completion

From Leeroopedia



Overview

A concrete tool from Open-R1 for high-concurrency asynchronous text generation against vLLM OpenAI-compatible API servers.

Description

The generate_completion async function sends a single generation request to a vLLM OpenAI-compatible API endpoint. It is called by process_example, which handles retries (a budget of 10 attempts), prompt template formatting, and JSONL output. The main async loop in scripts/generate_reasoning.py orchestrates the run: it loads already-processed UUIDs for resumability, iterates the dataset in chunks, bounds concurrent requests with a semaphore (default 1000), and tracks progress via tqdm.asyncio.
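The request logic can be sketched as follows. This is a hedged reconstruction, not the exact source: the endpoint path (/v1/chat/completions), the payload field names, and the build_payload helper are assumptions based on the OpenAI-compatible chat-completions schema that vLLM serves.

```python
def build_payload(prompt: str, args) -> dict:
    # Request body in the OpenAI chat-completions schema served by vLLM.
    # Field names follow that schema; the helper itself is illustrative.
    return {
        "model": args.model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": args.temperature,
        "top_p": args.top_p,
        "max_tokens": args.max_tokens,
        "n": args.num_generations,
    }

async def generate_completion(session, prompt: str, args) -> dict:
    # session is an aiohttp.ClientSession (third-party dependency, as in
    # the original script); only its post() context manager is used here.
    url = f"http://{args.api_addr}/v1/chat/completions"
    async with session.post(url, json=build_payload(prompt, args)) as resp:
        resp.raise_for_status()
        return await resp.json()
```

With num_generations greater than 1, the server returns several entries under choices for the same prompt, which is why the return schema is a list of choices rather than a single string.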

Usage

Run as a standalone script for large-scale reasoning trace generation.

Code Reference

Source: Repository: open-r1, File: scripts/generate_reasoning.py, Lines: L21-174

Signature:

async def generate_completion(
    session: aiohttp.ClientSession,
    prompt: str,
    args,  # argparse.Namespace
) -> dict:
    """Send a single generation request to vLLM API.
    Returns: {"choices": [{"message": {"content": str}, "finish_reason": str}], ...}
    """

Import:

from scripts.generate_reasoning import generate_completion  # assumes the repository root is on sys.path

Run as script:

python scripts/generate_reasoning.py --hf-dataset <dataset> --model <model> --api-addr localhost:39876

I/O Contract

Inputs

| Parameter | Type | Required | Description |
|---|---|---|---|
| session | aiohttp.ClientSession | Yes | Async HTTP session for making API requests |
| prompt | str | Yes | Formatted prompt string to send to the vLLM API |
| args.model | str | Yes | vLLM model name to use for generation |
| args.temperature | float | No | Sampling temperature (default: 0.6) |
| args.top_p | float | No | Nucleus sampling parameter (default: 0.95) |
| args.max_tokens | int | No | Maximum number of tokens to generate (default: 16384) |
| args.num_generations | int | No | Number of completions per prompt (default: 4) |
| args.api_addr | str | Yes | Address of the vLLM server (e.g., localhost:39876) |
| args.prompt_template | str | No | Jinja2-style template for formatting prompts |

Outputs

| Return Type | Description |
|---|---|
| dict | Response dict with choices containing generated text, finish_reason, and api_metadata; results are also written to a JSONL output file |
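Because results land in a JSONL file keyed by example UUID, a restarted run can skip work already done. A minimal sketch of that resumability step, with an illustrative helper name (load_processed_uuids) and record shape that are assumptions, not the exact source:

```python
import json
import os

def load_processed_uuids(output_path: str) -> set:
    # Hypothetical resumability helper: scan the existing JSONL output and
    # collect the UUIDs already completed so a restarted run can skip them.
    processed: set = set()
    if not os.path.exists(output_path):
        return processed
    with open(output_path) as f:
        for line in f:
            try:
                processed.add(json.loads(line)["uuid"])
            except (json.JSONDecodeError, KeyError):
                continue  # tolerate a truncated final line from a killed run
    return processed
```

Skipping malformed lines matters here: a run killed mid-write can leave a partial last record, and the example it belongs to should simply be regenerated.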

Usage Examples

# Generate reasoning traces for a math dataset using a DeepSeek-R1 model
python scripts/generate_reasoning.py \
    --hf-dataset AI-MO/NuminaMath-TIR \
    --model deepseek-ai/DeepSeek-R1 \
    --api-addr localhost:39876 \
    --temperature 0.6 \
    --top_p 0.95 \
    --max_tokens 16384 \
    --num_generations 4 \
    --max_concurrent 1000 \
    --chunk_size 50000 \
    --output_dir ./output
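The --max_concurrent flag above corresponds to the semaphore-bounded loop described earlier, and the retry budget of 10 lives in the per-example wrapper. A minimal sketch of that pattern, where the function signature, the exponential backoff, and the injectable generate_fn are assumptions for illustration rather than the exact source:

```python
import asyncio
import json

MAX_RETRIES = 10  # matches the retry budget mentioned in the description

async def process_example(example, generate_fn, args, semaphore, out_lines):
    # Hypothetical wrapper: format the prompt, bound concurrency with the
    # semaphore, retry failed requests, and append a JSONL record on success.
    prompt = args.prompt_template.format(prompt=example["problem"])
    async with semaphore:  # at most --max_concurrent requests in flight
        for attempt in range(MAX_RETRIES):
            try:
                result = await generate_fn(prompt, args)
            except Exception:
                # Backoff strategy is an assumption, not the exact source.
                await asyncio.sleep(min(2 ** attempt, 30))
                continue
            out_lines.append(json.dumps({"uuid": example["uuid"], **result}))
            return result
    return None  # retry budget exhausted
```

Acquiring the semaphore outside the retry loop keeps a flaky example from monopolizing more than one of the bounded slots while it backs off and retries.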
