Implementation:Huggingface Open r1 Compute Pass Rate

Metadata

Field	Value
Source	Repo (https://github.com/huggingface/open-r1)
Domains	NLP, Data_Engineering
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool, provided by Open-R1, for computing each problem's pass rate under a given model and filtering a dataset by that pass rate, using vLLM batch generation and reward-function scoring.

Description

The compute_pass_rate.py script implements the full difficulty filtering pipeline:

Loads a dataset and formats prompts as chat messages via make_conversation and apply_chat_template.
Initializes a vLLM LLM engine for efficient batch generation.
Generates N completions per prompt using SamplingParams.
Scores each completion with the reward functions returned by get_reward_funcs (the same functions used during GRPO training).
Computes mean pass rate per problem using torch.nanmean.
Filters problems to keep those whose mean pass rate lies strictly between pass_rate_min and pass_rate_max.
Pushes two datasets to the Hugging Face Hub: the full generated dataset and the filtered subset.
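The scoring, averaging and filtering steps above can be sketched without vLLM or a GPU. This is a minimal sketch under stated assumptions, not the script's code: `toy_accuracy_reward` is a hypothetical exact-match reward standing in for the functions returned by get_reward_funcs, a completion that cannot be scored is represented as None, and `nanmean` is a pure-Python stand-in that, like torch.nanmean, averages only the values that are present.

```python
import math
from typing import Callable, Optional, Sequence

Reward = Optional[float]  # None marks a completion that could not be scored


def toy_accuracy_reward(completion: str, answer: Optional[str]) -> Reward:
    """Hypothetical exact-match reward; NOT Open-R1's accuracy reward."""
    if answer is None:
        return None  # nothing to verify against
    return 1.0 if completion.strip() == answer.strip() else 0.0


def nanmean(values: Sequence[Reward]) -> float:
    """Average of the present values; NaN if none are present (like torch.nanmean)."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    return sum(present) / len(present) if present else float("nan")


def pass_rate(
    completions: Sequence[str],
    answer: Optional[str],
    reward_fn: Callable[[str, Optional[str]], Reward] = toy_accuracy_reward,
) -> float:
    # Score each of the N completions for one problem, then average the scores.
    return nanmean([reward_fn(c, answer) for c in completions])


def keep_problem(rate: float, pass_rate_min: float = 0.1, pass_rate_max: float = 0.9) -> bool:
    # Both bounds are strict; a NaN rate fails the comparison, so it is dropped.
    return pass_rate_min < rate < pass_rate_max


problems = [
    {"answer": "42", "completions": ["42", "41", "42", "7"]},  # 2/4 = 0.5: kept
    {"answer": "3", "completions": ["3", "3", "3", "3"]},  # 4/4 = 1.0: too easy
    {"answer": "9", "completions": ["1", "2", "3", "4"]},  # 0/4 = 0.0: too hard
]
kept = [p for p in problems if keep_problem(pass_rate(p["completions"], p["answer"]))]
print(len(kept))  # 1
```

Treating a missing score as "ignore it" rather than "count it as a failure" is what makes a NaN-aware mean the natural choice: a problem is judged only on the completions that could actually be scored.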

The script supports dataset sharding via dataset_start_index and dataset_end_index, so the work can be split across ~88 parallel Slurm jobs.
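A sketch of how the index range for each shard might be computed, assuming the end index is exclusive (as with Python's range(); verify this against the script before relying on it) and that rows are split into contiguous, near-equal shards. `shard_bounds` is a hypothetical helper, not part of Open-R1.

```python
from typing import List, Tuple


def shard_bounds(num_rows: int, num_jobs: int) -> List[Tuple[int, int]]:
    """Split rows [0, num_rows) into num_jobs contiguous (start, end) pairs.

    Hypothetical helper. `end` is exclusive, as in Python's range(); shard sizes
    differ by at most one row, and trailing shards are empty if num_jobs > num_rows.
    """
    if num_rows < 0 or num_jobs <= 0:
        raise ValueError("need num_rows >= 0 and num_jobs > 0")
    base, extra = divmod(num_rows, num_jobs)
    bounds = []
    start = 0
    for job in range(num_jobs):
        size = base + (1 if job < extra else 0)  # first `extra` shards get one more row
        bounds.append((start, start + size))
        start += size
    return bounds


# Each Slurm array task (e.g. indexed by SLURM_ARRAY_TASK_ID) would pick its own pair
# and pass it to the script as --dataset_start_index / --dataset_end_index.
for task_id, (start, end) in enumerate(shard_bounds(num_rows=10, num_jobs=3)):
    print(task_id, f"--dataset_start_index {start} --dataset_end_index {end}")
```

Spreading the remainder over the first shards keeps every job within one row of the others, so no single Slurm job becomes a straggler.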

Usage

Run as a standalone script before GRPO training to filter the training dataset by difficulty.

Code Reference

Source

Field	Value
Repository	open-r1
File	scripts/pass_rate_filtering/compute_pass_rate.py
Lines	L37-205

Signature

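from dataclasses import dataclass
from typing import Optional

from open_r1.configs import GRPOScriptArguments


@dataclass
class PassRateScriptArguments(GRPOScriptArguments):
    output_dataset_name: Optional[str] = None  # target Hub dataset for the pushed outputs
    pass_rate_min: float = 0.1  # lower filter bound (exclusive)
    pass_rate_max: float = 0.9  # upper filter bound (exclusive)
    dataset_start_index: Optional[int] = None  # start of the shard to process
    dataset_end_index: Optional[int] = None  # end of the shard to process
    dataset_split: str = "train"  # dataset split to load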

Import

Run as script:

python scripts/pass_rate_filtering/compute_pass_rate.py --config recipes/dataset_filtering/config_demo.yaml

I/O Contract

Inputs

Parameter	Type	Required	Description
HF dataset	Dataset (with prompt column)	Yes	The training dataset to filter, loaded from the Hugging Face Hub
Model	ModelConfig	Yes	The model used for generating completions (via vLLM)
Reward functions	GRPOScriptArguments	Yes	Same reward functions used during GRPO training, resolved via `get_reward_funcs`
pass_rate_min	float	No	Exclusive lower bound on a problem's mean reward for it to be kept (default: 0.1)
pass_rate_max	float	No	Exclusive upper bound on a problem's mean reward for it to be kept (default: 0.9)

Outputs

Output	Description
Full generations dataset	All generated completions with rewards, pushed to Hub under revision `"gen"`
Filtered subset	Problems where `pass_rate_min < mean_reward < pass_rate_max`, pushed to Hub under revision `"pass_rate"`

Usage Examples

# Example 1: Run with a YAML config file
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --config recipes/dataset_filtering/config_demo.yaml

# Example 2: Run with explicit parameters
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --dataset_name "HuggingFaceH4/aime-2024-prompts" \
    --model_name_or_path "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" \
    --num_generations 16 \
    --pass_rate_min 0.1 \
    --pass_rate_max 0.9 \
    --output_dataset_name "my-org/aime-2024-filtered"

# Example 3: Run a shard for Slurm parallelization
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --config recipes/dataset_filtering/config_demo.yaml \
    --dataset_start_index 0 \
    --dataset_end_index 1000
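With num_generations completions per problem, binary 0/1 rewards and no missing scores, a problem's pass rate can only be k/N for k successes out of N, so the two thresholds translate into a range of success counts. The following sketch, built around a hypothetical helper `kept_success_counts`, makes that mapping explicit under those assumptions:

```python
from typing import List


def kept_success_counts(num_generations: int, pass_rate_min: float, pass_rate_max: float) -> List[int]:
    """Success counts k with pass_rate_min < k / num_generations < pass_rate_max.

    Assumes every completion receives a binary 0/1 reward and none is missing,
    so the pass rate is exactly k / num_generations.
    """
    return [
        k
        for k in range(num_generations + 1)
        if pass_rate_min < k / num_generations < pass_rate_max
    ]


counts = kept_success_counts(16, 0.1, 0.9)  # settings from Example 2
print(counts[0], counts[-1])  # 2 14
```

With the Example 2 settings, a problem is kept if it is solved between 2 and 14 times out of 16. Because both bounds are strict, N matters at the edges: with 10 generations, a problem solved exactly once (pass rate 0.1) or nine times (0.9) is dropped, so only 2 to 8 successes are kept.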
