Implementation:Huggingface Open r1 Compute Pass Rate
Metadata
| Field | Value |
|---|---|
| Source | Repo (https://github.com/huggingface/open-r1) |
| Domains | NLP, Data_Engineering |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by Open-R1, that computes a model's pass rate on each dataset problem and filters problems by that rate, using vLLM batch generation and reward function scoring.
Description
The compute_pass_rate.py script implements the full difficulty filtering pipeline:
- Loads a dataset and formats prompts as chat messages via make_conversation and apply_chat_template.
- Initializes a vLLM LLM engine for efficient batch generation.
- Generates N completions per prompt using SamplingParams.
- Scores each completion using get_reward_funcs (same functions used during GRPO training).
- Computes mean pass rate per problem using torch.nanmean.
- Filters problems to keep those within [pass_rate_min, pass_rate_max].
- Pushes both datasets to HuggingFace Hub: the full generated dataset and the filtered subset.
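The scoring and averaging steps can be illustrated with a small, dependency-free sketch. It assumes each reward is 1.0 for a correct completion, 0.0 for an incorrect one, and NaN when a completion could not be scored. The helper name `mean_pass_rate` is hypothetical, and plain Python stands in for torch.nanmean, which the script itself uses.

```python
import math
from typing import Sequence


def mean_pass_rate(rewards: Sequence[float]) -> float:
    """Average the rewards of one problem, skipping NaN entries.

    Plain-Python stand-in for torch.nanmean (an assumption made to keep
    this sketch dependency-free). Returns NaN if every entry is NaN.
    """
    valid = [r for r in rewards if not math.isnan(r)]
    if not valid:
        return float("nan")
    return sum(valid) / len(valid)


# Hypothetical rewards for 4 completions of a single problem:
# two correct, one incorrect, one that could not be scored.
rewards = [1.0, 0.0, float("nan"), 1.0]
print(mean_pass_rate(rewards))
```

Skipping NaN entries means an unscorable completion neither raises nor lowers the pass rate; only the completions that received a score contribute to the mean.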
The script supports dataset sharding via dataset_start_index/dataset_end_index for Slurm parallelization across ~88 jobs.
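One way to derive per-job index pairs for such sharding is sketched below. The helper `shard_ranges` is hypothetical and not part of Open-R1, and the sketch assumes dataset_end_index is exclusive (a half-open range), which should be checked against the script before use.

```python
from typing import List, Tuple


def shard_ranges(num_rows: int, num_shards: int) -> List[Tuple[int, int]]:
    """Split [0, num_rows) into contiguous half-open (start, end) ranges.

    The first `num_rows % num_shards` shards receive one extra row so
    that shard sizes differ by at most one.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    base, extra = divmod(num_rows, num_shards)
    ranges = []
    start = 0
    for i in range(num_shards):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Print the index flags each job would pass to the script
# (assumes an exclusive end index).
for start, end in shard_ranges(num_rows=10, num_shards=3):
    print(f"--dataset_start_index {start} --dataset_end_index {end}")
```

Each job would then run the script with its own pair of indices, so no two jobs process the same rows.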
Usage
Run as a standalone script before GRPO training to filter the training dataset by difficulty.
Code Reference
Source
| Field | Value |
|---|---|
| Repository | open-r1 |
| File | scripts/pass_rate_filtering/compute_pass_rate.py |
| Lines | L37-205 |
Signature
@dataclass
class PassRateScriptArguments(GRPOScriptArguments):
    output_dataset_name: Optional[str] = None
    pass_rate_min: float = 0.1
    pass_rate_max: float = 0.9
    dataset_start_index: Optional[int] = None
    dataset_end_index: Optional[int] = None
    dataset_split: str = "train"
Import
Run as script:
python scripts/pass_rate_filtering/compute_pass_rate.py --config recipes/dataset_filtering/config_demo.yaml
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| HF dataset | Dataset (with prompt column) | Yes | The training dataset to filter, loaded from HuggingFace Hub |
| Model | ModelConfig | Yes | The model used for generating completions (via vLLM) |
| Reward functions | GRPOScriptArguments | Yes | Same reward functions used during GRPO training, resolved via get_reward_funcs |
| pass_rate_min | float | No | Lower bound threshold for filtering (default: 0.1) |
| pass_rate_max | float | No | Upper bound threshold for filtering (default: 0.9) |
Outputs
| Output | Description |
|---|---|
| Full generations dataset | All generated completions with rewards, pushed to Hub under revision "gen" |
| Filtered subset | Problems where pass_rate_min < mean_reward < pass_rate_max, pushed to Hub under revision "pass_rate" |
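The filter condition in the table above can be sketched as a simple predicate. The function `keep_problem` and the example rows are hypothetical; the strict inequalities follow the condition as written in the Outputs table.

```python
from typing import Dict, List, Union


def keep_problem(
    mean_reward: float,
    pass_rate_min: float = 0.1,
    pass_rate_max: float = 0.9,
) -> bool:
    """Return True when mean_reward lies strictly between the bounds.

    A NaN mean_reward fails both comparisons, so such problems are dropped.
    """
    return pass_rate_min < mean_reward < pass_rate_max


# Hypothetical per-problem mean rewards.
rows: List[Dict[str, Union[str, float]]] = [
    {"id": "a", "mean_reward": 0.0},           # never solved: too hard
    {"id": "b", "mean_reward": 0.5},           # solved half the time: kept
    {"id": "c", "mean_reward": 0.9},           # exactly on the upper bound
    {"id": "d", "mean_reward": 1.0},           # always solved: too easy
    {"id": "e", "mean_reward": float("nan")},  # no scorable completions
]

kept = [row["id"] for row in rows if keep_problem(row["mean_reward"])]
print(kept)
```

With strict bounds, problems whose mean reward equals a threshold exactly are excluded, as are problems the model always or never solves.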
Usage Examples
# Example 1: Run with a YAML config file
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --config recipes/dataset_filtering/config_demo.yaml

# Example 2: Run with explicit parameters
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --dataset_name "HuggingFaceH4/aime-2024-prompts" \
    --model_name_or_path "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B" \
    --num_generations 16 \
    --pass_rate_min 0.1 \
    --pass_rate_max 0.9 \
    --output_dataset_name "my-org/aime-2024-filtered"

# Example 3: Run a shard for Slurm parallelization
python scripts/pass_rate_filtering/compute_pass_rate.py \
    --config recipes/dataset_filtering/config_demo.yaml \
    --dataset_start_index 0 \
    --dataset_end_index 1000