Implementation:NVIDIA NeMo Curator Benchmark Runner
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Orchestration, CI/CD |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Main entry point for the NeMo Curator benchmarking framework that orchestrates loading YAML configuration, running benchmark entries with Ray cluster management, collecting results, validating requirements, and dispatching metrics to reporting sinks.
Description
The run.py module is the core orchestrator of the benchmarking system. It coordinates the full lifecycle of a benchmark session:
Configuration Loading: The main() function parses command-line arguments, merges multiple YAML configuration files into a single dictionary, validates the configuration via Session.assert_valid_config_dict(), removes disabled blocks, and resolves environment variables. A Session object is created from the merged configuration, optionally filtered by entry name expressions.
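The merge-then-resolve flow described above can be sketched as follows. This is a minimal illustration, not the code in run.py: `merge_config_dicts` and `expand_env_vars` are hypothetical helpers, the `results_path` key and `BENCH_ROOT` variable are made up for the example, and the recursive "later file wins" merge rule is an assumption, since the source only says the files are merged into a single dictionary.

```python
import os
from typing import Any


def merge_config_dicts(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `override` wins; nested dicts merge recursively.

    Assumption: the real merge rule may differ (for example, a shallow update).
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_env_vars(node: Any) -> Any:
    """Recursively expand $VAR and ${VAR} references in every string value."""
    if isinstance(node, dict):
        return {key: expand_env_vars(value) for key, value in node.items()}
    if isinstance(node, list):
        return [expand_env_vars(value) for value in node]
    if isinstance(node, str):
        return os.path.expandvars(node)
    return node


# Two already-parsed documents standing in for two --config files (hypothetical keys).
base_cfg = {"results_path": "${BENCH_ROOT}/results", "entries": [{"name": "dedup"}]}
local_cfg = {"results_path": "${BENCH_ROOT}/local-results"}

os.environ["BENCH_ROOT"] = "/tmp/bench"
config = expand_env_vars(merge_config_dicts(base_cfg, local_cfg))
print(config["results_path"])
```

Resolving variables after merging means an override file can replace a path template before any expansion happens.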
Session Management: A timestamped session directory is created under the configured results path. Environment information is captured via dump_env(). All configured sinks are initialized at session start and finalized at session end.
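As a rough sketch of the directory step, assuming the default name follows the `benchmark-run__<timestamp>` pattern listed under Inputs: the helper name `make_session_dir` and the timestamp format are hypothetical, and the real code may build the path differently.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def make_session_dir(results_root: Path, session_name: Optional[str] = None) -> Path:
    """Create and return a session directory under `results_root`."""
    if session_name is None:
        # Assumption: the timestamp format is illustrative only.
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d__%H-%M-%S")
        session_name = f"benchmark-run__{stamp}"
    session_path = results_root / session_name
    # exist_ok=False makes an accidental reuse of an old session fail loudly.
    session_path.mkdir(parents=True, exist_ok=False)
    return session_path


with tempfile.TemporaryDirectory() as tmp:
    default_dir = make_session_dir(Path(tmp))
    named_dir = make_session_dir(Path(tmp), "local-test-run")
    print(default_dir.name, named_dir.name)
```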
Entry Execution: The run_entry() function handles individual benchmark entry execution:
- Creates subdirectories for scratch data, Ray cluster files, and logs
- Stands up a Ray cluster with the configured CPU/GPU resources and object store size via setup_ray_cluster_and_env()
- Pre-populates params.json with entry-level parameters (object store size, Ray resources, timeout)
- Executes the benchmark script as a subprocess via run_command_with_timeout()
- Reads back script-generated output files via get_entry_script_persisted_data(), which loads params.json, metrics.json, and tasks.pkl
- Validates results against entry requirements via check_requirements_update_results()
- Tears down the Ray cluster and optionally cleans up scratch directories
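The read-back step in the list above can be illustrated with a small loader. This is a sketch under assumptions, not the body of get_entry_script_persisted_data(): the helper name `load_persisted_data` is hypothetical, the three files are assumed to sit directly in the directory passed in, and treating a missing file as empty is a choice made for the example.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any


def load_persisted_data(entry_path: Path) -> dict[str, Any]:
    """Collect params.json, metrics.json and tasks.pkl found in `entry_path`."""
    data: dict[str, Any] = {}
    for key in ("params", "metrics"):
        json_file = entry_path / f"{key}.json"
        # Assumption: a missing file yields an empty dict rather than an error.
        data[key] = json.loads(json_file.read_text()) if json_file.exists() else {}
    tasks_file = entry_path / "tasks.pkl"
    if tasks_file.exists():
        # Only unpickle files produced by a trusted benchmark script.
        with tasks_file.open("rb") as fh:
            data["tasks"] = pickle.load(fh)
    else:
        data["tasks"] = []
    return data


# Simulate what a benchmark script might leave behind, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    entry_dir = Path(tmp)
    (entry_dir / "params.json").write_text(json.dumps({"timeout_s": 600}))
    (entry_dir / "metrics.json").write_text(json.dumps({"throughput": 12.5}))
    with (entry_dir / "tasks.pkl").open("wb") as fh:
        pickle.dump(["task-0", "task-1"], fh)
    persisted = load_persisted_data(entry_dir)

print(persisted)
```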
Requirement Validation: The check_requirements_update_results() function supports three kinds of metric checks: exact_value (equality), min_value (lower bound), and max_value (upper bound). Failed requirements are logged and recorded in the results dictionary.
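The three checks can be sketched as below. The layout of the requirements dictionary (metric name mapped to a dict of checks), the `metrics` and `requirements_not_met` keys, and the meaning of the return value (True when every requirement is met) are assumptions made for illustration; `check_and_record` is a hypothetical stand-in, not the actual check_requirements_update_results().

```python
from typing import Any


def check_and_record(result_data: dict[str, Any], requirements: dict[str, dict[str, Any]]) -> bool:
    """Check metrics against requirements and record failures in `result_data`."""
    metrics = result_data.get("metrics", {})
    failures: list[str] = []
    for name, checks in requirements.items():
        if name not in metrics:
            failures.append(f"{name}: metric missing from results")
            continue
        actual = metrics[name]
        if "exact_value" in checks and actual != checks["exact_value"]:
            failures.append(f"{name}: expected {checks['exact_value']}, got {actual}")
        if "min_value" in checks and actual < checks["min_value"]:
            failures.append(f"{name}: {actual} is below min_value {checks['min_value']}")
        if "max_value" in checks and actual > checks["max_value"]:
            failures.append(f"{name}: {actual} is above max_value {checks['max_value']}")
    # Hypothetical key name; the real results dictionary may use another.
    result_data["requirements_not_met"] = failures
    return not failures


result = {
    "metrics": {"num_documents": 1000, "throughput_docs_per_s": 48.0, "exec_time_s": 130.0}
}
requirements = {
    "num_documents": {"exact_value": 1000},
    "throughput_docs_per_s": {"min_value": 50.0},
    "exec_time_s": {"max_value": 200.0},
}
passed = check_and_record(result, requirements)
print(passed, result["requirements_not_met"])
```

Here only the throughput requirement fails, because 48.0 is below the lower bound of 50.0.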
Result Reporting: Each entry's result data is passed to all configured sinks (e.g., Slack) for reporting. The session succeeds only if every entry succeeds: main() returns exit code 0 in that case and 1 if any entry fails.
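A minimal sketch of the reporting loop and exit-code rule, assuming a sink exposes initialize, process_result, and finalize methods. That interface, the `Sink` protocol, the `ListSink` class, and the `name` and `success` keys are all hypothetical; the source only states that sinks are initialized at session start, receive each entry's result, and are finalized at session end.

```python
from typing import Any, Protocol


class Sink(Protocol):
    """Hypothetical sink interface used only for this sketch."""

    def initialize(self) -> None: ...
    def process_result(self, result: dict[str, Any]) -> None: ...
    def finalize(self) -> None: ...


class ListSink:
    """In-memory sink that records the order of calls it receives."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def initialize(self) -> None:
        self.events.append("initialize")

    def process_result(self, result: dict[str, Any]) -> None:
        self.events.append(result["name"])

    def finalize(self) -> None:
        self.events.append("finalize")


def report_session(entry_results: list[dict[str, Any]], sinks: list[Sink]) -> int:
    """Send every result to every sink; return 0 only if all entries succeeded."""
    for sink in sinks:
        sink.initialize()
    session_ok = True
    try:
        for result in entry_results:
            for sink in sinks:
                sink.process_result(result)
            session_ok = session_ok and bool(result["success"])
    finally:
        # Finalize even if a sink raised, so partial reports are flushed.
        for sink in sinks:
            sink.finalize()
    return 0 if session_ok else 1


sink = ListSink()
exit_code = report_session(
    [{"name": "dedup", "success": True}, {"name": "video", "success": False}],
    [sink],
)
print(exit_code, sink.events)
```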
Usage
Use this script to run the nightly benchmark suite or a subset of benchmark entries against a NeMo Curator installation. It is typically invoked from CI/CD pipelines or manually for performance validation.
Code Reference
Source Location
- Repository: NeMo-Curator
- File: benchmarking/run.py
- Lines: 1-376
Signature
def ensure_dir(dir_path: Path) -> None: ...
def get_entry_script_persisted_data(session_entry_path: Path) -> dict[str, Any]: ...
def check_requirements_update_results(
    result_data: dict[str, Any],
    requirements: dict[str, Any],
) -> bool: ...
def run_entry(
    entry: Entry,
    path_resolver: PathResolver,
    dataset_resolver: DatasetResolver,
    session_path: Path,
    result_data: dict[str, Any],
) -> bool: ...
def main() -> int: ...
Import
import argparse
import json
import os
import pickle
import shutil
import sys
import time
import traceback
from collections.abc import Mapping
from pathlib import Path
from typing import Any
import yaml
from loguru import logger
from nemo_curator.pipeline.workflow import WorkflowRunResult
from nemo_curator.tasks.utils import TaskPerfUtils
from nemo_curator.utils.file_utils import create_or_overwrite_dir
# Runner modules (added to PYTHONPATH at runtime)
from runner.datasets import DatasetResolver
from runner.entry import Entry
from runner.env_capture import dump_env
from runner.path_resolver import PathResolver
from runner.process import run_command_with_timeout
from runner.ray_cluster import (
    get_ray_cluster_data,
    setup_ray_cluster_and_env,
    teardown_ray_cluster_and_env,
)
from runner.session import Session
from runner.utils import find_result, get_obj_for_json, remove_disabled_blocks, resolve_env_vars
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --config | Path (repeatable) | Yes | Path(s) to YAML configuration files defining the benchmark matrix |
| --session-name | str | No | Optional human-readable session name (default: benchmark-run__<timestamp>) |
| --entries | str | No | Expression to filter entries by name (e.g., "foo and not foobar") |
| --list | flag | No | List matching entries and exit without running |
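The inputs in the table above map naturally onto an argparse parser. The sketch below is an illustration of that mapping, not a copy of the parser in run.py; the help strings and the `build_parser` name are invented, and only the four flag names come from the table.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the four documented command-line inputs."""
    parser = argparse.ArgumentParser(description="Run benchmark entries (sketch).")
    # action="append" lets --config be given several times; values collect in a list.
    parser.add_argument("--config", type=Path, action="append", required=True,
                        help="YAML config file; may be repeated")
    parser.add_argument("--session-name", default=None,
                        help="optional human-readable session name")
    parser.add_argument("--entries", default=None,
                        help='name filter expression, e.g. "foo and not foobar"')
    parser.add_argument("--list", action="store_true",
                        help="list matching entries and exit")
    return parser


args = build_parser().parse_args([
    "--config", "benchmarking/nightly-benchmark.yaml",
    "--config", "benchmarking/local-overrides.yaml",
    "--entries", "xenna and not video",
])
print(args.config, args.session_name, args.entries, args.list)
```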
Outputs
| Name | Type | Description |
|---|---|---|
| Exit code | int | 0 if all entries pass, 1 if any entry fails |
| Session directory | directory | Contains per-entry results.json, params.json, metrics.json, tasks.pkl, and log files |
| results.json | JSON file | Per-entry results including execution time, exit code, metrics, parameters, and requirement validation |
| Sink notifications | varies | Reports sent to configured sinks (Slack, MLflow, etc.) with execution metrics |
Usage Examples
Basic Usage
# Run from the command line:
#   python benchmarking/run.py --config benchmarking/nightly-benchmark.yaml

# Programmatic invocation (main() returns the exit code).
# Assumes the benchmarking/ directory is on sys.path so run.py imports as `run`.
import sys

from run import main

sys.argv = ["run.py", "--config", "benchmarking/nightly-benchmark.yaml"]
exit_code = main()
Filtering Entries
# Run only dedup-related entries
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --entries "dedup"
# Run entries matching a complex expression
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --entries "xenna and not video"
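To make the filter expressions above concrete, here is one plausible way such an expression could be evaluated. It assumes bare words are substring tests against the entry name, combined with `and`, `or`, `not`, and parentheses. The `matches` function is hypothetical, and the actual matching rules in the runner may differ.

```python
import re


def matches(expression: str, name: str) -> bool:
    """Evaluate a boolean name expression; bare words are substring tests."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance() -> None:
        nonlocal pos
        pos += 1

    def parse_or() -> bool:
        value = parse_and()
        while peek() == "or":
            advance()
            rhs = parse_and()  # always parse the right side to consume its tokens
            value = value or rhs
        return value

    def parse_and() -> bool:
        value = parse_not()
        while peek() == "and":
            advance()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not() -> bool:
        token = peek()
        if token == "not":
            advance()
            return not parse_not()
        if token == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return value
        if token is None or token in {"and", "or", ")"}:
            raise ValueError(f"unexpected token: {token!r}")
        advance()
        return token in name

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


names = ["foo", "foobar", "xenna_dedup", "xenna_video"]
print([n for n in names if matches("foo and not foobar", n)])
print([n for n in names if matches("xenna and not video", n)])
```

Under the substring assumption, "foo and not foobar" keeps an entry named foo but drops foobar, which is consistent with the example given for --entries.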
Merging Multiple Config Files
# Merge a base config with environment-specific overrides
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --config benchmarking/local-overrides.yaml \
    --session-name "local-test-run"