Implementation:NVIDIA NeMo Curator Benchmark Runner

From Leeroopedia
Knowledge Sources
Domains: Benchmarking, Orchestration, CI/CD
Last Updated: 2026-02-14 00:00 GMT

Overview

run.py is the main entry point for the NeMo Curator benchmarking framework. It loads YAML configuration, runs each benchmark entry under a managed Ray cluster, collects results, validates them against the entry's requirements, and dispatches metrics to reporting sinks.

Description

The run.py module is the core orchestrator of the benchmarking system. It coordinates the full lifecycle of a benchmark session:

Configuration Loading: The main() function parses command-line arguments, merges multiple YAML configuration files into a single dictionary, validates the configuration via Session.assert_valid_config_dict(), removes disabled blocks, and resolves environment variables. A Session object is created from the merged configuration, optionally filtered by entry name expressions.
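The merge and environment-variable steps can be illustrated with a minimal sketch. It assumes a recursive merge in which later files override earlier ones and `$VAR` / `${VAR}` style references; the actual semantics of run.py and of resolve_env_vars() are not shown on this page and may differ. The helper names and config keys below are hypothetical, and plain dicts stand in for the parsed YAML files.

```python
import os
from typing import Any


def merge_config_dicts(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where `override` wins; nested dicts are merged recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config_dicts(merged[key], value)
        else:
            merged[key] = value
    return merged


def expand_env_vars(node: Any) -> Any:
    """Recursively expand $VAR / ${VAR} references in every string value."""
    if isinstance(node, str):
        return os.path.expandvars(node)
    if isinstance(node, dict):
        return {k: expand_env_vars(v) for k, v in node.items()}
    if isinstance(node, list):
        return [expand_env_vars(v) for v in node]
    return node


# Dicts standing in for the parsed contents of two --config files (hypothetical keys).
base_cfg = {"results_path": "${BENCH_ROOT}/results", "default_timeout_s": 3600}
local_cfg = {"default_timeout_s": 600}

os.environ["BENCH_ROOT"] = "/tmp/bench"
config = expand_env_vars(merge_config_dicts(base_cfg, local_cfg))
print(config)
```

Merging before expanding means an override file can replace a path template as a whole, and the environment is consulted only once, on the final dictionary.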

Session Management: A timestamped session directory is created under the configured results path. Environment information is captured via dump_env(). All configured sinks are initialized at session start and finalized at session end.
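As a rough illustration of the session directory step, the sketch below creates a directory under a results path and falls back to a timestamped default name. The function name and the timestamp format are assumptions; only the `benchmark-run__<timestamp>` default pattern comes from this page.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def make_session_dir(results_path: Path, session_name: Optional[str] = None) -> Path:
    """Create and return a per-session directory under results_path."""
    timestamp = time.strftime("%Y-%m-%d__%H-%M-%S")  # assumed format, not taken from run.py
    name = session_name or f"benchmark-run__{timestamp}"
    session_dir = results_path / name
    session_dir.mkdir(parents=True, exist_ok=True)
    return session_dir


results_root = Path(tempfile.mkdtemp())  # stand-in for the configured results path
default_dir = make_session_dir(results_root)
named_dir = make_session_dir(results_root, "local-test-run")
print(default_dir.name, named_dir.name)
```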

Entry Execution: The run_entry() function handles individual benchmark entry execution:

  1. Creates subdirectories for scratch data, Ray cluster files, and logs
  2. Stands up a Ray cluster with the configured CPU/GPU resources and object store size via setup_ray_cluster_and_env()
  3. Pre-populates params.json with entry-level parameters (object store size, Ray resources, timeout)
  4. Executes the benchmark script as a subprocess via run_command_with_timeout()
  5. Reads back script-generated output files via get_entry_script_persisted_data(), which loads params.json, metrics.json, and tasks.pkl
  6. Validates results against entry requirements via check_requirements_update_results()
  7. Tears down the Ray cluster and optionally cleans up scratch directories
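The steps above can be sketched with the standard library alone. This is not the code in run.py: Ray cluster setup and teardown are left as comments, tasks.pkl is omitted, and the helper names, subdirectory names, log file name, and params.json contents are all hypothetical. It only shows the shape of "pre-populate, run with a timeout, read back what the script wrote".

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any


def run_with_timeout(cmd: list[str], timeout_s: float, log_path: Path) -> dict[str, Any]:
    """Run cmd, redirect stdout/stderr to log_path, and report exit code and timeout status."""
    with log_path.open("w") as log:
        try:
            proc = subprocess.run(
                cmd, stdout=log, stderr=subprocess.STDOUT, timeout=timeout_s, check=False
            )
            return {"returncode": proc.returncode, "timed_out": False}
        except subprocess.TimeoutExpired:
            return {"returncode": None, "timed_out": True}


def load_json_if_present(path: Path) -> dict[str, Any]:
    """Return parsed JSON, or an empty dict when the script did not write the file."""
    return json.loads(path.read_text()) if path.exists() else {}


def run_entry_sketch(script: str, entry_dir: Path, timeout_s: float) -> dict[str, Any]:
    for sub in ("scratch", "ray_cluster", "logs"):  # hypothetical subdirectory names
        (entry_dir / sub).mkdir(parents=True, exist_ok=True)
    # Pre-populate params.json so entry-level settings are recorded even if the script fails.
    (entry_dir / "params.json").write_text(json.dumps({"timeout_s": timeout_s}))
    # A real runner would start the Ray cluster here and tear it down in a `finally` block.
    cmd = [sys.executable, "-c", script, str(entry_dir)]
    outcome = run_with_timeout(cmd, timeout_s, entry_dir / "logs" / "stdouterr.log")
    outcome["params"] = load_json_if_present(entry_dir / "params.json")
    outcome["metrics"] = load_json_if_present(entry_dir / "metrics.json")
    return outcome


# Toy "benchmark script": writes metrics.json into the directory passed as its first argument.
demo_script = (
    "import json, sys, pathlib; "
    "pathlib.Path(sys.argv[1], 'metrics.json').write_text(json.dumps({'rows': 10}))"
)
result = run_entry_sketch(demo_script, Path(tempfile.mkdtemp()), timeout_s=30)
print(result)
```

Running the script as a subprocess keeps a crash or hang in one benchmark from taking down the runner, and treating missing output files as empty lets a failed entry still produce a result record.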

Requirement Validation: The check_requirements_update_results() function supports three kinds of metric checks: exact_value (equality), min_value (lower bound), and max_value (upper bound). Failed requirements are logged and recorded in the results dictionary.
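A minimal sketch of the three check kinds follows. The layout of the requirements dictionary (metric name mapped to a rule dict), the treatment of a missing metric as a failure, and the result keys are assumptions made for illustration; the real check_requirements_update_results() may organize these differently.

```python
from typing import Any


def check_requirements(
    metrics: dict[str, Any], requirements: dict[str, dict[str, Any]]
) -> list[str]:
    """Return human-readable failures; an empty list means every check passed."""
    failures = []
    for name, rule in requirements.items():
        if name not in metrics:
            # Assumption: a required metric that was never reported counts as a failure.
            failures.append(f"{name}: metric missing from results")
            continue
        actual = metrics[name]
        if "exact_value" in rule and actual != rule["exact_value"]:
            failures.append(f"{name}: expected exactly {rule['exact_value']}, got {actual}")
        if "min_value" in rule and actual < rule["min_value"]:
            failures.append(f"{name}: expected at least {rule['min_value']}, got {actual}")
        if "max_value" in rule and actual > rule["max_value"]:
            failures.append(f"{name}: expected at most {rule['max_value']}, got {actual}")
    return failures


# Hypothetical metric names and thresholds.
result_data = {"metrics": {"num_documents": 1000, "throughput_docs_per_s": 85.0}}
requirements = {
    "num_documents": {"exact_value": 1000},
    "throughput_docs_per_s": {"min_value": 100.0},
}

failures = check_requirements(result_data["metrics"], requirements)
result_data["requirements_not_met"] = failures  # hypothetical key name
result_data["success"] = not failures
print(result_data)
```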

Result Reporting: Each entry's result data is passed to all configured sinks (e.g., Slack) for reporting. The session succeeds only if every entry succeeds; in that case the process exits with code 0, otherwise with code 1.
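The sink lifecycle and the exit-code rule might look like the sketch below. The Sink protocol and its method names (initialize, process_result, finalize) are hypothetical and chosen only to mirror the "initialized at session start, finalized at session end" description; the toy sink stores lines in memory instead of posting anywhere.

```python
from typing import Any, Protocol


class Sink(Protocol):
    """Hypothetical sink interface; the real runner's method names may differ."""

    def initialize(self, session_name: str) -> None: ...
    def process_result(self, result: dict[str, Any]) -> None: ...
    def finalize(self) -> None: ...


class MemorySink:
    """Toy sink that records report lines in a list."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def initialize(self, session_name: str) -> None:
        self.lines.append(f"session {session_name} started")

    def process_result(self, result: dict[str, Any]) -> None:
        status = "PASS" if result["success"] else "FAIL"
        self.lines.append(f"{result['name']}: {status}")

    def finalize(self) -> None:
        self.lines.append("session finished")


def report_session(session_name: str, results: list[dict[str, Any]], sinks: list[Sink]) -> int:
    """Feed every entry result to every sink and return the process exit code."""
    for sink in sinks:
        sink.initialize(session_name)
    session_ok = True
    try:
        for result in results:
            session_ok = session_ok and bool(result["success"])
            for sink in sinks:
                sink.process_result(result)
    finally:
        # Finalize even if a sink raised, so partial reports are still flushed.
        for sink in sinks:
            sink.finalize()
    return 0 if session_ok else 1


sink = MemorySink()
entry_results = [{"name": "dedup", "success": True}, {"name": "filter", "success": False}]
exit_code = report_session("local-test-run", entry_results, [sink])
print(exit_code, sink.lines)
```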

Usage

Use this script to run the nightly benchmark suite or a subset of benchmark entries against a NeMo Curator installation. It is typically invoked from CI/CD pipelines or manually for performance validation.

Code Reference

Source Location

  • Repository: NeMo-Curator
  • File: benchmarking/run.py
  • Lines: 1-376

Signature

def ensure_dir(dir_path: Path) -> None: ...

def get_entry_script_persisted_data(session_entry_path: Path) -> dict[str, Any]: ...

def check_requirements_update_results(
    result_data: dict[str, Any],
    requirements: dict[str, Any],
) -> bool: ...

def run_entry(
    entry: Entry,
    path_resolver: PathResolver,
    dataset_resolver: DatasetResolver,
    session_path: Path,
    result_data: dict[str, Any],
) -> bool: ...

def main() -> int: ...

Import

import argparse
import json
import os
import pickle
import shutil
import sys
import time
import traceback
from collections.abc import Mapping
from pathlib import Path
from typing import Any

import yaml
from loguru import logger

from nemo_curator.pipeline.workflow import WorkflowRunResult
from nemo_curator.tasks.utils import TaskPerfUtils
from nemo_curator.utils.file_utils import create_or_overwrite_dir

# Runner modules (added to PYTHONPATH at runtime)
from runner.datasets import DatasetResolver
from runner.entry import Entry
from runner.env_capture import dump_env
from runner.path_resolver import PathResolver
from runner.process import run_command_with_timeout
from runner.ray_cluster import (
    get_ray_cluster_data,
    setup_ray_cluster_and_env,
    teardown_ray_cluster_and_env,
)
from runner.session import Session
from runner.utils import find_result, get_obj_for_json, remove_disabled_blocks, resolve_env_vars

I/O Contract

Inputs

Name | Type | Required | Description
--config | Path (repeatable) | Yes | Path(s) to YAML configuration files defining the benchmark matrix
--session-name | str | No | Optional human-readable session name (default: benchmark-run__<timestamp>)
--entries | str | No | Expression to filter entries by name (e.g., "foo and not foobar")
--list | flag | No | List matching entries and exit without running
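The inputs above map naturally onto argparse. The sketch below is an assumption about how such a parser could be declared, not a copy of the one in run.py; the help strings and the build_parser name are made up, while the flag names come from the table.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="run.py", description="Run benchmark entries.")
    # action="append" makes the flag repeatable; each occurrence adds one Path to the list.
    parser.add_argument("--config", type=Path, action="append", required=True,
                        help="YAML config file; may be given more than once")
    parser.add_argument("--session-name", default=None,
                        help="human-readable session name")
    parser.add_argument("--entries", default=None,
                        help='expression to filter entries by name, e.g. "foo and not foobar"')
    parser.add_argument("--list", action="store_true",
                        help="list matching entries and exit without running")
    return parser


args = build_parser().parse_args(
    ["--config", "base.yaml", "--config", "overrides.yaml", "--entries", "dedup", "--list"]
)
print(args.config, args.session_name, args.entries, args.list)
```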

Outputs

Name | Type | Description
Exit code | int | 0 if all entries pass, 1 if any entry fails
Session directory | directory | Contains per-entry results.json, params.json, metrics.json, tasks.pkl, and log files
results.json | JSON file | Per-entry results including execution time, exit code, metrics, parameters, and requirement validation
Sink notifications | varies | Reports sent to configured sinks (Slack, MLflow, etc.) with execution metrics

Usage Examples

Basic Usage

# Run from the command line:
#   python benchmarking/run.py --config benchmarking/nightly-benchmark.yaml

# Programmatic invocation (main() returns the exit code).
# Assumes the benchmarking/ directory is on sys.path so that run.py is importable.
import sys

from run import main

sys.argv = ["run.py", "--config", "benchmarking/nightly-benchmark.yaml"]
exit_code = main()

Filtering Entries

# Run only dedup-related entries
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --entries "dedup"

# Run entries matching a complex expression
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --entries "xenna and not video"
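This page does not spell out how an --entries expression is evaluated. One plausible reading, similar to pytest's -k option, treats each bare word as a substring test on the entry name and combines the tests with and, or, not, and parentheses. The sketch below implements that assumed behaviour; the function name and entry names are hypothetical.

```python
import re

KEYWORDS = {"and", "or", "not", "(", ")"}


def entry_matches(expression: str, entry_name: str) -> bool:
    """Evaluate a boolean expression whose bare words are substring tests on entry_name."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    translated = []
    for token in tokens:
        if token in KEYWORDS:
            translated.append(token)
        else:
            # Replace each word with the literal True/False result of the substring test.
            translated.append(str(token in entry_name))
    # Only True/False literals and boolean operators remain, so nothing else can be evaluated.
    return bool(eval(" ".join(translated), {"__builtins__": {}}))


entry_names = ["dedup_xenna_text", "dedup_xenna_video", "filter_ray_text"]
selected = [name for name in entry_names if entry_matches("xenna and not video", name)]
print(selected)
```

A malformed expression (for example, unbalanced parentheses) raises SyntaxError in this sketch; a production runner would likely catch that and report a usage error.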

Merging Multiple Config Files

# Merge a base config with environment-specific overrides
python benchmarking/run.py \
    --config benchmarking/nightly-benchmark.yaml \
    --config benchmarking/local-overrides.yaml \
    --session-name "local-test-run"
