Implementation:NVIDIA NeMo Curator Start Prometheus Grafana

From Leeroopedia
Knowledge Sources
Domains Monitoring, Metrics, Infrastructure
Last Updated 2026-02-14 00:00 GMT

Overview

Entry point script that downloads, configures, and launches Prometheus and Grafana monitoring services for NeMo Curator workloads.

Description

The start_prometheus_grafana module provides a one-command setup for the complete monitoring stack used by NeMo Curator. The main function, start_prometheus_grafana(), orchestrates the full lifecycle. It first checks whether Prometheus or Grafana is already running and skips setup if so. Otherwise it ensures the metrics directory exists, finds available ports starting from the defaults, downloads and extracts Prometheus, and starts it as a background process. It then downloads Grafana, writes its configuration files (INI, datasource YAML, dashboard provisioning YAML), and launches the Grafana server. A brief sleep gives Grafana time to start before the access instructions are logged.
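
To make the "datasource YAML" step concrete, the sketch below renders a Grafana datasource provisioning file that points at a local Prometheus. The keys follow Grafana's documented provisioning format, but the function name render_prometheus_datasource and the exact fields chosen are assumptions for illustration; this page does not show what the module actually writes.

```python
def render_prometheus_datasource(prometheus_port: int, host: str = "localhost") -> str:
    """Return Grafana datasource provisioning YAML for a Prometheus server.

    Hypothetical helper: the real module may write different fields.
    """
    return (
        "apiVersion: 1\n"
        "datasources:\n"
        "  - name: Prometheus\n"
        "    type: prometheus\n"
        "    access: proxy\n"
        f"    url: http://{host}:{prometheus_port}\n"
        "    isDefault: true\n"
    )


# Render the file content for a Prometheus instance on port 9090.
print(render_prometheus_datasource(9090))
```

The URL must use the port that Prometheus actually ended up on, which is why port allocation has to happen before the Grafana configuration is written.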

The module also serves as a CLI script with an argument parser that accepts --prometheus_web_port, --grafana_web_port, and --yes (to skip the confirmation prompt).
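
A parser with these three flags could look roughly like the following. This is a minimal sketch, not the module's actual parser: the help strings and the build_parser name are invented, and the default values 9090 and 3000 are the stock upstream ports for Prometheus and Grafana, assumed here to stand in for DEFAULT_PROMETHEUS_WEB_PORT and DEFAULT_GRAFANA_WEB_PORT.

```python
import argparse

# Assumed stand-ins for the module's DEFAULT_*_WEB_PORT constants.
ASSUMED_PROMETHEUS_PORT = 9090
ASSUMED_GRAFANA_PORT = 3000


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the three flags named on this page."""
    parser = argparse.ArgumentParser(
        description="Start Prometheus and Grafana for monitoring."
    )
    parser.add_argument(
        "--prometheus_web_port", type=int, default=ASSUMED_PROMETHEUS_PORT,
        help="Port to start searching from for Prometheus.",
    )
    parser.add_argument(
        "--grafana_web_port", type=int, default=ASSUMED_GRAFANA_PORT,
        help="Port to start searching from for Grafana.",
    )
    parser.add_argument(
        "--yes", action="store_true",
        help="Skip the confirmation prompt.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--grafana_web_port", "3001", "--yes"])
print(args.prometheus_web_port, args.grafana_web_port, args.yes)
```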

Usage

Use this module when you need to quickly stand up Prometheus and Grafana monitoring for Ray-based curation workloads. It is especially useful when running on Xenna clusters where metrics export is needed. It can be invoked directly from the command line or called programmatically from other NeMo Curator components.

Code Reference

Source Location

  • Repository: NeMo-Curator
  • File: nemo_curator/metrics/start_prometheus_grafana.py
  • Lines: 1-118

Signature

def start_prometheus_grafana(
    prometheus_web_port: int = DEFAULT_PROMETHEUS_WEB_PORT,
    grafana_web_port: int = DEFAULT_GRAFANA_WEB_PORT,
) -> None:
    ...

Import

from nemo_curator.metrics.start_prometheus_grafana import start_prometheus_grafana

I/O Contract

Inputs

Name | Type | Required | Description
prometheus_web_port | int | No | Port number to run Prometheus on (defaults to DEFAULT_PROMETHEUS_WEB_PORT). A free port is found starting from this value.
grafana_web_port | int | No | Port number to run Grafana on (defaults to DEFAULT_GRAFANA_WEB_PORT). A free port is found starting from this value.
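
"A free port is found starting from this value" can be read as a linear probe upward from the requested port. The sketch below shows one way to do that with the standard socket module. It is an assumption about the behaviour, not the source of get_free_port(); the name find_free_port and the max_tries limit are hypothetical.

```python
import socket


def find_free_port(start_port: int, max_tries: int = 100, host: str = "127.0.0.1") -> int:
    """Return the first port >= start_port that can be bound on host.

    Hypothetical stand-in for get_free_port(); probes ports one by one.
    """
    # Never probe past the highest valid TCP port.
    stop = min(start_port + max_tries, 65536)
    for port in range(start_port, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Port is in use or not permitted; try the next one.
                continue
            return port
    raise RuntimeError(f"No free port found in range {start_port}-{stop - 1}")


# Ask for 9090; a higher port is returned if 9090 is taken.
print(find_free_port(9090))
```

A probe like this is inherently racy: another process can claim the port between the check and the moment the service binds to it, so callers should still be prepared for a bind failure at launch.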

Outputs

Name | Type | Description
(return) | None | The function starts both services as background processes and returns None.

Internal Workflow

The function follows this sequence:

  1. Check running services -- Uses is_prometheus_running() and is_grafana_running() from nemo_curator.metrics.utils to detect already-running instances. If either is found, the function logs a message and returns immediately.
  2. Ensure metrics directory -- Creates DEFAULT_NEMO_CURATOR_METRICS_PATH if it does not exist.
  3. Port allocation -- Calls get_free_port() for both Prometheus and Grafana starting from the requested port numbers.
  4. Download and start Prometheus -- Calls download_and_extract_prometheus() to fetch the Prometheus binary, then run_prometheus() to start it as a background subprocess.
  5. Download, configure, and start Grafana -- Calls download_grafana() to fetch the Grafana binary, write_grafana_configs() to generate INI and provisioning files, and launch_grafana() to start the server.
  6. Log access info -- Prints URLs for Grafana (with default admin/admin credentials) and Prometheus, plus instructions for killing the services.
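
The six steps above can be sketched as a small orchestration function. To keep the example runnable without downloading anything, each real helper is replaced by an injected callable; start_stack and all parameter names are hypothetical, and the sketch returns a bool for easy checking, whereas the documented function returns None.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def start_stack(
    prometheus_port: int,
    grafana_port: int,
    metrics_dir: Path,
    *,
    already_running: Callable[[], bool],
    find_port: Callable[[int], int],
    start_prometheus: Callable[[int], None],
    start_grafana: Callable[[int, int], None],
    log: Callable[[str], None] = print,
) -> bool:
    """Run the documented sequence using injected stand-ins for the real helpers."""
    # Step 1: bail out early if either service is already up.
    if already_running():
        log("Prometheus or Grafana already running; skipping setup.")
        return False
    # Step 2: make sure the metrics directory exists.
    metrics_dir.mkdir(parents=True, exist_ok=True)
    # Step 3: resolve the ports actually used.
    prom = find_port(prometheus_port)
    graf = find_port(grafana_port)
    # Step 4: Prometheus goes first so Grafana can point at it.
    start_prometheus(prom)
    # Step 5: Grafana is configured with the resolved Prometheus port.
    start_grafana(graf, prom)
    # Step 6: tell the user where to look.
    log(f"Grafana: http://localhost:{graf}  Prometheus: http://localhost:{prom}")
    return True


# Usage with stand-in callables that only record what would happen.
calls: List[str] = []
with tempfile.TemporaryDirectory() as tmp:
    started = start_stack(
        9090, 3000, Path(tmp) / "metrics",
        already_running=lambda: False,
        find_port=lambda port: port,
        start_prometheus=lambda port: calls.append(f"prometheus:{port}"),
        start_grafana=lambda port, prom_port: calls.append(f"grafana:{port}->{prom_port}"),
        log=calls.append,
    )
print(started, calls)
```

Injecting the helpers is purely a device for this example; it makes the ordering visible and testable without touching the network or spawning processes.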

CLI Usage

# Launch with default ports and confirmation prompt
python -m nemo_curator.metrics.start_prometheus_grafana

# Launch with custom ports and skip confirmation
python -m nemo_curator.metrics.start_prometheus_grafana \
    --prometheus_web_port 9090 \
    --grafana_web_port 3000 \
    --yes

Usage Examples

Basic Usage

from nemo_curator.metrics.start_prometheus_grafana import start_prometheus_grafana

# Start with default ports
start_prometheus_grafana()

# Start with custom ports
start_prometheus_grafana(prometheus_web_port=9091, grafana_web_port=3001)
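
Because the function returns None and the services run in the background, a caller may want to confirm that both came up. The sketch below polls the health endpoints that Prometheus (/-/healthy) and Grafana (/api/health) expose upstream. The helper names is_healthy and check_stack are hypothetical, and the ports are assumed to be the ones requested in the example above; the actual ports may differ if those were already taken.

```python
import http.client
import urllib.request


def is_healthy(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to url answers with HTTP 200, False on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts, HTTP errors, and malformed replies.
        return False


def check_stack(prometheus_port: int = 9091, grafana_port: int = 3001) -> dict:
    """Probe both services on localhost and report which ones respond."""
    return {
        "prometheus": is_healthy(f"http://localhost:{prometheus_port}/-/healthy"),
        "grafana": is_healthy(f"http://localhost:{grafana_port}/api/health"),
    }


# Reports False for a service that is not reachable on the given port.
print(check_stack())
```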
