
Environment:Volcengine Verl Megatron Core Environment

From Leeroopedia


sources: Repo|verl|https://github.com/volcengine/verl

domains: Infrastructure, Distributed_Training

last_updated: 2026-02-07 17:00 GMT

Overview

Optional Megatron-Core environment for large-scale distributed training with advanced parallelism in verl.

Description

verl supports Megatron-Core as an alternative training backend to FSDP, enabling tensor parallelism, pipeline parallelism, expert parallelism, and sequence parallelism for very large models (roughly 30B parameters and above). Megatron-Core >= 0.13.0 is required, and the mbridge package bridges HuggingFace model configs to Megatron-Core.
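To see how these parallelism dimensions interact, the sketch below shows the basic arithmetic: the tensor-parallel and pipeline-parallel degrees partition the model, and whatever remains of the world size becomes data parallelism. The function name and parameter names are illustrative, not verl's actual configuration keys.

```python
# Sketch: how Megatron-style parallelism degrees partition a GPU cluster.
# The product tp * pp must divide the world size; the quotient is the
# number of data-parallel replicas. Names here are illustrative only.

def data_parallel_size(world_size: int, tp: int, pp: int) -> int:
    """Return the implied data-parallel size for a given world size."""
    model_parallel = tp * pp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by tp*pp={model_parallel}"
        )
    return world_size // model_parallel


# Example: 16 GPUs with tensor parallel 4 and pipeline parallel 2
# leave 2 data-parallel replicas.
print(data_parallel_size(16, tp=4, pp=2))
```

Expert parallelism for MoE models adds a further dimension inside the expert layers, but the divisibility constraint illustrated here is the same.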

Usage

Required when using the Megatron-LM backend (trainer.backend=megatron). Recommended for models larger than 30B parameters or for MoE architectures.

System Requirements

  • Multiple NVIDIA GPUs (typically 8+ GPUs)
  • High-speed interconnect (NVLink/InfiniBand)
  • Linux

Dependencies

  • megatron-core >= 0.13.0
  • mbridge (Megatron-Bridge)

Quick Install

pip install "verl[mcore]"

or

pip install mbridge
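After installation, the version floor can be verified at runtime. The following is a minimal sketch using only the standard library; a plain tuple comparison stands in for packaging.version, and the helper names are my own, not part of verl.

```python
# Sketch: verify that the installed megatron-core meets verl's minimum
# version (0.13.0) without importing the package itself.

from importlib import metadata


def version_tuple(v: str) -> tuple:
    """Parse 'X.Y.Z' into a comparable tuple; ignores non-numeric parts."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def check_mcore(minimum: str = "0.13.0") -> bool:
    """Return True if megatron-core is installed and new enough."""
    try:
        installed = metadata.version("megatron-core")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= version_tuple(minimum)
```

Note that this naive parser drops pre-release suffixes (e.g. "0.13.0rc1" compares as (0, 13)); use packaging.version for strict handling.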

Code Evidence

From verl/utils/import_utils.py:28-33 (imports added for completeness):

from functools import cache
import importlib.util


@cache
def is_megatron_core_available():
    # find_spec checks importability without importing megatron.core;
    # for a dotted name it raises ModuleNotFoundError if the parent
    # package ("megatron") is absent.
    try:
        mcore_spec = importlib.util.find_spec("megatron.core")
    except ModuleNotFoundError:
        mcore_spec = None
    return mcore_spec is not None
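The same cached find_spec pattern generalizes to any optional dependency. Below is a self-contained sketch (the function name is mine, not verl's):

```python
# Sketch: detect an optional dependency without importing it.
# find_spec locates the package on sys.path without executing its code;
# @cache memoizes the result so repeated checks are free.

import importlib.util
from functools import cache


@cache
def is_available(package: str) -> bool:
    try:
        # For dotted names, find_spec may raise ModuleNotFoundError
        # when the parent package is missing.
        return importlib.util.find_spec(package) is not None
    except ModuleNotFoundError:
        return False


print(is_available("json"))            # stdlib module, present -> True
print(is_available("no_such_package"))  # absent -> False
```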

And from verl/models/mcore/model_forward_fused.py:53:

assert version.parse(mcore.__version__) >= version.parse("0.13.0")

And from verl/models/mcore/bridge.py:16-25:

try:
    from megatron.bridge import AutoBridge
except ImportError:
    print("Megatron-Bridge package not found. Please install Megatron-Bridge...")
    raise

Common Errors

  • "Megatron-Bridge package not found" -> pip install mbridge
  • "megatron.core not found" -> Install megatron-core >= 0.13.0
  • Config conversion failures -> Check model architecture compatibility with Megatron

Compatibility Notes

Not all HuggingFace models have Megatron-Core equivalents. Supported architectures include Qwen2, Qwen2.5-VL, DeepSeek-V2/V3, Mixtral, and Moonlight. Router replay for MoE models is not supported on NPU.
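A pre-flight compatibility gate can catch unsupported architectures before config conversion fails. The supported set below mirrors the model families named on this page but uses HuggingFace architecture class names of my choosing; consult verl's mcore registry for the authoritative list.

```python
# Sketch: gate model loading on a known-supported architecture list.
# The set is illustrative, derived from the families named above.

SUPPORTED_ARCHS = {
    "Qwen2ForCausalLM",
    "Qwen2_5_VLForConditionalGeneration",
    "DeepseekV2ForCausalLM",
    "DeepseekV3ForCausalLM",
    "MixtralForCausalLM",
}


def assert_mcore_supported(architectures: list) -> None:
    """Raise if any architecture lacks a Megatron-Core equivalent."""
    unsupported = [a for a in architectures if a not in SUPPORTED_ARCHS]
    if unsupported:
        raise ValueError(f"No Megatron-Core equivalent for: {unsupported}")
```

In practice the `architectures` list would come from the model's HuggingFace config (e.g. `AutoConfig.from_pretrained(...).architectures`).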
