Environment: Volcengine Verl Megatron Core Environment
sources: Repo|verl|https://github.com/volcengine/verl
domains: Infrastructure, Distributed_Training
last_updated: 2026-02-07 17:00 GMT
Overview
Optional Megatron-Core environment for large-scale distributed training with advanced parallelism in verl.
Description
verl supports Megatron-Core as an alternative training backend to FSDP, enabling tensor parallelism, pipeline parallelism, expert parallelism, and sequence parallelism for very large models (30B+). Megatron-Core >= 0.13.0 is required. The mbridge package provides the bridge between HuggingFace configs and Megatron-Core.
Usage
Required when the Megatron backend is selected (trainer.backend=megatron). Recommended for models above 30B parameters or for MoE architectures.
System Requirements
- Multiple NVIDIA GPUs (typically 8+ GPUs)
- High-speed interconnect (NVLink/InfiniBand)
- Linux
Dependencies
- megatron-core >= 0.13.0
- mbridge (Megatron-Bridge)
Quick Install
pip install "verl[mcore]"
or
pip install mbridge
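After installing, a quick sanity check can confirm that megatron.core is importable and new enough. The helper below is illustrative, not part of verl, and assumes the package is distributed under the name megatron-core:

```python
import importlib.util
from importlib import metadata

MIN_MCORE = (0, 13, 0)  # minimum version verl enforces (see Code Evidence below)

def check_mcore_install() -> str:
    """Report whether megatron.core is importable and meets the minimum version."""
    # Probe the parent package first so find_spec("megatron.core") cannot
    # raise ModuleNotFoundError on a missing parent.
    if importlib.util.find_spec("megatron") is None or \
       importlib.util.find_spec("megatron.core") is None:
        return "missing: pip install megatron-core>=0.13.0"
    try:
        ver = metadata.version("megatron-core")
    except metadata.PackageNotFoundError:
        return "installed (version metadata unavailable)"
    parts = tuple(int(p) for p in ver.split(".")[:3] if p.isdigit())
    return f"ok: {ver}" if parts >= MIN_MCORE else f"too old: {ver} < 0.13.0"
```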
Code Evidence
From verl/utils/import_utils.py:28-33:
@cache
def is_megatron_core_available():
    try:
        mcore_spec = importlib.util.find_spec("megatron.core")
    except ModuleNotFoundError:
        mcore_spec = None
    return mcore_spec is not None
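The same cached-probe pattern can be reused to fail fast at configuration time. The select_backend helper below is a hypothetical sketch, not part of verl's API:

```python
import importlib.util
from functools import cache

@cache
def is_megatron_core_available() -> bool:
    # Cached probe in the style of verl/utils/import_utils.py: looks up the
    # module spec without importing megatron.core itself.
    try:
        spec = importlib.util.find_spec("megatron.core")
    except ModuleNotFoundError:
        spec = None
    return spec is not None

def select_backend(requested: str) -> str:
    # Hypothetical helper: reject trainer.backend=megatron up front when the
    # dependency is absent, instead of failing deep inside training.
    if requested == "megatron" and not is_megatron_core_available():
        raise ImportError(
            "megatron backend requested but megatron-core>=0.13.0 is not installed"
        )
    return requested
```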
And from verl/models/mcore/model_forward_fused.py:53:
assert version.parse(mcore.__version__) >= version.parse("0.13.0")
And from verl/models/mcore/bridge.py:16-25:
try:
    from megatron.bridge import AutoBridge
except ImportError:
    print("Megatron-Bridge package not found. Please install Megatron-Bridge...")
    raise
Common Errors
- "Megatron-Bridge package not found" -> pip install mbridge
- "megatron.core not found" -> Install megatron-core >= 0.13.0
- Config conversion failures -> Check model architecture compatibility with Megatron
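A generic guarded-import helper (illustrative, not verl API) can attach the remediation hints above directly to the ImportError, following the pattern in bridge.py:

```python
import importlib

def require(module: str, hint: str) -> None:
    # Illustrative helper: import a module and, on failure, re-raise with the
    # remediation hint from the error table above attached to the message.
    try:
        importlib.import_module(module)
    except ImportError as e:
        raise ImportError(f"{module} not found -> {hint}") from e

# Example preflight for the Megatron backend (hints mirror this document):
# require("megatron.core", "install megatron-core >= 0.13.0")
# require("megatron.bridge", "pip install mbridge")
```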
Compatibility Notes
Not all HuggingFace models have Megatron-Core equivalents. Supported architectures include Qwen2, Qwen2.5-VL, DeepSeek-V2/V3, Mixtral, and Moonlight. Router replay for MoE models is not supported on NPU.
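A simple membership pre-flight can surface this constraint before training starts. The set below only mirrors the architectures named in this section; it is not verl's authoritative support list:

```python
# Architectures this document names as having Megatron-Core equivalents.
# NOTE: illustrative only; consult verl's model registry for the real list.
MEGATRON_SUPPORTED_ARCHS = {
    "Qwen2", "Qwen2.5-VL", "DeepSeek-V2", "DeepSeek-V3", "Mixtral", "Moonlight",
}

def assert_megatron_supported(arch: str) -> None:
    # Raise early, before any config conversion is attempted.
    if arch not in MEGATRON_SUPPORTED_ARCHS:
        raise ValueError(
            f"{arch} has no known Megatron-Core equivalent; use the FSDP backend"
        )
```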