Implementation: Alibaba ROLL Run Train
| Knowledge Sources | |
|---|---|
| Domains | Training, Distributed_Computing |
| Last Updated | 2026-02-07 20:00 GMT |
Overview
Main training entry point for MCoreAdapter that orchestrates PreTraining, SFT, and DPO workflows with either Megatron-Core or LLaMA-Factory backends.
Description
run_train.py serves as the unified command-line entry point for launching distributed training jobs using the MCoreAdapter framework. It parses command-line arguments into structured dataclasses (training, model, data, finetuning, and generating arguments), downloads models from HuggingFace Hub or ModelScope as needed, and dispatches to the appropriate training pipeline based on the configured stage (pt, sft, or dpo).
The module supports two backends:
- Megatron-Core (MCA): Uses AutoModel.from_pretrained to load models into Megatron-Core format, applies LoRA adapters via apply_megatron_lora(), and trains using McaTrainer or DPOTrainer.
- LLaMA-Factory: Falls back to the standard LLaMA-Factory training runners (run_pt, run_sft, run_dpo) when use_mca=False.
The module also provides a data_collator_wrapper that shifts labels and input_ids by one position to implement next-token prediction, and a setup_lora_training function that configures PEFT LoRA adapters with Megatron-compatible target modules.
Usage
Use this module as the main script to launch distributed training on LLMs. It is invoked directly via:
python mcore_adapter/examples/train/run_train.py \
--model_name_or_path Qwen/Qwen2.5-7B \
--stage sft \
--use_mca True \
--tensor_model_parallel_size 2
Set --use_mca False to fall back to the LLaMA-Factory training path.
Code Reference
Source Location
- Repository: Alibaba_ROLL
- File: mcore_adapter/examples/train/run_train.py
- Lines: 1-324
Key Functions
download_model
def download_model(model_name_or_path: str, local_dir: str = None) -> str
Downloads a model from HuggingFace Hub (or ModelScope when USE_MODELSCOPE=1) with file locking for concurrency safety. Returns the local directory path. If the path is already a local directory, returns it directly.
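The lock-then-download pattern can be sketched as follows. This is a minimal sketch, not the verbatim implementation: the lock-file naming and the omission of the ModelScope branch are assumptions, and the third-party imports are deferred so the local-directory fast path needs no extra dependencies.

```python
import os

def download_model(model_name_or_path: str, local_dir: str = None) -> str:
    """Return a local directory containing the model, downloading if needed.

    Sketch of the pattern described above; lock naming is an assumption.
    """
    # A path that already exists locally is returned as-is.
    if os.path.isdir(model_name_or_path):
        return model_name_or_path

    # Imported lazily so the local-directory fast path has no extra deps.
    from filelock import FileLock
    from huggingface_hub import snapshot_download

    lock_path = os.path.join(
        local_dir or ".", model_name_or_path.replace("/", "--") + ".lock"
    )
    # The file lock serializes concurrent ranks downloading the same model,
    # so only one process hits the Hub while the others wait.
    with FileLock(lock_path):
        return snapshot_download(model_name_or_path, local_dir=local_dir)
```

The lock matters in multi-process launches: without it, every rank would race to download the same snapshot into the same directory.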
get_args
def get_args() -> Tuple[
Seq2SeqTrainingArguments,
ModelArguments,
DataArguments,
FinetuningArguments,
GeneratingArguments,
UseMcaArguments,
]
Parses command-line arguments using HfArgumentParser into six dataclass instances. When use_mca is False, re-parses using standard HuggingFace Seq2SeqTrainingArguments instead of MCA-augmented ones.
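The dataclass-parsing flow can be illustrated with a stdlib-only stand-in for HfArgumentParser (the real script calls `HfArgumentParser((...)).parse_args_into_dataclasses()`). `UseMcaArguments` mirrors the real flag; `ToyModelArguments` is invented for illustration.

```python
import argparse
from dataclasses import dataclass, fields

@dataclass
class UseMcaArguments:
    use_mca: bool = True  # mirrors the real --use_mca flag

@dataclass
class ToyModelArguments:
    model_name_or_path: str = ""  # hypothetical, for illustration only

def parse_into_dataclasses(argv, *dataclass_types):
    """Parse argv into one instance per dataclass, HfArgumentParser-style."""
    parser = argparse.ArgumentParser()
    for dc in dataclass_types:
        for f in fields(dc):
            if f.type is bool or f.type == "bool":
                # HF-style booleans are passed as explicit "True"/"False".
                parser.add_argument(f"--{f.name}",
                                    type=lambda s: s == "True",
                                    default=f.default)
            else:
                parser.add_argument(f"--{f.name}", type=str, default=f.default)
    ns = parser.parse_args(argv)
    return tuple(
        dc(**{f.name: getattr(ns, f.name) for f in fields(dc)})
        for dc in dataclass_types
    )
```

This mirrors why get_args can re-parse with a different training-arguments class when use_mca is False: the dataclass tuple passed to the parser is swappable.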
data_collator_wrapper
def data_collator_wrapper(data_collator) -> wrapper
Wraps a data collator to shift labels left by one position and input_ids right by one position, implementing the standard next-token prediction objective.
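The shift can be sketched with plain Python lists standing in for the batched tensors the real collator operates on:

```python
def data_collator_wrapper(data_collator):
    """Wrap a collator so position t predicts token t+1 (sketch)."""
    def wrapper(features):
        batch = data_collator(features)
        # Labels drop the first token: the label at position t is the
        # original token at position t+1 ...
        batch["labels"] = [seq[1:] for seq in batch["labels"]]
        # ... and input_ids drop the last token so both stay aligned.
        batch["input_ids"] = [seq[:-1] for seq in batch["input_ids"]]
        return batch
    return wrapper
```

With input `[1, 2, 3, 4]` this yields inputs `[1, 2, 3]` against labels `[2, 3, 4]`: the standard next-token prediction alignment.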
setup_lora_training
def setup_lora_training(model, finetuning_args) -> model
Enables input gradient computation, discovers all linear modules as LoRA targets, creates a LoraConfig, wraps the model with get_peft_model, and casts trainable parameters to float32.
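The target-discovery step can be sketched with a stand-in for `find_all_linear_modules`. Here `named_modules` is a list of (name, class-name) pairs mimicking torch's `Module.named_modules()`; the exclusion of the output head is an assumption about typical LoRA setups, not the verbatim mcore_adapter logic.

```python
def find_all_linear_modules(named_modules, exclude=("lm_head",)):
    """Collect leaf names of Linear modules as LoRA targets (sketch)."""
    targets = set()
    for name, cls_name in named_modules:
        if cls_name == "Linear" and not any(x in name for x in exclude):
            # LoRA targets are keyed by the leaf module name, e.g. "q_proj",
            # so all layers sharing that name get adapters.
            targets.add(name.split(".")[-1])
    return sorted(targets)
```

The resulting list is what would feed `LoraConfig(target_modules=...)` before `get_peft_model` wraps the model.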
pt_mca_train
def pt_mca_train(
training_args: Seq2SeqTrainingArguments,
model_args: ModelArguments,
data_args: DataArguments,
finetuning_args: FinetuningArguments,
) -> None
Runs pre-training using Megatron-Core. Loads tokenizer/template, creates model via AutoModel.from_pretrained, optionally applies LoRA, prepares dataset and data collator, then trains with McaTrainer.
sft_mca_train
def sft_mca_train(
training_args: Seq2SeqTrainingArguments,
model_args: ModelArguments,
data_args: DataArguments,
finetuning_args: FinetuningArguments,
) -> None
Runs supervised fine-tuning with MCA. Supports sequence packing, vision-language models (Qwen2-VL with freeze options for vision tower, projector, or language model), and 4D attention masks.
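The masking behind sequence packing can be sketched as a block-diagonal causal mask: each packed segment attends only causally within itself. This is a hypothetical 2D illustration; the trainer materializes a 4D `[batch, 1, seq, seq]` tensor.

```python
def block_diagonal_causal_mask(segment_lengths):
    """Build a [seq, seq] 0/1 mask for packed segments (sketch)."""
    total = sum(segment_lengths)
    mask = [[0] * total for _ in range(total)]
    start = 0
    for length in segment_lengths:
        for i in range(start, start + length):
            # Causal within the segment: position i sees j <= i only,
            # and never sees tokens from other packed segments.
            for j in range(start, i + 1):
                mask[i][j] = 1
        start += length
    return mask
```

Packing two length-2 sequences gives two independent causal triangles, so tokens of the second sequence never attend to the first.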
dpo_mca_train
def dpo_mca_train(
training_args: Seq2SeqTrainingArguments,
model_args: ModelArguments,
data_args: DataArguments,
finetuning_args: FinetuningArguments,
) -> None
Runs Direct Preference Optimization training. Optionally creates a reference model by cloning the main model weights. Uses PairwiseDataCollatorWithPadding and DPOTrainer.
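The objective DPOTrainer optimizes can be written in scalar form (a sketch over single sequence log-probs; the trainer computes it over batches):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    # How much more the policy prefers chosen over rejected,
    # relative to the frozen reference model.
    logits = ((policy_chosen_logp - policy_rejected_logp)
              - (ref_chosen_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * logits)))
```

When policy and reference agree (zero margin) the loss is log 2; as the policy's preference margin over the reference grows, the loss decreases. This is also why the function optionally clones a reference model: without the frozen reference log-ratios the objective is undefined.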
mca_train
def mca_train(
training_args: Seq2SeqTrainingArguments,
model_args: ModelArguments,
data_args: DataArguments,
finetuning_args: FinetuningArguments,
) -> None
Dispatcher function that routes to pt_mca_train, sft_mca_train, or dpo_mca_train based on finetuning_args.stage.
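The routing amounts to a stage-to-function table; the callables here are stand-ins for the real training functions:

```python
def make_dispatcher(pt_fn, sft_fn, dpo_fn):
    """Route a stage name to its training function (sketch)."""
    stages = {"pt": pt_fn, "sft": sft_fn, "dpo": dpo_fn}
    def mca_train(stage, *args):
        if stage not in stages:
            raise ValueError(f"Unknown stage: {stage}")
        return stages[stage](*args)
    return mca_train
```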
main
def main() -> None
Top-level entry point. Calls get_args(), configures model max length and block diagonal attention, then dispatches to either mca_train or llama_factory_train.
Imports
import torch
from filelock import FileLock
from huggingface_hub import snapshot_download
from llamafactory.data import get_dataset, get_template_and_fix_tokenizer
from llamafactory.hparams import DataArguments, FinetuningArguments, GeneratingArguments, ModelArguments
from peft import LoraConfig, get_peft_model
from transformers import DataCollatorForSeq2Seq, HfArgumentParser
from mcore_adapter.adapters import apply_megatron_lora, find_all_linear_modules, set_linear_is_expert
from mcore_adapter.models import AutoConfig, AutoModel
from mcore_adapter.trainer import DPOTrainer, McaTrainer
from mcore_adapter.training_args import Seq2SeqTrainingArguments
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_name_or_path | str | Yes | Path to local model directory or HuggingFace model identifier |
| stage | str | Yes | Training stage: pt (pretraining), sft (supervised fine-tuning), or dpo (direct preference optimization) |
| use_mca | bool | No | Whether to use Megatron-Core backend (default: True). Set to False for LLaMA-Factory backend |
| training_args | Seq2SeqTrainingArguments | Yes | Training hyperparameters including parallelism configuration |
| data_args | DataArguments | Yes | Dataset paths and preprocessing configuration |
| finetuning_args | FinetuningArguments | Yes | LoRA rank, alpha, dropout, freeze settings |
Outputs
| Name | Type | Description |
|---|---|---|
| None | None | Returns nothing; the model is trained in place and checkpoints are written to training_args.output_dir |
Usage Examples
# PreTraining with Megatron-Core
# Command line:
# python run_train.py --model_name_or_path Qwen/Qwen2.5-7B --stage pt --use_mca True
# SFT with LoRA
# python run_train.py --model_name_or_path Qwen/Qwen2.5-7B --stage sft \
# --finetuning_type lora --lora_rank 16 --lora_alpha 32
# DPO training
# python run_train.py --model_name_or_path Qwen/Qwen2.5-7B --stage dpo \
# --use_ref_model True --pref_beta 0.1