Implementation:Microsoft DeepSpeedExamples Domino Training
| Knowledge Sources | |
|---|---|
| Domains | Distributed Training, Large Language Models |
| Last Updated | 2026-02-07 12:00 GMT |
Overview
Domino pretraining module, adapted from Megatron-LM, that orchestrates the full distributed training pipeline: model setup, data loading, the training loop, evaluation, and checkpointing.
Description
This module provides the complete pretraining orchestration for the DeepSpeed-Domino distributed training system, adapted from Megatron-LM's training.py. The central function pretrain() manages the entire training lifecycle: initializing Megatron, setting up the model and optimizer, building data iterators, running the training loop, and performing evaluation. It integrates with Megatron's pipeline and tensor parallelism infrastructure.
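The order of stages that pretrain() drives can be pictured with the framework-free sketch below. It is not the module's code. Megatron initialization, parallel state, timers and checkpoint I/O are replaced by caller-supplied stubs, and the names pretrain_sketch, setup, build_iterators, train_loop and evaluate_fn are hypothetical. The sketch only illustrates the sequence described above.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Tuple


def pretrain_sketch(
    setup: Callable[[], Tuple[List[Any], Any, Any]],
    build_iterators: Callable[[], Tuple[Iterator, Optional[Iterator], Optional[Iterator]]],
    train_loop: Callable[..., int],
    evaluate_fn: Callable[[str, Iterator], Dict[str, float]],
) -> Dict[str, Any]:
    """Hypothetical outline of the pretrain() stage order; every callable is a stub."""
    stages: List[str] = []

    # 1. Initialization: initialize_megatron() in the real module, a placeholder here.
    stages.append("init")

    # 2. Model chunks, optimizer and LR scheduler (setup_model_and_optimizer()).
    model, optimizer, scheduler = setup()
    stages.append("setup")

    # 3. Train/valid/test iterators built from the dataset builder.
    train_it, valid_it, test_it = build_iterators()
    stages.append("data")

    # 4. Training loop; returns the final iteration count (train()).
    iteration = train_loop(model, optimizer, scheduler, train_it, valid_it)
    stages.append("train")

    # 5. Post-training evaluation on whichever iterators exist.
    results: Dict[str, Dict[str, float]] = {}
    for name, it in (("validation", valid_it), ("test", test_it)):
        if it is not None:
            results[name] = evaluate_fn(name, it)
    stages.append("eval")
    return {"iteration": iteration, "results": results, "stages": stages}
```

Passing None for an iterator simply skips that evaluation, which loosely mirrors how a run without a test split would behave.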
The module includes setup_model_and_optimizer(), which builds the model from a user-provided model_builder function, wraps it in the appropriate distributed data-parallel wrapper (LocalDDP or torchDDP), initializes the Megatron optimizer, configures the learning-rate scheduler, and optionally loads a checkpoint. Model construction itself is handled by get_model(), which supports Float16Module wrapping for half-precision training and configures distributed data parallelism.
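How get_model() decides what each pipeline stage builds can be sketched without any framework code. This sketch assumes the usual Megatron-LM convention: pre_process is true only on the first pipeline stage (input embedding) and post_process only on the last (output layer and loss). StageModule, DDPWrapper and get_model_sketch are hypothetical stand-ins, and the Float16Module step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


@dataclass
class StageModule:
    """Hypothetical stand-in for the module returned by model_builder."""
    pre_process: bool   # True if this stage owns the input embedding
    post_process: bool  # True if this stage owns the output layer and loss


@dataclass
class DDPWrapper:
    """Hypothetical placeholder for LocalDDP/torchDDP; it only holds the wrapped module."""
    module: StageModule


def get_model_sketch(
    model_builder: Callable[..., StageModule],
    pp_rank: int,
    pp_size: int,
    wrap_with_ddp: bool = True,
) -> List[Union[StageModule, DDPWrapper]]:
    # Assumed Megatron-style convention: the first stage embeds, the last stage computes outputs.
    pre_process = pp_rank == 0
    post_process = pp_rank == pp_size - 1
    chunk = model_builder(pre_process=pre_process, post_process=post_process)
    # Without interleaved (virtual) pipelining there is one chunk per rank, returned as a list.
    if wrap_with_ddp:
        return [DDPWrapper(chunk)]
    return [chunk]
```

With a single pipeline stage, rank 0 is both first and last, so one module receives both flags.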
The train() function implements the main training loop with CUDA event-based timing, loss logging, and iteration tracking. On each iteration it calls train_step(), which runs the forward-backward pass through Megatron's pipeline-parallel forward_backward_func, reduces gradients, and steps the optimizer. Additional utilities include training_log() for TensorBoard logging, evaluate() for running validation, evaluate_and_print_results() for formatted evaluation output, and save_checkpoint_and_time() for timed checkpoint saving.
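The loop in train() can be pictured as the familiar Megatron-LM pattern: run train_step() until a target iteration count, and trigger logging, validation and checkpointing at fixed intervals. The sketch below shows only that control flow. train_sketch is hypothetical, timing and TensorBoard output are left out, and the interval parameters are named after Megatron-LM's --log-interval, --eval-interval and --save-interval arguments on the assumption that Domino keeps them.

```python
from typing import Callable, Dict, Iterator, List, Optional


def train_sketch(
    train_step: Callable[[Iterator], Dict[str, float]],
    train_iter: Iterator,
    train_iters: int,
    log_interval: int,
    eval_interval: int,
    save_interval: int,
    evaluate: Optional[Callable[[int], None]] = None,
    save: Optional[Callable[[int], None]] = None,
    start_iteration: int = 0,
) -> int:
    """Hypothetical Megatron-style training loop; returns the final iteration."""
    iteration = start_iteration  # nonzero when resuming from a checkpoint
    window: List[float] = []     # losses collected since the last log line
    while iteration < train_iters:
        # One forward/backward pass plus optimizer step (train_step() in the real module).
        loss_dict = train_step(train_iter)
        iteration += 1
        window.append(loss_dict["lm loss"])
        if iteration % log_interval == 0:
            avg = sum(window) / len(window)
            print(f"iteration {iteration:6d} | lm loss {avg:.4f}")
            window.clear()
        if evaluate is not None and iteration % eval_interval == 0:
            evaluate(iteration)
        if save is not None and iteration % save_interval == 0:
            save(iteration)
    return iteration
```

Keeping evaluation and saving as optional callbacks makes the scheduling logic testable without GPUs. In the real module those hooks presumably correspond to evaluate_and_print_results() and save_checkpoint_and_time().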
Usage
Use this module as the main entry point for Domino-accelerated pretraining. Call the pretrain() function with a model builder, dataset builder, and forward step function to launch the full distributed training pipeline. It is designed for large-scale language model pretraining with Megatron-LM parallelism strategies enhanced by DeepSpeed Domino's communication overlap optimization.
Code Reference
Source Location
- Repository: Microsoft_DeepSpeedExamples
- File: training/DeepSpeed-Domino/domino/training.py
- Lines: 1-839
Signature
```python
def pretrain(model_builder, dataset_builder, forward_step_func):
def setup_model_and_optimizer(model_builder, model_type,
                              no_wd_decay_cond=None,
                              scale_lr_cond=None, lr_mult=1.0):
def get_model(model_builder, model_type=ModelType.encoder_or_decoder,
              wrap_with_ddp=True):
def train(forward_step_func, model, optimizer, opt_param_scheduler,
          train_data_iterator, valid_data_iterator, config):
def train_step(forward_step_func, data_iterator, model,
               optimizer, opt_param_scheduler, config):
def evaluate(forward_step_func, data_iterator, model, config,
             verbose=False):
def evaluate_and_print_results(prefix, forward_step_func,
                               data_iterator, model, iteration, config,
                               verbose=False, write_to_tensorboard=False):
```
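The averaging and reporting performed by evaluate() and evaluate_and_print_results() can be illustrated as follows. This assumes the logic mirrors upstream Megatron-LM: per-micro-batch loss dicts are summed over the evaluation iterations, divided by the total number of micro-batches, and reported alongside a perplexity of exp(min(20, loss)). Domino's adapted copy may differ, and average_eval_losses and format_eval_results are hypothetical names.

```python
import math
from typing import Dict, List


def average_eval_losses(loss_dicts_per_iter: List[List[Dict[str, float]]]) -> Dict[str, float]:
    """Average per-micro-batch loss dicts over all eval iterations (hypothetical helper)."""
    totals: Dict[str, float] = {}
    count = 0
    for micro_batches in loss_dicts_per_iter:  # one entry per eval iteration
        for loss_dict in micro_batches:        # one dict per micro-batch
            count += 1
            for key, value in loss_dict.items():
                totals[key] = totals.get(key, 0.0) + value
    return {key: total / count for key, total in totals.items()}


def format_eval_results(prefix: str, averaged: Dict[str, float]) -> str:
    """Megatron-style report line: each loss value plus its perplexity."""
    parts = [f"validation loss at {prefix} |"]
    for key, value in averaged.items():
        ppl = math.exp(min(20.0, value))  # cap the exponent to avoid overflow on huge losses
        parts.append(f"{key} value: {value:.6E} | {key} PPL: {ppl:.6E} |")
    return " ".join(parts)
```

Capping the exponent at 20 keeps the printed perplexity finite early in training, when losses can be very large.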
Import
```python
from domino.training import pretrain
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model_builder | Callable | Yes | Function that takes pre_process and post_process bools and returns a model |
| dataset_builder | Callable | Yes | Function that takes dataset sizes and returns train/valid/test datasets |
| forward_step_func | Callable | Yes | Function that takes data_iterator and model, returns loss and metrics dict |
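| model_type | ModelType | Varies | Model type enum; required by setup_model_and_optimizer(), optional in get_model() (default: ModelType.encoder_or_decoder) |
| config | TransformerConfig | Yes | Model configuration passed to train(), train_step(), and evaluate(); pretrain() does not take it as an argument |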
| train_data_iterator | Iterator | Yes | Iterator over training data batches |
| valid_data_iterator | Iterator | No | Iterator over validation data batches |
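The three callable inputs can be written down as typing.Protocol classes, which lets a static type checker such as mypy flag a builder with the wrong parameters. This sketch encodes the table exactly as written; the class names and the run_one_step helper are hypothetical. One caveat, stated as an assumption: upstream Megatron-LM's forward_backward_func expects forward_step_func to return (output_tensor, loss_func) rather than (loss, metrics). If Domino's train_step() follows upstream, the ForwardStepFunc return type would need to change accordingly.

```python
from typing import Any, Dict, Iterator, Protocol, Sequence, Tuple


class ModelBuilder(Protocol):
    """Builds one model chunk; the flags mark the first/last pipeline stage."""
    def __call__(self, pre_process: bool, post_process: bool) -> Any: ...


class DatasetBuilder(Protocol):
    """Maps [train, valid, test] sample counts to (train, valid, test) datasets."""
    def __call__(self, train_val_test_num_samples: Sequence[int]) -> Tuple[Any, Any, Any]: ...


class ForwardStepFunc(Protocol):
    """Consumes one batch; returns the loss and a dict of metrics to log (per the table)."""
    def __call__(self, data_iterator: Iterator[Any], model: Any) -> Tuple[Any, Dict[str, Any]]: ...


def run_one_step(forward_step: ForwardStepFunc, batches: Iterator[Any], model: Any) -> Dict[str, Any]:
    """Hypothetical smoke test: run one step and check the (loss, metrics) shape at runtime."""
    loss, metrics = forward_step(batches, model)
    if not isinstance(metrics, dict):
        raise TypeError("forward_step_func must return (loss, metrics_dict)")
    return metrics
```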
Outputs
| Name | Type | Description |
|---|---|---|
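| model | List[Module] | Model chunks returned by setup_model_and_optimizer(); in Megatron-LM this list holds one chunk per virtual pipeline stage (a single element without interleaving) |
| optimizer | MegatronOptimizer | Optimizer returned by setup_model_and_optimizer() |
| opt_param_scheduler | OptimizerParamScheduler | Learning-rate scheduler returned by setup_model_and_optimizer() |
| iteration | int | Final iteration count, returned by train() |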
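The opt_param_scheduler listed in the table above sets the learning rate on every iteration. The sketch below assumes the scheduler follows upstream Megatron-LM's OptimizerParamScheduler and reproduces its common warmup-then-decay rule: linear warmup to max_lr, then constant, linear, or cosine decay down to min_lr. lr_at is a hypothetical function, and the scheduler's other duties (weight-decay scheduling, the inverse-square-root style, checkpoint state) are omitted.

```python
import math


def lr_at(step: int, max_lr: float, min_lr: float,
          warmup_steps: int, decay_steps: int, style: str = "cosine") -> float:
    """Learning rate at `step` under warmup followed by decay (hypothetical sketch)."""
    if decay_steps <= warmup_steps:
        raise ValueError("decay_steps must exceed warmup_steps")
    # Linear warmup from 0 to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    if style == "constant":
        return max_lr
    # Past the decay horizon the LR stays at its floor.
    if step > decay_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if style == "linear":
        coeff = 1.0 - decay_ratio
    elif style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * decay_ratio) + 1.0)
    else:
        raise ValueError(f"unsupported decay style: {style}")
    return min_lr + coeff * (max_lr - min_lr)
```

Halfway through the decay window, cosine decay lands exactly midway between max_lr and min_lr, the same point linear decay reaches, but it spends longer near both ends.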
Usage Examples
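```python
from domino.training import pretrain

# GPTModel and build_train_valid_test_datasets are not imported in this excerpt;
# import them from your model and data modules.

def model_builder(pre_process, post_process):
    # pre_process / post_process: does this pipeline stage own the input
    # embedding / the output layer?
    return GPTModel(pre_process=pre_process, post_process=post_process)

def dataset_builder(train_val_test_num_samples):
    # Returns (train, valid, test) datasets for the requested sample counts.
    return build_train_valid_test_datasets(train_val_test_num_samples)

def forward_step(data_iterator, model):
    batch = next(data_iterator)
    loss = model(batch)
    return loss, {'lm loss': loss}  # loss plus named metrics to log

# Launch the full pretraining pipeline.
pretrain(
    model_builder=model_builder,
    dataset_builder=dataset_builder,
    forward_step_func=forward_step
)
```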