Implementation:Microsoft DeepSpeedExamples Domino Training

Knowledge Sources	Microsoft_DeepSpeedExamples
Domains	Distributed Training, Large Language Models
Last Updated	2026-02-07 12:00 GMT

Overview

The Domino pretraining utilities module, adapted from Megatron-LM, orchestrates the full distributed training pipeline: model setup, data loading, the training loop, evaluation, and checkpointing.

Description

This module provides the complete pretraining orchestration for the DeepSpeed-Domino distributed training system, adapted from Megatron-LM's training.py. The central function pretrain() manages the entire training lifecycle: initializing Megatron, setting up the model and optimizer, building data iterators, running the training loop, and performing evaluation. It integrates with Megatron's pipeline and tensor parallelism infrastructure.
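The lifecycle described above can be outlined as a single-process Python sketch. Everything in it is hypothetical: pretrain_sketch and the stage labels are illustrative names, Megatron initialization, parallelism, the optimizer, and checkpointing are omitted, and plain lists stand in for datasets.

```python
from typing import Callable, Iterator, List

# Records the order in which the lifecycle stages run.
stages: List[str] = []


def pretrain_sketch(model_builder: Callable, dataset_builder: Callable,
                    forward_step_func: Callable, train_iters: int = 3) -> int:
    """Hypothetical single-process outline of the pretrain() lifecycle."""
    # 1. Initialization (process groups, arguments, RNG seeds in the real code).
    stages.append("initialize")

    # 2. Model setup. A single pipeline stage is assumed, so it both embeds the
    #    inputs (pre_process) and computes the loss (post_process).
    stages.append("setup_model_and_optimizer")
    model: List = [model_builder(pre_process=True, post_process=True)]

    # 3. Data iterators built from the user-supplied dataset builder.
    stages.append("build_data_iterators")
    train_ds, valid_ds, _test_ds = dataset_builder([train_iters, 1, 1])
    train_iter: Iterator = iter(train_ds)
    valid_iter: Iterator = iter(valid_ds)

    # 4. Training loop: one forward step per iteration.
    stages.append("train")
    iteration = 0
    while iteration < train_iters:
        forward_step_func(train_iter, model)
        iteration += 1

    # 5. Final pass over the validation split.
    stages.append("evaluate")
    forward_step_func(valid_iter, model)
    return iteration
```

The point of the sketch is only the ordering: the three user callables are invoked at fixed points in the lifecycle, and everything between them is owned by the training module.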

The module includes setup_model_and_optimizer(), which builds the model using a user-provided model_builder function, wraps it in the appropriate distributed data parallel wrapper (LocalDDP or torchDDP), initializes the Megatron optimizer, configures the learning rate scheduler, and optionally loads a checkpoint. The get_model() function handles model construction, including Float16Module wrapping and distributed data parallel configuration.
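The no_wd_decay_cond, scale_lr_cond, and lr_mult arguments of setup_model_and_optimizer() control how parameters are split into optimizer groups. The sketch below is modeled on upstream Megatron-LM's get_param_groups(); that Domino's adapted copy behaves identically is an assumption, and group_params plus the (name, shape) tuples standing in for torch parameters are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Shape = Tuple[int, ...]
Param = Tuple[str, Shape]  # (parameter name, shape) stand-in for a tensor


def group_params(params: List[Param],
                 no_wd_decay_cond: Optional[Callable[[str, Shape], bool]] = None,
                 scale_lr_cond: Optional[Callable[[str, Shape], bool]] = None,
                 lr_mult: float = 1.0) -> List[Dict]:
    """Buckets parameters by (skip weight decay?, scale learning rate?)."""
    buckets: Dict[Tuple[bool, bool], List[str]] = {}
    for name, shape in params:
        if no_wd_decay_cond is not None:
            no_wd = no_wd_decay_cond(name, shape)
        else:
            # Default: no weight decay for biases and 1-D (e.g. norm) params.
            no_wd = name.endswith(".bias") or len(shape) == 1
        scale_lr = scale_lr_cond(name, shape) if scale_lr_cond is not None else False
        buckets.setdefault((no_wd, scale_lr), []).append(name)

    # Each bucket becomes one group with multipliers applied to the base
    # weight decay and learning rate.
    return [
        {
            "params": names,
            "wd_mult": 0.0 if no_wd else 1.0,
            "lr_mult": lr_mult if scale_lr else 1.0,
        }
        for (no_wd, scale_lr), names in buckets.items()
    ]
```

For example, passing scale_lr_cond=lambda n, s: n.startswith("head") with lr_mult=10.0 would give an output head ten times the base learning rate while biases and norm weights land in a zero-weight-decay group.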

The train() function implements the main training loop with CUDA event-based timing, loss logging, and iteration tracking. For each iteration it calls train_step(), which runs the forward-backward pass through Megatron's pipeline-parallel forward_backward_func, reduces gradients, and steps the optimizer. Additional utilities include training_log() for TensorBoard logging, evaluate() for running validation, evaluate_and_print_results() for formatted evaluation output, and save_checkpoint_and_time() for timed checkpoint saving.
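The control flow of such a loop can be sketched as follows, assuming Megatron-style log_interval, eval_interval, and save_interval settings. train_loop_sketch is a hypothetical name, time.perf_counter() stands in for the CUDA events the real loop uses, and the eval and save branches only record events instead of calling evaluate_and_print_results() or save_checkpoint_and_time().

```python
import time
from typing import Callable, List


def train_loop_sketch(train_step: Callable[[int], float], train_iters: int,
                      log_interval: int, eval_interval: int,
                      save_interval: int, events: List[str]) -> int:
    """Hypothetical outline of train(): step, then log/eval/save on intervals."""
    iteration = 0
    window_losses: List[float] = []
    # CPU wall-clock stand-in for the CUDA-event timers in the real loop.
    window_start = time.perf_counter()
    while iteration < train_iters:
        window_losses.append(train_step(iteration))
        iteration += 1

        if iteration % log_interval == 0:
            ms_per_iter = (time.perf_counter() - window_start) * 1000 / log_interval
            avg_loss = sum(window_losses) / len(window_losses)
            print(f"iteration {iteration} | avg loss {avg_loss:.4f} "
                  f"| {ms_per_iter:.2f} ms/iter")
            events.append(f"log {iteration}")
            window_losses.clear()
            window_start = time.perf_counter()

        # A zero interval disables the corresponding action.
        if eval_interval and iteration % eval_interval == 0:
            events.append(f"eval {iteration}")
        if save_interval and iteration % save_interval == 0:
            events.append(f"save {iteration}")
    return iteration
```

With train_iters=4, log_interval=2, eval_interval=4, and save_interval=3, the recorded events are a log at 2, a save at 3, then a log and an eval at 4.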

Usage

Use this module as the main entry point for Domino-accelerated pretraining. Call the pretrain() function with a model builder, dataset builder, and forward step function to launch the full distributed training pipeline. It is designed for large-scale language model pretraining with Megatron-LM parallelism strategies enhanced by DeepSpeed Domino's communication overlap optimization.
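The dataset builder receives a list of three sample counts. Upstream Megatron-LM derives them from the run length as in the sketch below; the assumption is that Domino's adapted training.py does the same when the run is specified by train_iters. The function name is hypothetical.

```python
from typing import List


def train_val_test_num_samples(train_iters: int, global_batch_size: int,
                               eval_interval: int, eval_iters: int) -> List[int]:
    """Sample counts handed to dataset_builder (Megatron-style arithmetic)."""
    train_samples = train_iters * global_batch_size
    # Validation runs every eval_interval iterations, each consuming eval_iters
    # batches; one extra evaluation's worth of samples is reserved on top.
    valid_iters = (train_iters // eval_interval + 1) * eval_iters
    test_iters = eval_iters
    return [train_samples,
            valid_iters * global_batch_size,
            test_iters * global_batch_size]
```

For instance, 1000 iterations at a global batch size of 8, evaluating 10 batches every 100 iterations, requests 8000 training, 880 validation, and 80 test samples.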

Code Reference

Source Location

Repository: Microsoft_DeepSpeedExamples
File: training/DeepSpeed-Domino/domino/training.py
Lines: 1-839

Signature

def pretrain(model_builder, dataset_builder, forward_step_func):

def setup_model_and_optimizer(model_builder, model_type,
                               no_wd_decay_cond=None,
                               scale_lr_cond=None, lr_mult=1.0):

def get_model(model_builder, model_type=ModelType.encoder_or_decoder,
              wrap_with_ddp=True):

def train(forward_step_func, model, optimizer, opt_param_scheduler,
          train_data_iterator, valid_data_iterator, config):

def train_step(forward_step_func, data_iterator, model,
               optimizer, opt_param_scheduler, config):

def evaluate(forward_step_func, data_iterator, model, config,
             verbose=False):

def evaluate_and_print_results(prefix, forward_step_func,
                               data_iterator, model, iteration, config,
                               verbose=False, write_to_tensorboard=False):
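For reporting, upstream Megatron-LM averages each loss key over the evaluation batches and prints a perplexity computed as exp(min(20, loss)). The sketch below reproduces that arithmetic in plain Python; average_losses and format_results are hypothetical helpers, the message layout is approximate, and it is an assumption that Domino's evaluate() and evaluate_and_print_results() keep this behavior.

```python
import math
from typing import Dict, List


def average_losses(per_batch: List[Dict[str, float]]) -> Dict[str, float]:
    """Averages each loss key over the evaluation batches."""
    totals: Dict[str, float] = {}
    for losses in per_batch:
        for key, value in losses.items():
            totals[key] = totals.get(key, 0.0) + value
    return {key: total / len(per_batch) for key, total in totals.items()}


def format_results(prefix: str, losses: Dict[str, float]) -> str:
    """Builds a one-line report with each loss and its capped perplexity."""
    parts = [f"validation loss at {prefix} |"]
    for key, value in losses.items():
        # Capping the exponent at 20 keeps the reported perplexity bounded
        # when the loss is still large early in training.
        ppl = math.exp(min(20.0, value))
        parts.append(f"{key} value: {value:.6E} | {key} PPL: {ppl:.6E} |")
    return " ".join(parts)
```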

Import

from domino.training import pretrain

I/O Contract

Inputs

Name	Type	Required	Description
model_builder	Callable	Yes	Function that takes pre_process and post_process bools and returns a model
dataset_builder	Callable	Yes	Function that takes dataset sizes and returns train/valid/test datasets
forward_step_func	Callable	Yes	Function that takes data_iterator and model, returns loss and metrics dict
model_type	ModelType	No	Model type enum (default: encoder_or_decoder)
config	TransformerConfig	Yes	Model configuration (used in train loop)
train_data_iterator	Iterator	Yes	Iterator over training data batches
valid_data_iterator	Iterator	No	Iterator over validation data batches
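model_builder receives pre_process and post_process flags that depend on the pipeline stage. In upstream Megatron-LM's get_model(), only the first pipeline stage builds the input embedding (pre_process) and only the last stage builds the output layer and loss (post_process); with interleaved (virtual) pipeline scheduling, each rank builds several model chunks, which is why the model output is a list. The helper below is hypothetical, and whether Domino enables interleaving is an assumption.

```python
from typing import List, Tuple


def stage_flags(pipeline_rank: int, pipeline_size: int,
                virtual_chunks: int = 1) -> List[Tuple[bool, bool]]:
    """(pre_process, post_process) for each model chunk built on this rank."""
    flags = []
    for chunk in range(virtual_chunks):
        # Only the very first chunk of the first stage owns the embedding.
        pre_process = pipeline_rank == 0 and chunk == 0
        # Only the very last chunk of the last stage owns the output/loss.
        post_process = (pipeline_rank == pipeline_size - 1
                        and chunk == virtual_chunks - 1)
        flags.append((pre_process, post_process))
    return flags
```

Each tuple would correspond to one model_builder(pre_process=..., post_process=...) call; with no pipeline parallelism (pipeline_size=1), a single chunk receives both flags.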

Outputs

Name	Type	Description
model	List[Module]	List of model shards (for pipeline parallelism)
optimizer	MegatronOptimizer	The configured optimizer
opt_param_scheduler	OptimizerParamScheduler	Learning rate scheduler
iteration	int	Final iteration count after training completes
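opt_param_scheduler is assumed to be upstream Megatron-LM's OptimizerParamScheduler, which warms the learning rate up linearly and then decays it toward a minimum. The sketch below follows that schedule in plain Python; get_lr here is a standalone function, not a method on the real class, and the argument names are illustrative.

```python
import math


def get_lr(step: int, max_lr: float, min_lr: float, warmup_steps: int,
           decay_steps: int, decay_style: str = "cosine") -> float:
    """Linear warmup followed by constant, linear, or cosine decay."""
    # Linear warmup from 0 to max_lr.
    if warmup_steps > 0 and step <= warmup_steps:
        return max_lr * step / warmup_steps
    if decay_style == "constant":
        return max_lr
    # Past the decay horizon the learning rate stays at its floor.
    if step > decay_steps:
        return min_lr
    decay_ratio = (step - warmup_steps) / (decay_steps - warmup_steps)
    if decay_style == "linear":
        coeff = 1.0 - decay_ratio
    elif decay_style == "cosine":
        coeff = 0.5 * (math.cos(math.pi * decay_ratio) + 1.0)
    else:
        raise ValueError(f"unsupported decay style: {decay_style}")
    return min_lr + coeff * (max_lr - min_lr)
```

With max_lr=1e-3, min_lr=1e-5, 100 warmup steps, and 1000 decay steps, the rate reaches 1e-3 at step 100, passes the midpoint value of 5.05e-4 at step 550, and settles at 1e-5 from step 1000 on.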

Usage Examples

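from domino.training import pretrain

# GPTModel and build_train_valid_test_datasets are placeholders here; they are
# assumed to be imported from the Domino/Megatron model and data modules.

def model_builder(pre_process, post_process):
    # pre_process/post_process tell this pipeline stage whether it owns the
    # embedding layer (first stage) and the output/loss layer (last stage).
    return GPTModel(pre_process=pre_process, post_process=post_process)

def dataset_builder(train_val_test_num_samples):
    # Receives [train, valid, test] sample counts; returns the three datasets.
    return build_train_valid_test_datasets(train_val_test_num_samples)

def forward_step(data_iterator, model):
    # Simplified: pulls one batch and returns the loss plus a metrics dict.
    batch = next(data_iterator)
    loss = model(batch)
    return loss, {'lm loss': loss}

# Launch the full pretraining pipeline
pretrain(
    model_builder=model_builder,
    dataset_builder=dataset_builder,
    forward_step_func=forward_step
)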
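Upstream Megatron-LM's forward_backward_func expects forward_step_func to return the model output together with a loss callable (commonly a functools.partial bound to the loss mask), rather than the loss itself; whether Domino's GPT example follows exactly this convention is an assumption worth checking against the repository. The pure-Python sketch below shows that contract and a simplified non-pipelined schedule consuming it. Lists of floats replace tensors, no backward pass is run, and no_pipelining_schedule is a hypothetical name.

```python
from functools import partial
from typing import Callable, Dict, Iterator, List, Tuple


def loss_func(loss_mask: List[float],
              output_tensor: List[float]) -> Tuple[float, Dict[str, float]]:
    """Masked mean of per-token losses, plus a metrics dict for logging."""
    total = sum(tok_loss * mask for tok_loss, mask in zip(output_tensor, loss_mask))
    loss = total / sum(loss_mask)
    return loss, {"lm loss": loss}


def forward_step(data_iterator: Iterator, model: Callable):
    """Returns the raw model output and a loss callable bound to the mask."""
    tokens, loss_mask = next(data_iterator)
    output_tensor = model(tokens)  # toy model: per-token losses
    return output_tensor, partial(loss_func, loss_mask)


def no_pipelining_schedule(forward_step_func: Callable, data_iterator: Iterator,
                           model: Callable, num_microbatches: int):
    """Hypothetical stand-in for a non-pipelined forward_backward_func."""
    accumulated = 0.0
    metrics: List[Dict[str, float]] = []
    for _ in range(num_microbatches):
        output_tensor, loss_fn = forward_step_func(data_iterator, model)
        loss, loss_dict = loss_fn(output_tensor)
        # Scale so gradients summed over microbatches average correctly;
        # a real schedule would call backward() on this scaled loss.
        accumulated += loss / num_microbatches
        metrics.append(loss_dict)
    return accumulated, metrics
```

Binding the mask with partial lets the schedule, not the user code, decide when the loss is computed, which is what allows a pipelined schedule to apply it only on the last stage.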
