Implementation:Microsoft Onnxruntime TrainingUtil Declarations

Knowledge Sources	Microsoft_Onnxruntime
Domains	Training, Models, Utilities
Last Updated	2026-02-10 04:00 GMT

Overview

Declares training utility classes including DataSet, RandomDataSet, TrainingUtil, LossScaler, LearningRateScheduler, and multiple LR schedule implementations (NoWarmup, Cosine, Constant, Linear, Poly).

Description

The `training_util.h` header provides the full declaration surface for the ORT Training model runner utilities:

DataSet: A container for training samples. Each sample is a `unique_ptr<vector<OrtValue>>` (typedef `SampleType`). Supports adding data from raw OrtValues or TensorProtos, batching via `GetKthBatch`, random shuffling, and dimension extraction from input shapes via `GetTensorDimensionsFromInputs`. Includes a `RETURN_IF_FAIL` convenience macro for error handling.
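The batching arithmetic behind `TotalBatch` and `GetKthBatch` can be illustrated with a minimal sketch. The helper names `TotalBatchSketch` and `KthBatchRange` are hypothetical, and the convention used here (ceiling division, with a shorter final batch) is an assumption for illustration; the header does not state how the real DataSet treats a trailing partial batch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helpers for illustration; not part of training_util.h.

// Number of batches needed to cover num_samples, counting a final partial batch.
// ASSUMPTION: ceiling division; the real TotalBatch may use another convention.
inline std::size_t TotalBatchSketch(std::size_t num_samples, std::size_t batch_size) {
  assert(batch_size > 0);
  return (num_samples + batch_size - 1) / batch_size;
}

// Half-open [begin, end) range of sample indices covered by the k-th batch.
inline std::pair<std::size_t, std::size_t> KthBatchRange(std::size_t num_samples,
                                                         std::size_t batch_size,
                                                         std::size_t k_th) {
  assert(k_th < TotalBatchSketch(num_samples, batch_size));
  const std::size_t begin = k_th * batch_size;
  const std::size_t end = std::min(begin + batch_size, num_samples);
  return {begin, end};
}
```

Under this convention, 10 samples with a batch size of 4 give 3 batches, and the last batch covers sample indices 8 and 9 only.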

RandomDataSet: Extends DataSet to generate synthetic (zero-filled) data of the specified tensor shapes and types, which is useful for benchmarking without real data.

TrainingUtil: Provides static template methods `CreateCpuMLValue` and `CreateCpuMLScalar` for constructing CPU-backed OrtValues from vectors and scalars. Also includes `GetCpuAllocator()`, `PrintNameMLValMap`, and `PrintTensor` for debugging.

LearningRateParameters: Configuration struct with `initial_lr`, `warmup_ratio`, `warmup_mode` (schedule type string), and `feed_name`.

LossScaler: Dynamic loss scaling for mixed-precision training with configurable up-scale window, min/max bounds, and checkpoint serialization.
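A minimal sketch of how dynamic loss scaling of this kind typically behaves is shown here. The class `DynamicLossScaleSketch` is a hypothetical stand-in, not the declared `LossScaler`. The doubling and halving factors are an assumption based on common mixed-precision practice; only the window and the min/max bounds come from the constructor defaults shown in the signature.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for illustration; not the LossScaler from training_util.h.
class DynamicLossScaleSketch {
 public:
  explicit DynamicLossScaleSketch(float loss_scale = 65536.f,
                                  std::size_t up_scale_window = 2000,
                                  float min_loss_scale = 1.0f,
                                  float max_loss_scale = 16777216.f)
      : loss_scale_(loss_scale),
        up_scale_window_(up_scale_window),
        min_loss_scale_(min_loss_scale),
        max_loss_scale_(max_loss_scale) {
    assert(up_scale_window_ > 0);
    assert(min_loss_scale_ <= max_loss_scale_);
  }

  // ASSUMPTION: double the scale after up_scale_window consecutive finite
  // steps, halve it on any non-finite step, and clamp to [min, max].
  void Update(bool is_all_finite) {
    if (is_all_finite) {
      if (++stable_steps_ == up_scale_window_) {
        loss_scale_ = std::min(max_loss_scale_, loss_scale_ * 2.0f);
        stable_steps_ = 0;
      }
    } else {
      loss_scale_ = std::max(min_loss_scale_, loss_scale_ / 2.0f);
      stable_steps_ = 0;
    }
  }

  float loss_scale() const { return loss_scale_; }

 private:
  float loss_scale_;
  std::size_t up_scale_window_;
  float min_loss_scale_;
  float max_loss_scale_;
  std::size_t stable_steps_ = 0;  // consecutive steps with all-finite gradients
};
```

The clamping explains the purpose of the min/max bounds: repeated overflows cannot drive the scale below the minimum, and a long stable run cannot grow it past the maximum.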

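LearningRateScheduler: Abstract base class computing `lr = initial_lr * GetLearningRateFactor(cur_ratio, warmup_ratio)`, where `cur_ratio` is the fraction of training completed (the current step relative to the total step count passed to `Create`). Five concrete implementations: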

 - `NoWarmpScheduler`: Returns constant factor of 1.0.
 - `CosineScheduler`: Linear warmup then cosine decay: `0.5 * (1 + cos(pi * cur_ratio))`.
 - `ConstantScheduler`: Linear warmup then constant factor of 1.0.
 - `LinearScheduler`: Linear warmup then linear decay to 0.
 - `PolyScheduler`: Linear warmup then polynomial decay: `(1 - cur_ratio)^0.5`.
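The schedule factors listed above can be sketched as free functions. These are illustrative stand-ins, not the scheduler classes themselves. Three points are assumptions rather than facts from the header: the warmup ramp is taken to be `cur_ratio / warmup_ratio`, the linear decay is taken to be a straight line from 1 at the end of warmup to 0 at the end of training, and `cur_ratio` is taken to be `current_step / total_steps`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Illustrative free functions; not the classes declared in training_util.h.

// ASSUMPTION: linear warmup ramps from 0 to 1 while cur_ratio < warmup_ratio.
inline float WarmupFactor(float cur_ratio, float warmup_ratio) {
  assert(warmup_ratio > 0.0f);
  return cur_ratio / warmup_ratio;
}

// Cosine decay after warmup: 0.5 * (1 + cos(pi * cur_ratio)).
inline float CosineFactor(float cur_ratio, float warmup_ratio) {
  if (cur_ratio < warmup_ratio) return WarmupFactor(cur_ratio, warmup_ratio);
  const float kPi = 3.14159265358979f;
  return 0.5f * (1.0f + std::cos(kPi * cur_ratio));
}

// Constant factor of 1.0 after warmup.
inline float ConstantFactor(float cur_ratio, float warmup_ratio) {
  if (cur_ratio < warmup_ratio) return WarmupFactor(cur_ratio, warmup_ratio);
  return 1.0f;
}

// ASSUMPTION: straight line from 1 at cur_ratio == warmup_ratio to 0 at cur_ratio == 1.
inline float LinearFactor(float cur_ratio, float warmup_ratio) {
  assert(warmup_ratio < 1.0f);
  if (cur_ratio < warmup_ratio) return WarmupFactor(cur_ratio, warmup_ratio);
  return std::max(0.0f, (1.0f - cur_ratio) / (1.0f - warmup_ratio));
}

// Polynomial decay after warmup: (1 - cur_ratio)^0.5.
inline float PolyFactor(float cur_ratio, float warmup_ratio) {
  if (cur_ratio < warmup_ratio) return WarmupFactor(cur_ratio, warmup_ratio);
  return std::sqrt(std::max(0.0f, 1.0f - cur_ratio));
}

// ASSUMPTION: cur_ratio is current_step / total_steps.
inline float CosineLearningRate(float initial_lr, std::size_t current_step,
                                std::size_t total_steps, float warmup_ratio) {
  assert(total_steps > 0 && current_step <= total_steps);
  const float cur_ratio =
      static_cast<float>(current_step) / static_cast<float>(total_steps);
  return initial_lr * CosineFactor(cur_ratio, warmup_ratio);
}
```

Under these assumptions, an initial learning rate of 0.001 with a warmup ratio of 0.1 over 10000 steps gives a cosine factor of about 0.5 at step 5000, so the scheduled learning rate there is about 0.0005.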

Usage

Use this header when building custom training runners or when you need DataSet management, CPU tensor creation, loss scaling, or learning rate scheduling for ORT Training model runner applications.

Code Reference

Source Location

Repository: Microsoft_Onnxruntime
File: orttraining/orttraining/models/runner/training_util.h
Lines: 1-326

Signature

class DataSet {
 public:
  typedef std::unique_ptr<std::vector<OrtValue>> SampleType;
  DataSet(const std::vector<std::string>& tensor_names);
  virtual ~DataSet();
  size_t NumInputs() const;
  common::Status AddData(SampleType&& single_sample);
  common::Status AddData(const std::vector<ONNX_NAMESPACE::TensorProto>& features);
  virtual size_t NumSamples() const;
  size_t TotalBatch(size_t batch_size) const;
  virtual std::vector<OrtValue> GetKthBatch(size_t batch_size, size_t k_th,
                                            AllocatorPtr allocator = nullptr) const;
  void RandomShuffle();
};

class TrainingUtil {
 public:
  template <typename T>
  static void CreateCpuMLValue(gsl::span<const int64_t> dims,
                               const std::vector<T>& value, OrtValue* p_mlvalue,
                               AllocatorPtr alloc = nullptr);
  template <typename T>
  static void CreateCpuMLScalar(const T value, OrtValue* p_mlvalue,
                                AllocatorPtr alloc = nullptr);
  static AllocatorPtr GetCpuAllocator();
  static void PrintNameMLValMap(const NameMLValMap& mlvalue_map);
  static void PrintTensor(const std::string& name, const Tensor& tensor,
                          std::ostream& os = std::cout);
};

class LossScaler {
 public:
  LossScaler(const std::string loss_scale_input_name, bool is_dynamic_scale,
             float loss_scale = 65536.f, size_t up_scale_window = 2000,
             float min_loss_scale = 1.0f, float max_loss_scale = 16777216.f);
  void UpdateLossScale(bool is_all_finite);
  std::string SaveToString() const;
  Status LoadFromString(const std::string& input);
};

class LearningRateScheduler {
 public:
  float GetLearningRate(const size_t current_step) const;
  virtual float GetLearningRateFactor(float cur_ratio, float warmp_ratio) const = 0;
  static std::unique_ptr<LearningRateScheduler> Create(LearningRateParameters& lr_params,
                                                       size_t training_step_count);
};

Import

#include "orttraining/models/runner/training_util.h"

I/O Contract

Method	Inputs	Outputs	Description
DataSet::GetKthBatch	batch_size, k_th, allocator	vector<OrtValue>	Returns batched tensors for the k-th batch
TrainingUtil::CreateCpuMLValue	dims, values, OrtValue*	void	Creates a CPU tensor OrtValue from a vector
LossScaler::UpdateLossScale	is_all_finite (bool)	void	Dynamically adjusts loss scale based on gradient status
LearningRateScheduler::GetLearningRate	current_step	float	Returns scheduled LR for the given step
CosineScheduler::GetLearningRateFactor	cur_ratio, warmup_ratio	float	Cosine decay: 0.5 * (1 + cos(pi * cur_ratio)) after warmup
PolyScheduler::GetLearningRateFactor	cur_ratio, warmup_ratio	float	Polynomial decay: (1 - cur_ratio)^0.5 after warmup

Usage Examples

#include "orttraining/models/runner/training_util.h"

using namespace onnxruntime::training;

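// Create a 2x2 CPU MLValue. The dims are held in a named vector so that
// they convert cleanly to gsl::span<const int64_t>.
OrtValue value;
std::vector<int64_t> dims = {2, 2};
std::vector<float> data = {1.0f, 2.0f, 3.0f, 4.0f};
TrainingUtil::CreateCpuMLValue<float>(dims, data, &value);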

// Create a scalar
OrtValue scalar;
TrainingUtil::CreateCpuMLScalar<float>(0.001f, &scalar);

// LR scheduling
LearningRateParameters params{0.001f, 0.1f, "Cosine", "Learning_Rate"};
auto scheduler = LearningRateScheduler::Create(params, 10000);
float lr = scheduler->GetLearningRate(5000);  // cosine-decayed LR at step 5000
