
Environment:Microsoft Onnxruntime CPU Training Environment

From Leeroopedia


Field         Value
sources       setup.py, requirements-training.txt, orttraining/orttraining/training_ops/cpu/
domains       training, cpu-kernels, gradients, optimizers
last_updated  2026-02-10

Overview

The CPU-based training environment executes ONNX Runtime training operator kernels (gradients, optimizers, loss functions) on standard CPUs without GPU acceleration.

Description

The CPU Training Environment provides the runtime context for executing training-specific operator kernels implemented under orttraining/orttraining/training_ops/cpu/. These kernels include:

  • Gradient computations for activations (GELU, FastGELU), convolutions, pooling, batch normalization, layer normalization, and recurrent networks (LSTM, GRU).
  • Gradients for tensor operations (Gather, Slice, Split, Concat).
  • Loss functions (CrossEntropy, SoftmaxCrossEntropyLoss).
  • Optimizers (AdamW, SGDv2, and legacy SGD/Adam).
  • Gradient control operations (accumulation, clipping, scaling).
  • Collective communication (MPI Send/Recv).
  • Quantization (FakeQuant) and TensorBoard summary operations.

The environment requires the onnxruntime-training package variant, which includes these additional CPU kernels beyond the standard inference-only package. MPI support is optional and is required only for the MpiSend/MpiRecv communication kernels used in distributed training scenarios.

Usage

Use this environment whenever you need to:

  • Execute training operator gradients on CPU (e.g., testing, debugging, or CPU-only training).
  • Run optimizer kernels (AdamW, SGDv2) without GPU acceleration.
  • Compute loss functions (CrossEntropy, SoftmaxCrossEntropyLoss) on CPU.
  • Perform gradient clipping, scaling, or accumulation on CPU tensors.
  • Use MPI-based tensor communication for distributed CPU training.
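As a concrete illustration of the gradient-control use case above, the following NumPy sketch shows gradient clipping by global norm. This is an assumption-laden reference implementation, not the kernel's actual code; the function name clip_by_global_norm is illustrative.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Hedged sketch of global-norm gradient clipping on CPU tensors.

    Scales every gradient by min(1, max_norm / global_norm) so the
    combined L2 norm never exceeds max_norm. The epsilon guard and
    exact formula in the real kernel may differ.
    """
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-6))
    return [g * scale for g in grads], global_norm

# Two gradient tensors whose combined L2 norm is sqrt(9+16+144) = 13
grads = [np.array([3.0, 4.0]), np.array([0.0, 12.0])]
clipped, norm = clip_by_global_norm(grads, max_norm=1.0)
```

After clipping, the combined norm of the returned tensors is (up to the epsilon guard) the requested max_norm.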

System Requirements

Requirement       Minimum                                Recommended
Python            3.10                                   3.12
Operating System  Linux (manylinux2014), Windows, macOS  Linux x86_64
Architecture      x86_64, aarch64                        x86_64
RAM               4 GB                                   16 GB+ (model dependent)
Disk              500 MB (package)                       1 GB+ (with training data)

Dependencies

Python Packages

Package               Version    Purpose
onnxruntime-training  >= 1.25.0  Training package variant with CPU training kernels
numpy                 >= 1.21.6  Tensor I/O and data manipulation
onnx                  >= 1.12    ONNX model format support
flatbuffers                      Checkpoint serialization
protobuf                         Model format support

Optional Dependencies

Package   Purpose
mpi4py    Required only for MpiSend/MpiRecv distributed communication kernels
h5py      Checkpoint I/O in HDF5 format
cerberus  Configuration validation for training parameters

Code Evidence

Source: orttraining/orttraining/training_ops/cpu/activation/activations_grad.cc
  - GeluGrad, FastGeluGrad, BiasGeluGrad_dX, BiasFastGeluGrad_dX kernels
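The backward formula these GELU-gradient kernels compute can be sketched in NumPy as dX = dY * (Phi(x) + x * phi(x)), where Phi and phi are the standard normal CDF and PDF. This is a mathematical sketch of exact GELU's derivative, not the kernel's actual source; FastGeluGrad uses a tanh approximation instead.

```python
import math
import numpy as np

def gelu_grad(dY, X):
    """Hedged NumPy sketch of the exact-GELU backward pass.

    GELU(x) = x * Phi(x), so d/dx GELU(x) = Phi(x) + x * phi(x),
    and the incoming gradient dY is applied by the chain rule.
    """
    phi = np.exp(-0.5 * X * X) / math.sqrt(2.0 * math.pi)            # normal PDF
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(X / math.sqrt(2.0)))   # normal CDF
    return dY * (Phi + X * phi)

x = np.array([-1.0, 0.0, 2.0])
dx = gelu_grad(np.ones_like(x), x)  # dx[1] is exactly 0.5 at x = 0
```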

Source: orttraining/orttraining/training_ops/cpu/optimizer/adamw/adamw.cc
  - AdamW optimizer kernel on CPU
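The update this kernel performs follows the standard decoupled-weight-decay Adam rule, sketched below in NumPy. This is a reference sketch of the textbook AdamW step, not the kernel's source; the kernel's exact bias-correction and epsilon placement may differ.

```python
import numpy as np

def adamw_step(param, grad, m, v, step, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """Hedged NumPy sketch of one AdamW update (decoupled weight decay)."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** step)             # bias correction
    v_hat = v / (1.0 - beta2 ** step)
    # Weight decay is applied directly to the parameter, not to the gradient
    param = param - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * param)
    return param, m, v

p = np.array([1.0])
p, m, v = adamw_step(p, grad=np.array([0.5]), m=np.zeros(1), v=np.zeros(1), step=1)
```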

Source: orttraining/orttraining/training_ops/cpu/optimizer/sgd/sgd.cc
  - SGDOptimizerV2 kernel on CPU

Source: orttraining/orttraining/training_ops/cpu/loss/cross_entropy.cc
  - CrossEntropy loss and gradient on CPU

Source: orttraining/orttraining/training_ops/cpu/loss/softmax_cross_entropy_loss.cc
  - SoftmaxCrossEntropyLoss and gradient on CPU
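The forward loss and backward gradient that a fused softmax-cross-entropy kernel produces can be sketched in NumPy as follows. This sketch assumes mean reduction and integer class labels; the actual kernel also handles weights, ignore_index, and other reduction modes.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Hedged NumPy sketch of SoftmaxCrossEntropyLoss forward + gradient.

    logits: (batch, classes) scores; labels: (batch,) integer class ids.
    Returns the mean negative log-likelihood and d(loss)/d(logits).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)
    n = logits.shape[0]
    loss = -np.log(probs[np.arange(n), labels]).mean()
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1.0   # softmax - one_hot(labels)
    return loss, grad / n

logits = np.array([[2.0, 1.0, 0.1]])
loss, grad = softmax_cross_entropy(logits, np.array([0]))
```

Each row of the returned gradient sums to zero, a quick sanity check on the softmax - one_hot form.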

Source: orttraining/orttraining/training_ops/cpu/communication/send.cc
  - MPI tensor send for distributed training

Source: orttraining/orttraining/training_ops/cpu/communication/recv.cc
  - MPI tensor receive for distributed training

Related Pages

Implementations Using This Environment
