Implementation: Microsoft Onnxruntime CPU GradientControl
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Implements the gradient accumulation and zeroing kernels for CPU execution in the ONNX Runtime training framework.
Description
This file implements three gradient control kernels:
- InPlaceAccumulator: Accumulates gradients in-place by adding a new gradient to an existing buffer. It uses broadcast-aware addition (leveraging the same BroadcastSpanFuncs used by the Add operator). An optional boolean do_update signal controls whether to actually accumulate or to pass the buffer through unchanged.
- ZeroGradient: Resets a gradient tensor to all zeros using memset. The output aliases the input for in-place zeroing.
- InPlaceAccumulatorV2: An enhanced version that either overwrites the accumulation buffer with a new value (when overwrite is true) or adds to it (when false). It outputs both a boolean flag indicating that the update happened and the accumulated buffer value.
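The accumulate-or-pass-through semantics described above can be sketched on plain buffers. This is an illustrative sketch only, not the real kernel: the actual implementation operates on ORT `Tensor` objects and uses broadcast-aware span functions, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of InPlaceAccumulator semantics (not the real ORT
// kernel): add `gradient` into `buffer` element-wise when `do_update` is
// true, otherwise leave the buffer untouched (pass-through).
void accumulate_in_place(std::vector<float>& buffer,
                         const std::vector<float>& gradient,
                         bool do_update) {
  if (!do_update) return;  // optional signal: skip accumulation
  assert(buffer.size() == gradient.size());  // broadcasting omitted here
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] += gradient[i];
  }
}

// Illustrative sketch of InPlaceAccumulatorV2: overwrite the buffer or
// accumulate into it, and report that an update happened.
bool accumulate_v2(std::vector<float>& buffer,
                   const std::vector<float>& value,
                   bool overwrite) {
  if (overwrite) {
    buffer.assign(value.begin(), value.end());  // replace buffer contents
  } else {
    accumulate_in_place(buffer, value, /*do_update=*/true);
  }
  return true;  // boolean output: buffer was updated
}
```

The helper names here are hypothetical; the real kernels expose this behavior through their `Compute` methods and the kernel registration shown below.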
Usage
These kernels are used in the training pipeline for gradient accumulation across micro-batches. InPlaceAccumulator adds gradients from each micro-batch to a running total, while ZeroGradient resets the accumulator between optimization steps.
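The micro-batch pattern can be sketched as follows. The struct and method names are illustrative stand-ins for the graph-level wiring; in practice these kernels run as nodes in the training graph, not as direct C++ calls.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch of the gradient-accumulation pattern these kernels
// implement inside the training graph (names here are illustrative).
struct GradientAccumulator {
  std::vector<float> buffer;

  explicit GradientAccumulator(std::size_t n) : buffer(n, 0.0f) {}

  // InPlaceAccumulator role: add one micro-batch gradient to the running total.
  void accumulate(const std::vector<float>& grad) {
    for (std::size_t i = 0; i < buffer.size(); ++i) buffer[i] += grad[i];
  }

  // ZeroGradient role: reset the buffer between optimization steps.
  void zero() {
    std::memset(buffer.data(), 0, buffer.size() * sizeof(float));
  }
};
```

A typical step would call `accumulate` once per micro-batch, apply the optimizer update from `buffer`, then call `zero` before the next optimization step.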
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.cc
- Lines: 1-132
Signature
```cpp
template <typename T>
Status InPlaceAccumulator<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status ZeroGradient<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status InPlaceAccumulatorV2<T>::Compute(OpKernelContext* context) const;
```
Import
```cpp
#include "orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.h"
```
I/O Contract
Inputs (InPlaceAccumulator)
| Name | Type | Required | Description |
|---|---|---|---|
| gradient_buffer | Tensor(float) | Yes | Existing accumulated gradients (aliased to output) |
| gradient | Tensor(float) | Yes | New gradient to accumulate (input 1, broadcast-compatible) |
| do_update | Tensor(bool) | No | If false, skip accumulation |
Outputs (InPlaceAccumulator)
| Name | Type | Description |
|---|---|---|
| accumulated_gradient | Tensor(float) | Updated accumulated gradients (in-place alias) |
Inputs (ZeroGradient)
| Name | Type | Required | Description |
|---|---|---|---|
| old_gradient | Tensor(float) | Yes | Gradient tensor to zero (shape reference) |
| reset_signal | any | Yes | Signal input (unused, just for dependency) |
Outputs (ZeroGradient)
| Name | Type | Description |
|---|---|---|
| zero_gradient | Tensor(float) | Zeroed gradient tensor (in-place alias) |
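The memset-based zeroing works because the IEEE-754 bit pattern of +0.0f is all zero bytes, so a single memset over the raw buffer resets every element. A minimal illustration (the function name is hypothetical; the real kernel writes into the aliased output tensor):

```cpp
#include <cstring>
#include <vector>

// Minimal illustration of the memset-based zeroing ZeroGradient performs:
// the all-zero byte pattern is exactly IEEE-754 +0.0f, so one memset over
// the raw float buffer resets every element.
std::vector<float> zero_like(const std::vector<float>& old_gradient) {
  std::vector<float> out = old_gradient;  // input used as a shape reference
  std::memset(out.data(), 0, out.size() * sizeof(float));
  return out;
}
```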
Usage Examples
```cpp
ONNX_OPERATOR_KERNEL_EX(
    InPlaceAccumulator, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // accumulate tensors in-place
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    InPlaceAccumulator<float>);

ONNX_OPERATOR_KERNEL_EX(
    ZeroGradient, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // reset gradients in-place
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("T2", DataTypeImpl::AllTensorTypes()),
    ZeroGradient<float>);
```