Implementation: Microsoft Onnxruntime CPU GradientControl
| Knowledge Sources | |
|---|---|
| Domains | Training, CPU_Kernels |
| Last Updated | 2026-02-10 04:00 GMT |
Overview
Implements the gradient accumulation and zeroing kernels for CPU execution in the ONNX Runtime training framework.
Description
This file implements three gradient control kernels:
- InPlaceAccumulator: Accumulates gradients in-place by adding a new gradient to an existing buffer. It uses broadcast-aware addition (leveraging the same BroadcastSpanFuncs used by the Add operator). An optional boolean do_update signal controls whether to actually accumulate or to pass the buffer through unchanged.
- ZeroGradient: Resets a gradient tensor to all zeros using memset. The output aliases the input for in-place zeroing.
- InPlaceAccumulatorV2: An enhanced version that either overwrites the accumulation buffer with a new value (when overwrite is true) or adds to it (when false). It outputs both a boolean flag indicating that the update happened and the accumulated buffer value.
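The accumulate-or-pass-through semantics described above can be sketched on plain buffers. This is an illustrative sketch only, not the real kernel: the actual implementation operates on ORT `Tensor` objects and uses broadcast-aware span functions, which are omitted here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sketch of InPlaceAccumulator semantics (not the real ORT
// kernel): add `gradient` into `buffer` element-wise when `do_update` is
// true, otherwise leave the buffer untouched (pass-through).
void accumulate_in_place(std::vector<float>& buffer,
                         const std::vector<float>& gradient,
                         bool do_update) {
  if (!do_update) return;  // optional signal: skip accumulation
  assert(buffer.size() == gradient.size());  // broadcasting omitted here
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] += gradient[i];
  }
}

// Illustrative sketch of InPlaceAccumulatorV2: overwrite the buffer or
// accumulate into it, and report that an update happened.
bool accumulate_v2(std::vector<float>& buffer,
                   const std::vector<float>& value,
                   bool overwrite) {
  if (overwrite) {
    buffer.assign(value.begin(), value.end());  // replace buffer contents
  } else {
    accumulate_in_place(buffer, value, /*do_update=*/true);
  }
  return true;  // boolean output: buffer was updated
}
```

The helper names here are hypothetical; the real kernels expose this behavior through their `Compute` methods and the kernel registration shown below.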
Usage
These kernels are used in the training pipeline for gradient accumulation across micro-batches. InPlaceAccumulator adds gradients from each micro-batch to a running total, while ZeroGradient resets the accumulator between optimization steps.
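The micro-batch pattern can be sketched as follows. The struct and method names are illustrative stand-ins for the graph-level wiring; in practice these kernels run as nodes in the training graph, not as direct C++ calls.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch of the gradient-accumulation pattern these kernels
// implement inside the training graph (names here are illustrative).
struct GradientAccumulator {
  std::vector<float> buffer;

  explicit GradientAccumulator(std::size_t n) : buffer(n, 0.0f) {}

  // InPlaceAccumulator role: add one micro-batch gradient to the running total.
  void accumulate(const std::vector<float>& grad) {
    for (std::size_t i = 0; i < buffer.size(); ++i) buffer[i] += grad[i];
  }

  // ZeroGradient role: reset the buffer between optimization steps.
  void zero() {
    std::memset(buffer.data(), 0, buffer.size() * sizeof(float));
  }
};
```

A typical step would call `accumulate` once per micro-batch, apply the optimizer update from `buffer`, then call `zero` before the next optimization step.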
Code Reference
Source Location
- Repository: Microsoft_Onnxruntime
- File: orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.cc
- Lines: 1-132
Signature
```cpp
template <typename T>
Status InPlaceAccumulator<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status ZeroGradient<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status InPlaceAccumulatorV2<T>::Compute(OpKernelContext* context) const;
```
Import
```cpp
#include "orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.h"
```
I/O Contract
Inputs (InPlaceAccumulator)
| Name | Type | Required | Description |
|---|---|---|---|
| gradient_buffer | Tensor(float) | Yes | Existing accumulated gradients (aliased to output) |
| gradient | Tensor(float) | Yes | New gradient to accumulate (input 1, broadcast-compatible) |
| do_update | Tensor(bool) | No | If false, skip accumulation |
Outputs (InPlaceAccumulator)
| Name | Type | Description |
|---|---|---|
| accumulated_gradient | Tensor(float) | Updated accumulated gradients (in-place alias) |
Inputs (ZeroGradient)
| Name | Type | Required | Description |
|---|---|---|---|
| old_gradient | Tensor(float) | Yes | Gradient tensor to zero (shape reference) |
| reset_signal | any | Yes | Signal input (unused, just for dependency) |
Outputs (ZeroGradient)
| Name | Type | Description |
|---|---|---|
| zero_gradient | Tensor(float) | Zeroed gradient tensor (in-place alias) |
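The memset-based zeroing works because the IEEE-754 bit pattern of +0.0f is all zero bytes, so a single memset over the raw buffer resets every element. A minimal illustration (the function name is hypothetical; the real kernel writes into the aliased output tensor):

```cpp
#include <cstring>
#include <vector>

// Minimal illustration of the memset-based zeroing ZeroGradient performs:
// the all-zero byte pattern is exactly IEEE-754 +0.0f, so one memset over
// the raw float buffer resets every element.
std::vector<float> zero_like(const std::vector<float>& old_gradient) {
  std::vector<float> out = old_gradient;  // input used as a shape reference
  std::memset(out.data(), 0, out.size() * sizeof(float));
  return out;
}
```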
Usage Examples
```cpp
ONNX_OPERATOR_KERNEL_EX(
    InPlaceAccumulator, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // accumulate tensors in-place
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    InPlaceAccumulator<float>);

ONNX_OPERATOR_KERNEL_EX(
    ZeroGradient, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // reset gradients in-place
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("T2", DataTypeImpl::AllTensorTypes()),
    ZeroGradient<float>);
```