
Implementation:Microsoft Onnxruntime CPU GradientControl

From Leeroopedia


Knowledge Sources
Domains Training, CPU_Kernels
Last Updated 2026-02-10 04:00 GMT

Overview

CPU kernels for gradient accumulation and gradient zeroing in the ONNX Runtime training framework.

Description

This file implements three gradient control kernels:

InPlaceAccumulator: Accumulates gradients in-place by adding a new gradient to an existing buffer. It uses broadcast-aware addition (leveraging the same BroadcastSpanFuncs used by the Add operator). An optional boolean do_update signal controls whether to actually accumulate or just pass through the buffer unchanged.
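The accumulate-or-pass-through behavior can be sketched as follows. This is a simplified standalone illustration, not the actual ONNX Runtime kernel: it assumes same-shape tensors and omits the broadcast machinery, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified sketch of the InPlaceAccumulator logic: add the new gradient
// into the existing buffer, unless do_update is false, in which case the
// buffer passes through unchanged. Same-shape tensors only.
void InPlaceAccumulate(std::vector<float>& buffer,
                       const std::vector<float>& gradient,
                       bool do_update) {
  if (!do_update) return;  // skip accumulation, buffer is unchanged
  assert(buffer.size() == gradient.size());
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] += gradient[i];  // accumulate in place
  }
}
```

In the real kernel the addition is broadcast-aware, so a lower-rank gradient can be accumulated into a larger buffer; the sketch above skips that generality.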

ZeroGradient: Resets a gradient tensor to all zeros using memset. The output aliases the input for in-place zeroing.
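A standalone sketch of the reset, assuming a contiguous float buffer (the function name is illustrative, not from the source):

```cpp
#include <cstring>
#include <vector>

// Sketch of the ZeroGradient reset: memset the whole buffer to zero.
// For IEEE-754 floats, all-zero bytes represent the value 0.0f, which
// is why a raw memset is sufficient here.
void ZeroGradientBuffer(std::vector<float>& gradient) {
  std::memset(gradient.data(), 0, gradient.size() * sizeof(float));
}
```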

InPlaceAccumulatorV2: An enhanced version that either overwrites the accumulation buffer with a new value (when overwrite is true) or adds to it (when false). It outputs both a boolean flag indicating the update happened and the accumulated buffer value.
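The overwrite-versus-add branch can be sketched as below. Again a simplified same-shape illustration with a hypothetical name; the boolean return stands in for the flag tensor the real kernel emits.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of the InPlaceAccumulatorV2 branch: when overwrite is true the
// buffer contents are replaced by the new value, otherwise the value is
// added in. Returns the "update happened" flag.
bool AccumulateV2(std::vector<float>& buffer,
                  const std::vector<float>& value,
                  bool overwrite) {
  assert(buffer.size() == value.size());
  for (std::size_t i = 0; i < buffer.size(); ++i) {
    buffer[i] = overwrite ? value[i] : buffer[i] + value[i];
  }
  return true;  // buffer was updated
}
```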

Usage

These kernels are used in the training pipeline for gradient accumulation across micro-batches. InPlaceAccumulator adds gradients from each micro-batch to a running total, while ZeroGradient resets the accumulator between optimization steps.
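The accumulate/zero cycle described above can be sketched as one helper. This is an illustrative stand-in for the training graph, not ONNX Runtime code; all names are hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Illustrative accumulate/zero cycle: each micro-batch gradient is summed
// into a running buffer (InPlaceAccumulator role), the total is handed to
// the optimizer, and the buffer is reset for the next optimization step
// (ZeroGradient role).
std::vector<float> RunAccumulationStep(
    const std::vector<std::vector<float>>& micro_batch_grads,
    std::vector<float>& accum) {
  for (const auto& grad : micro_batch_grads)
    for (std::size_t i = 0; i < accum.size(); ++i)
      accum[i] += grad[i];                 // accumulate each micro-batch
  std::vector<float> total = accum;        // snapshot for the optimizer
  std::memset(accum.data(), 0,             // reset between optimization steps
              accum.size() * sizeof(float));
  return total;
}
```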

Code Reference

Source Location

Signature

template <typename T>
Status InPlaceAccumulator<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status ZeroGradient<T>::Compute(OpKernelContext* context) const;

template <typename T>
Status InPlaceAccumulatorV2<T>::Compute(OpKernelContext* context) const;

Import

#include "orttraining/orttraining/training_ops/cpu/optimizer/gradient_control.h"

I/O Contract

Inputs (InPlaceAccumulator)

Name            | Type          | Required | Description
gradient_buffer | Tensor(float) | Yes      | Existing accumulated gradients (aliased to output)
gradient        | Tensor(float) | Yes      | New gradient to accumulate (input 1, broadcast-compatible)
do_update       | Tensor(bool)  | No       | If false, skip accumulation

Outputs (InPlaceAccumulator)

Name                 | Type          | Description
accumulated_gradient | Tensor(float) | Updated accumulated gradients (in-place alias)

Inputs (ZeroGradient)

Name         | Type          | Required | Description
old_gradient | Tensor(float) | Yes      | Gradient tensor to zero (shape reference)
reset_signal | any           | Yes      | Signal input (unused, just for dependency)

Outputs (ZeroGradient)

Name          | Type          | Description
zero_gradient | Tensor(float) | Zeroed gradient tensor (in-place alias)

Usage Examples

ONNX_OPERATOR_KERNEL_EX(
    InPlaceAccumulator, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // accumulate tensors in-place
        .TypeConstraint("T", DataTypeImpl::GetTensorType<float>()),
    InPlaceAccumulator<float>);

ONNX_OPERATOR_KERNEL_EX(
    ZeroGradient, kMSDomain, 1, kCpuExecutionProvider,
    KernelDefBuilder()
        .Alias(0, 0)  // reset gradients in-place
        .TypeConstraint("T1", DataTypeImpl::GetTensorType<float>())
        .TypeConstraint("T2", DataTypeImpl::AllTensorTypes()),
    ZeroGradient<float>);
