
Implementation:Microsoft Onnxruntime CUDA BatchScale

From Leeroopedia


Knowledge Sources
Domains: Training, CUDA_Kernels
Last Updated: 2026-02-10 04:00 GMT

Overview

Concrete tool for scaling a tensor by multiple scale factors simultaneously in the ONNX Runtime CUDA training framework.

Description

Implements the BatchScale operator for CUDA, which produces two or three scaled copies of a single input tensor in one kernel launch. The operator takes one input tensor and writes one output tensor per scale factor (scale0_, scale1_, and optionally scale2_); every factor is a float. The BatchScaleFunctor template dispatches to BatchScaleImpl, which performs the type-specific scaling on the GPU. Fusing the work this way avoids launching a separate scale operation for each consumer that needs the same tensor at a different scale. Supported element types are MLFloat16, float, double, and BFloat16.
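The fused behaviour described above can be illustrated with a small CPU reference. This is only a sketch of the semantics, not the ONNX Runtime kernel: the function name `batch_scale_reference` and its parameter layout are hypothetical, and it assumes an arithmetic element type such as float or double (the half-precision types are left out).

```cpp
#include <cassert>
#include <cstddef>

// CPU reference sketch of the fused scaling (hypothetical helper, not ORT code).
// `scales` holds 2 or 3 factors; `outputs` holds one destination buffer per factor.
template <typename T>
void batch_scale_reference(const T* input, std::size_t count,
                           const float* scales, T* const* outputs,
                           int num_outputs) {
  // The operator produces either two or three outputs.
  assert(num_outputs == 2 || num_outputs == 3);
  for (std::size_t i = 0; i < count; ++i) {
    // Read the input element once, then write every scaled copy.
    const T x = input[i];
    for (int k = 0; k < num_outputs; ++k) {
      outputs[k][i] = x * static_cast<T>(scales[k]);
    }
  }
}
```

Each input element is read once and written to every output, which is the saving a fused kernel is after compared with running one scale pass per consumer.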

Usage

Used during training when a single tensor needs to be distributed to multiple consumers with different scaling factors, such as in gradient scaling or loss weighting scenarios.

Code Reference

Source Location

Signature

class BatchScale : public CudaKernel {
  Status ComputeInternal(OpKernelContext* context) const;
};

Import

#include "orttraining/training_ops/cuda/math/batch_scale.h"

I/O Contract

Inputs

Name | Type | Required | Description
input | Tensor(T) | Yes | Input tensor to scale

Outputs

Name | Type | Description
output_0 | Tensor(T) | Input scaled by scale0_
output_1 | Tensor(T) | Input scaled by scale1_
output_2 | Tensor(T) | Input scaled by scale2_ (optional, only if scale2_ is set)
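The contract above, two mandatory outputs plus a third that exists only when scale2_ is set, can be modelled on the CPU as follows. This is a sketch under stated assumptions: `batch_scale_outputs` is a hypothetical helper, the element type is fixed to float for brevity, and `std::optional` (C++17) stands in for however the real kernel records that scale2_ is absent.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical CPU model of the I/O contract; names are illustrative only.
std::vector<std::vector<float>> batch_scale_outputs(
    const std::vector<float>& input, float scale0, float scale1,
    std::optional<float> scale2 = std::nullopt) {
  std::vector<float> scales = {scale0, scale1};
  if (scale2.has_value()) scales.push_back(*scale2);

  // One output per scale, each with the same element count as the input.
  std::vector<std::vector<float>> outputs(scales.size(),
                                          std::vector<float>(input.size()));
  for (std::size_t i = 0; i < input.size(); ++i) {
    for (std::size_t k = 0; k < scales.size(); ++k) {
      outputs[k][i] = input[i] * scales[k];
    }
  }
  // Postcondition from the contract: exactly two or three outputs.
  assert(outputs.size() == 2 || outputs.size() == 3);
  return outputs;
}
```

Calling the helper without the last argument yields output_0 and output_1 only; passing a third factor adds output_2.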

Usage Examples

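// Kernel registration: BatchScale in the Microsoft domain (kMSDomain),
// version 1, on the CUDA execution provider.
ONNX_OPERATOR_KERNEL_EX(
    BatchScale, kMSDomain, 1, kCudaExecutionProvider,
    (*KernelDefBuilder::Create())
        // "T" is restricted to the four supported element types.
        .TypeConstraint("T", BuildKernelDefConstraints<MLFloat16, float, double, BFloat16>()),
    BatchScale);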
