Implementation:NVIDIA TransformerEngine Dropout C API

Sources: TransformerEngine
Domains: Deep_Learning, Optimization
Last Updated: 2026-02-07 14:00 GMT

Overview

Declares the C API for forward and backward dropout operations on GPU tensors, using a bit-packed mask representation for memory efficiency.

Description

dropout.h exposes two extern "C" functions:

  • nvte_dropout_fwd: Generates a random binary mask using the provided RNG state and dropout probability, applies the mask element-wise to the input, and writes both the scaled output and the compact bit-packed mask tensor. Each bit in the mask corresponds to one output element (1 = kept, 0 = dropped).
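  • nvte_dropout_bwd: Takes the incoming gradient and the stored mask and computes the input gradient by re-applying the same mask pattern and the same scaling as the forward pass. Gradients of dropped elements are zeroed; gradients of kept elements are scaled by the inverse of the keep probability, 1/(1 - p) under the usual inverted-dropout convention, where p is the dropout probability.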
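
The two operations above can be illustrated with a CPU reference sketch. This is not TransformerEngine code: the names ref_dropout_fwd, ref_dropout_bwd, and xorshift32, the choice of random generator, the 1/(1 - p) scaling convention, and the bit order inside each mask byte are all assumptions made for illustration. The real kernels run on the GPU and may differ in each of these details.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in RNG (xorshift32); not the generator the library uses. */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Forward: draw one keep/drop decision per element, store it as one bit,
 * and scale kept elements by 1 / (1 - p).
 * Assumed layout: element i lives in bit (i % 8) of byte (i / 8). */
void ref_dropout_fwd(const float *input, float *output, uint8_t *mask,
                     size_t n, float p, uint32_t seed) {
    assert(p >= 0.0f && p < 1.0f);
    assert(seed != 0); /* xorshift32 needs a nonzero state */
    const float scale = 1.0f / (1.0f - p);
    uint32_t state = seed;
    memset(mask, 0, (n + 7) / 8);
    for (size_t i = 0; i < n; ++i) {
        /* Uniform sample in [0, 1) built from the top 24 bits. */
        float u = (float)(xorshift32(&state) >> 8) * (1.0f / 16777216.0f);
        int keep = u >= p;
        if (keep) mask[i / 8] |= (uint8_t)(1u << (i % 8));
        output[i] = keep ? input[i] * scale : 0.0f;
    }
}

/* Backward: read the stored bits and apply the same mask and scale
 * to the incoming gradient. No random numbers are drawn here. */
void ref_dropout_bwd(const float *grad_output, const uint8_t *mask,
                     float *grad_input, size_t n, float p) {
    assert(p >= 0.0f && p < 1.0f);
    const float scale = 1.0f / (1.0f - p);
    for (size_t i = 0; i < n; ++i) {
        int keep = (mask[i / 8] >> (i % 8)) & 1;
        grad_input[i] = keep ? grad_output[i] * scale : 0.0f;
    }
}
```

Because the backward pass only reads the mask, it needs no RNG state, which matches the nvte_dropout_bwd signature taking a mask but no rng_state argument.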

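The bit-packed mask representation (one bit per element) uses 1/8 the memory of a one-byte-per-element mask and 1/32 the memory of a 32-bit float mask. Because the mask must be kept from the forward pass until the backward pass, this saving is important for long-sequence training.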
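
The memory saving can be made concrete with a small size calculation. The helpers below are hypothetical (packed_mask_bytes and float_mask_bytes are not part of the TransformerEngine API) and only count raw bytes; the actual mask tensor may carry its own shape or alignment requirements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for a bit-packed mask: one bit per element, rounded up
 * to a whole byte. */
size_t packed_mask_bytes(size_t num_elements) {
    assert(num_elements <= SIZE_MAX - 7); /* guard the rounding add */
    return (num_elements + 7) / 8;
}

/* Bytes needed if the mask were stored as one float per element. */
size_t float_mask_bytes(size_t num_elements) {
    assert(num_elements <= SIZE_MAX / sizeof(float)); /* guard the multiply */
    return num_elements * sizeof(float);
}
```

For 1,048,576 elements, packed_mask_bytes gives 131,072 bytes, while a float mask with 4-byte floats would take 4,194,304 bytes, a factor of 32.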

Usage

Use these functions for standalone dropout operations. For attention, dropout is typically fused into the fused attention kernel instead, so a separate dropout call is not needed there.

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/common/include/transformer_engine/dropout.h
Lines: 1-51

Signature

void nvte_dropout_fwd(const NVTETensor input, NVTETensor output,
                      NVTETensor mask, NVTETensor rng_state,
                      float dropout_probability, cudaStream_t stream);

void nvte_dropout_bwd(const NVTETensor grad_output, const NVTETensor mask,
                      NVTETensor grad_input, float dropout_probability,
                      cudaStream_t stream);

Import

#include "transformer_engine/dropout.h"

I/O Contract

Inputs

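Name | Type | Required | Description
input | NVTETensor | Yes | Input tensor (nvte_dropout_fwd)
rng_state | NVTETensor | Yes | RNG engine state for reproducible masking (nvte_dropout_fwd)
grad_output | NVTETensor | Yes | Gradient with respect to the output (nvte_dropout_bwd)
mask | NVTETensor | Yes | Bit-packed mask written by the forward pass (nvte_dropout_bwd)
dropout_probability | float | Yes | Probability of dropping each element (both functions)
stream | cudaStream_t | Yes | CUDA stream used for the operation (both functions)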

Outputs

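Name | Type | Description
output | NVTETensor | Scaled output with dropout applied (nvte_dropout_fwd)
mask | NVTETensor | Bit-packed dropout mask, 1 = kept, 0 = dropped (nvte_dropout_fwd)
grad_input | NVTETensor | Gradient with respect to the input (nvte_dropout_bwd)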

Usage Examples

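#include "transformer_engine/dropout.h"

// input, output, mask, rng_state, grad_output, and grad_input are NVTETensor
// handles assumed to be created and allocated by the caller; stream is a
// valid cudaStream_t.

// Forward: apply dropout with p = 0.1, writing the scaled output and the bit-packed mask
nvte_dropout_fwd(input, output, mask, rng_state, 0.1f, stream);

// Backward: reuse the stored mask with the same probability to compute grad_input
nvte_dropout_bwd(grad_output, mask, grad_input, 0.1f, stream);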
