Implementation:NVIDIA TransformerEngine Dropout C API
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Declares the C API for forward and backward dropout operations on GPU tensors, using a bit-packed mask representation for memory efficiency.
Description
dropout.h exposes two extern "C" functions:
- nvte_dropout_fwd: Generates a random binary mask using the provided RNG state and dropout probability, applies the mask element-wise to the input, and writes both the scaled output and the compact bit-packed mask tensor. Each bit in the mask corresponds to one output element (1 = kept, 0 = dropped).
- nvte_dropout_bwd: Takes the incoming gradient and the stored mask and computes the input gradient by re-applying the same mask pattern and the same scaling factor as the forward pass (conventionally 1/(1 - p) for inverted dropout, where p is the dropout probability).
The bit-packed mask representation (one bit per element) takes 1/32 of the memory of a full FP32 mask and 1/8 of a byte-per-element mask, which is important for long-sequence training.
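The mask semantics described above can be illustrated with a small CPU reference. This is a minimal sketch, not TransformerEngine code: the function names `ref_dropout_fwd` and `ref_dropout_bwd` are hypothetical, the bit order within each byte (element `i` in bit `i % 8` of byte `i / 8`) is an assumption, `rand()` stands in for the GPU RNG state, and the 1/(1 - p) scaling is the conventional inverted-dropout choice rather than something this page confirms for the actual kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical CPU reference for bit-packed dropout (not the TE kernel).
 * mask must hold at least (n + 7) / 8 bytes. Bit = 1 means "kept". */
static void ref_dropout_fwd(const float *in, float *out, uint8_t *mask,
                            size_t n, float p, unsigned seed) {
  assert(p >= 0.0f && p < 1.0f);
  const float scale = 1.0f / (1.0f - p); /* assumed inverted-dropout scaling */
  srand(seed);                           /* stand-in for the RNG state tensor */
  for (size_t b = 0; b < (n + 7) / 8; ++b) mask[b] = 0;
  for (size_t i = 0; i < n; ++i) {
    /* Uniform draw in [0, 1); the element is dropped with probability p. */
    const double u = (double)rand() / ((double)RAND_MAX + 1.0);
    const int keep = u >= (double)p;
    if (keep) mask[i / 8] |= (uint8_t)(1u << (i % 8)); /* assumed bit order */
    out[i] = keep ? in[i] * scale : 0.0f;
  }
}

/* Backward pass: reuse the stored bits and the same scale factor. */
static void ref_dropout_bwd(const float *grad_out, const uint8_t *mask,
                            float *grad_in, size_t n, float p) {
  assert(p >= 0.0f && p < 1.0f);
  const float scale = 1.0f / (1.0f - p);
  for (size_t i = 0; i < n; ++i) {
    const int keep = (mask[i / 8] >> (i % 8)) & 1;
    grad_in[i] = keep ? grad_out[i] * scale : 0.0f;
  }
}
```

Because the backward pass reads only the stored bits, it needs no RNG state, which matches the backward signature taking a mask but no `rng_state`.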
Usage
Use these functions for standalone dropout operations. For attention, dropout is typically fused into the fused attention kernel instead.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/common/include/transformer_engine/dropout.h
- Lines: 1-51
Signature
void nvte_dropout_fwd(const NVTETensor input, NVTETensor output,
NVTETensor mask, NVTETensor rng_state,
float dropout_probability, cudaStream_t stream);
void nvte_dropout_bwd(const NVTETensor grad_output, const NVTETensor mask,
NVTETensor grad_input, float dropout_probability,
cudaStream_t stream);
Import
#include "transformer_engine/dropout.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | NVTETensor | Yes | Input tensor |
| rng_state | NVTETensor | Yes | RNG engine state for reproducible masking |
| dropout_probability | float | Yes | Probability of dropping each element |
| stream | cudaStream_t | Yes | CUDA stream |
Outputs
| Name | Type | Description |
|---|---|---|
| output | NVTETensor | Scaled output with dropout applied |
| mask | NVTETensor | Bit-packed dropout mask |
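Since the caller supplies the mask tensor, it can help to see how large a one-bit-per-element buffer is. The helpers below are a hypothetical sketch (the names `packed_mask_bytes` and `float_mask_bytes` are not part of the API) and only do the arithmetic; the exact shape, dtype, and alignment that nvte_dropout_fwd expects for the mask are not stated on this page and should be checked against the header.

```c
#include <stddef.h>

/* Bytes needed for a one-bit-per-element mask, rounded up to whole bytes. */
static size_t packed_mask_bytes(size_t num_elements) {
  return (num_elements + 7) / 8;
}

/* Bytes for a hypothetical FP32 mask with the same element count,
 * shown only for comparison. */
static size_t float_mask_bytes(size_t num_elements) {
  return num_elements * sizeof(float);
}
```

For a 4096 x 4096 activation, the packed mask is 2 MiB, whereas an FP32 mask would be 64 MiB, assuming a 4-byte float.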
Usage Examples
#include "transformer_engine/dropout.h"
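// Assumes input, output, mask, rng_state, grad_output, and grad_input are
// valid NVTETensor handles and stream is a valid cudaStream_t.
// Forward: apply dropout with probability 0.1 and write the bit-packed mask
nvte_dropout_fwd(input, output, mask, rng_state, 0.1f, stream);
// Backward: apply the saved mask to the gradients, using the same probability
nvte_dropout_bwd(grad_output, mask, grad_input, 0.1f, stream);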