Implementation:NVIDIA TransformerEngine Padding C API
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Declares the C API for padding and unpadding a batch of 2D tensors along the row dimension. It is typically used to align row counts for efficient GEMM and other batched computation.
Description
padding.h exposes two extern "C" functions:
- nvte_multi_padding: Pads multiple input tensors by appending zero-filled rows at the bottom until each tensor reaches its specified padded row count. Operates on arrays of NVTETensor handles with a corresponding array of padded row counts.
- nvte_multi_unpadding: The reverse of padding. Removes bottom rows so that each tensor returns to its original (unpadded) row count.
Both functions run all of their work on the single CUDA stream passed by the caller. Only bottom padding is supported: rows are always added or removed at the end of each tensor.
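To make the bottom-padding semantics concrete, here is a host-side reference sketch in plain C. It is not TransformerEngine's CUDA implementation: it assumes dense row-major float matrices in host memory instead of NVTETensor handles, and the names HostMatrix, ref_multi_padding and ref_multi_unpadding are hypothetical. It keeps the list-based argument layout of the real API so the two can be compared side by side.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical host-side stand-in for a 2D NVTETensor: a dense row-major
   float matrix. Real NVTETensors live on the GPU and carry dtype metadata. */
typedef struct {
    float *data;
    size_t rows;
    size_t cols;
} HostMatrix;

/* Reference semantics of bottom padding: copy every input row, then
   zero-fill rows [in.rows, padded_rows) of the output.
   Assumes padded_num_rows_list[i] >= input_list[i].rows and that
   output_list[i] has room for padded_num_rows_list[i] x input_list[i].cols. */
void ref_multi_padding(size_t num_tensors, const HostMatrix *input_list,
                       HostMatrix *output_list, const int *padded_num_rows_list) {
    for (size_t i = 0; i < num_tensors; ++i) {
        const HostMatrix *in = &input_list[i];
        HostMatrix *out = &output_list[i];
        size_t cols = in->cols;
        size_t padded_rows = (size_t)padded_num_rows_list[i];
        /* Row-major layout: the first in->rows rows form one contiguous block. */
        memcpy(out->data, in->data, in->rows * cols * sizeof(float));
        /* Zero the appended bottom rows (all-bits-zero is 0.0f in IEEE 754). */
        memset(out->data + in->rows * cols, 0,
               (padded_rows - in->rows) * cols * sizeof(float));
    }
}

/* Reference semantics of unpadding: keep the top unpadded_num_rows_list[i]
   rows of each input and drop the rest. */
void ref_multi_unpadding(size_t num_tensors, const HostMatrix *input_list,
                         HostMatrix *output_list, const int *unpadded_num_rows_list) {
    for (size_t i = 0; i < num_tensors; ++i) {
        size_t rows = (size_t)unpadded_num_rows_list[i];
        memcpy(output_list[i].data, input_list[i].data,
               rows * input_list[i].cols * sizeof(float));
    }
}
```

In this model unpadding is just a prefix copy of each tensor, which is why it needs nothing beyond the target row counts.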
Usage
Use when variable-length row counts (for example, variable-length sequences) need to be padded to GEMM-friendly sizes, or when results need to be unpadded back to their original sizes.
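GEMM-friendly padding usually means rounding each row count up to a multiple of some alignment. The sketch below builds a padded_num_rows_list that way. The helper names round_up_rows and make_padded_rows are hypothetical, and the alignment is an assumption supplied by the caller, since the required value depends on the kernel and data type rather than on this API.

```c
#include <stddef.h>

/* Hypothetical helper: smallest multiple of `alignment` that is >= rows.
   Assumes rows >= 0 and alignment > 0. */
static int round_up_rows(int rows, int alignment) {
    return ((rows + alignment - 1) / alignment) * alignment;
}

/* Fill padded_num_rows_list[i] by rounding num_rows_list[i] up to `alignment`. */
void make_padded_rows(size_t num_tensors, const int *num_rows_list,
                      int alignment, int *padded_num_rows_list) {
    for (size_t i = 0; i < num_tensors; ++i) {
        padded_num_rows_list[i] = round_up_rows(num_rows_list[i], alignment);
    }
}
```

With row counts {100, 200, 115} and an alignment of 128 this yields {128, 256, 128}, the values used in the usage example on this page. The original counts should be kept, because nvte_multi_unpadding needs them to restore the tensors.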
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/common/include/transformer_engine/padding.h
- Lines: 1-78
Signature
void nvte_multi_padding(size_t num_tensors, const NVTETensor* input_list,
NVTETensor* output_list,
const int* padded_num_rows_list,
cudaStream_t stream);
void nvte_multi_unpadding(size_t num_tensors, const NVTETensor* input_list,
NVTETensor* output_list,
const int* unpadded_num_rows_list,
cudaStream_t stream);
Import
#include "transformer_engine/padding.h"
I/O Contract
Inputs
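| Name | Type | Required | Description |
|---|---|---|---|
| num_tensors | size_t | Yes | Number of tensors to process |
| input_list | const NVTETensor* | Yes | Array of num_tensors 2D input tensors |
| padded_num_rows_list | const int* | Yes (nvte_multi_padding) | Target padded row count per tensor |
| unpadded_num_rows_list | const int* | Yes (nvte_multi_unpadding) | Target unpadded (original) row count per tensor |
| stream | cudaStream_t | Yes | CUDA stream on which the operation runs |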
Outputs
| Name | Type | Description |
|---|---|---|
| output_list | NVTETensor* | Array of num_tensors padded (or unpadded) output tensors |
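The contract implies shape constraints a caller should satisfy before launching: each output keeps its input's column count, padding targets can only grow the row count, and unpadding targets can only shrink it. The host-side check below is a hypothetical sketch of those constraints; the Shape2D struct and check_row_targets function are not part of TransformerEngine, and whether the library itself validates these conditions is not stated on this page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a 2D tensor shape (rows x cols). */
typedef struct {
    size_t rows;
    size_t cols;
} Shape2D;

/* Returns true if the shapes match the row-dimension contract.
   padding == true:  target_rows[i] >= in[i].rows (rows are added at the bottom)
   padding == false: target_rows[i] <= in[i].rows (bottom rows are removed)
   In both cases out[i] must be target_rows[i] x in[i].cols. */
bool check_row_targets(size_t num_tensors, const Shape2D *in, const Shape2D *out,
                       const int *target_rows, bool padding) {
    for (size_t i = 0; i < num_tensors; ++i) {
        if (target_rows[i] < 0) return false;
        size_t target = (size_t)target_rows[i];
        /* Padding may only grow the row count; unpadding may only shrink it. */
        if (padding ? target < in[i].rows : target > in[i].rows) return false;
        /* The column count is never changed by either operation. */
        if (out[i].rows != target || out[i].cols != in[i].cols) return false;
    }
    return true;
}
```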
Usage Examples
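#include "transformer_engine/padding.h"
// Assumed setup (not shown): input_list, output_list, padded_list and
// unpadded_list are arrays of 3 NVTETensor handles created beforehand,
// each output sized for its target row count, and stream is a valid cudaStream_t.
// Pad 3 tensors to aligned row counts
int padded_rows[] = {128, 256, 128};
nvte_multi_padding(3, input_list, output_list, padded_rows, stream);
// Unpad back to original sizes; padded_list holds the padded tensors
// (e.g. output_list above) and unpadded_list[i] has original_rows[i] rows
int original_rows[] = {100, 200, 115};
nvte_multi_unpadding(3, padded_list, unpadded_list, original_rows, stream);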