Implementation:NVIDIA TransformerEngine Swizzle C API
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Declares the C API for swizzling FP8 scaling factors into the interleaved memory layout required by cuBLASLt GEMM kernels.
Description
swizzle.h exposes three extern "C" functions:
- nvte_swizzle_scaling_factors: Converts a single tensor's row-major scale_inv into the interleaved format. Requirements: scale_inv in row-major, padded to 128x4 (row-scale) or 4x128 (col-scale), quantized along K-dimension.
- nvte_multi_tensor_swizzle_scaling_factors: Performs the same operation on multiple tensors in a single kernel launch, reducing launch overhead.
- nvte_swizzle_block_scaling_to_mxfp8_scaling_factors: Converts FP8 block-scaling factors into MXFP8 interleaved layout for emulating block scaling on Blackwell+ architectures where native block scaling is not supported by cuBLASLt.
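The padding requirement on scale_inv can be illustrated with a small host-side helper. This is a minimal sketch, not code from swizzle.h: the names round_up, ScaleShape, padded_rowwise_scale_shape, and padded_colwise_scale_shape are hypothetical, and it assumes that "padded to 128x4" means the row count is rounded up to a multiple of 128 and the column count to a multiple of 4 (swapped for the col-scale case).

```c
#include <stddef.h>

/* Hypothetical helper: round value up to the next multiple of `multiple`
 * (multiple must be non-zero). */
static size_t round_up(size_t value, size_t multiple) {
    return (value + multiple - 1) / multiple * multiple;
}

/* Hypothetical container for a 2D scale_inv shape. */
typedef struct {
    size_t rows;
    size_t cols;
} ScaleShape;

/* Assumption: row-scale factors are padded to multiples of 128 rows
 * and 4 columns. */
static ScaleShape padded_rowwise_scale_shape(size_t rows, size_t cols) {
    ScaleShape shape = { round_up(rows, 128), round_up(cols, 4) };
    return shape;
}

/* Assumption: col-scale factors are padded to multiples of 4 rows
 * and 128 columns. */
static ScaleShape padded_colwise_scale_shape(size_t rows, size_t cols) {
    ScaleShape shape = { round_up(rows, 4), round_up(cols, 128) };
    return shape;
}
```

For example, under these assumptions a 200x6 row-scale buffer would be allocated as 256x8 before being passed to the swizzle call.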
Without proper swizzling, FP8 GEMM results would be numerically incorrect because the tensor core kernels expect scale factors in a specific interleaved memory pattern.
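To make "interleaved memory pattern" concrete, the following CPU reference sketches one plausible mapping for a single 128x4 tile. It is an illustration only: the offset formula is the tiled layout described in the cuBLAS documentation for 1D block scaling factors as best recalled here, not something stated in swizzle.h, and the names swizzled_tile_offset and swizzle_tile_reference are hypothetical. The real kernels may differ in details such as tile ordering across the full matrix.

```c
#include <stddef.h>

enum { TILE_ROWS = 128, TILE_COLS = 4 };

/* Assumed in-tile offset for the scale factor at (outer, inner), with
 * outer in [0, 128) and inner in [0, 4). Rows are split into 4 groups
 * of 32; the groups are interleaved so that each row index modulo 32
 * owns 16 consecutive entries. */
static size_t swizzled_tile_offset(size_t outer, size_t inner) {
    return (outer % 32) * 16 + (outer / 32) * 4 + inner;
}

/* Reference swizzle of one row-major 128x4 tile (512 one-byte scale
 * factors) from src into dst. src and dst must not overlap. */
static void swizzle_tile_reference(const unsigned char *src,
                                   unsigned char *dst) {
    for (size_t outer = 0; outer < TILE_ROWS; ++outer) {
        for (size_t inner = 0; inner < TILE_COLS; ++inner) {
            dst[swizzled_tile_offset(outer, inner)] =
                src[outer * TILE_COLS + inner];
        }
    }
}
```

Under this assumed mapping, a GEMM kernel that reads the row-major buffer directly would pick up scale factors belonging to other rows, which is why the unswizzled layout produces wrong results rather than merely slower ones.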
Usage
Use after quantization and before GEMM execution to transform scaling factors into the layout expected by cuBLASLt.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/common/include/transformer_engine/swizzle.h
- Lines: 1-71
Signature
void nvte_swizzle_scaling_factors(const NVTETensor input, NVTETensor output,
cudaStream_t stream);
void nvte_multi_tensor_swizzle_scaling_factors(const NVTETensor* inputs,
NVTETensor* outputs,
const size_t num_tensors,
cudaStream_t stream);
void nvte_swizzle_block_scaling_to_mxfp8_scaling_factors(
const NVTETensor input, NVTETensor output, cudaStream_t stream);
Import
#include "transformer_engine/swizzle.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input | NVTETensor | Yes | Tensor with non-swizzled scale_inv |
| stream | cudaStream_t | Yes | CUDA stream |
Outputs
| Name | Type | Description |
|---|---|---|
| output | NVTETensor | Tensor with swizzled scale_inv for GEMM |
Usage Examples
#include "transformer_engine/swizzle.h"
// Swizzle scaling factors before GEMM
nvte_swizzle_scaling_factors(quantized_tensor, gemm_ready_tensor, stream);
// Multi-tensor version for batch processing
nvte_multi_tensor_swizzle_scaling_factors(inputs, outputs, num_tensors, stream);