Implementation:NVIDIA TransformerEngine Common Header

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, Optimization
Last Updated	2026-02-07 14:00 GMT

Overview

Core internal header defining the fundamental data structures and type system for TransformerEngine's C++ layer, including SimpleTensor, Tensor, GroupedTensor, and dtype/scaling mode utilities.

Description

common.h is the core internal header of TransformerEngine's common library and is included by nearly every other C++ source file in it. It defines:

SimpleTensor: Lightweight wrapper around a data pointer, shape vector, and dtype with numel(), has_data(), and buffer_size_bytes() methods.
Tensor: Extends SimpleTensor with quantization-related fields (amax, scale, scale_inv, columnwise variants), scaling mode tracking, and GEMM-swizzled scales flag.
Scaling mode helpers: Inline functions is_tensor_scaling, is_block_scaling, is_delayed_tensor_scaling, is_mxfp8_scaling, is_nvfp4_scaling for checking scaling modes.
Type traits: Template metaprogramming via TypeInfo, TypeExtrema, is_fp8, is_fp4 for compile-time type information on CUDA numeric types (FP16, BF16, FP8 E4M3/E5M2, FP4).
Utility functions: product() for shape products, get_buffer_size_bytes() for memory calculations.
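To make the SimpleTensor and utility-function items above concrete, here is a minimal, self-contained sketch of how numel(), has_data(), buffer_size_bytes(), product(), and get_buffer_size_bytes() could fit together. Everything lives in a hypothetical `sketch` namespace. The reduced DType enum, the `bits_per_element` helper, and the bit-based size calculation are assumptions made for illustration, not the definitions in common.h.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {  // hypothetical stand-in, not the transformer_engine namespace

// Illustrative subset of element types; the real DType enum has more members.
enum class DType { kByte, kFloat32, kFloat16, kBFloat16, kFloat8E4M3, kFloat4E2M1 };

// Width of one element in bits (helper assumed for this sketch).
inline size_t bits_per_element(DType t) {
  switch (t) {
    case DType::kFloat32: return 32;
    case DType::kFloat16:
    case DType::kBFloat16: return 16;
    case DType::kByte:
    case DType::kFloat8E4M3: return 8;
    case DType::kFloat4E2M1: return 4;
  }
  return 0;  // not reached for valid enum values
}

// Product of all dimensions; an empty shape yields 1 (a scalar).
inline size_t product(const std::vector<size_t> &shape) {
  size_t n = 1;
  for (size_t d : shape) n *= d;
  return n;
}

// Bytes needed for `numel` elements. Counting in bits lets 4-bit types
// pack two elements per byte.
inline size_t get_buffer_size_bytes(size_t numel, DType t) {
  const size_t bits = numel * bits_per_element(t);
  assert(bits % 8 == 0 && "sub-byte tensors must fill a whole number of bytes");
  return bits / 8;
}

// Lightweight, non-owning view: pointer + shape + dtype.
struct SimpleTensor {
  void *dptr = nullptr;
  std::vector<size_t> shape;
  DType dtype = DType::kFloat32;

  size_t numel() const { return product(shape); }
  bool has_data() const { return dptr != nullptr; }
  size_t buffer_size_bytes() const { return get_buffer_size_bytes(numel(), dtype); }
};

}  // namespace sketch
```

In this sketch a 4 x 8 FP16 tensor reports 32 elements and 64 bytes, while the same element count in a 4-bit type needs only 16 bytes. The struct never allocates or frees memory, which matches the "lightweight wrapper" description.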

Usage

Include this header in C++ sources within the TransformerEngine common library that work with its tensor representations. Because it is an internal header, it is intended for code built as part of the library; the other TE common headers build on it.

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/common/common.h
Lines: 1-912

Signature

namespace transformer_engine {

struct SimpleTensor {
  void *dptr;
  std::vector<size_t> shape;
  DType dtype;
  size_t numel() const;
  bool has_data() const;
  size_t buffer_size_bytes() const;
};

struct Tensor {
  SimpleTensor data;
  SimpleTensor columnwise_data;
  SimpleTensor amax, columnwise_amax;
  SimpleTensor scale, scale_inv, columnwise_scale_inv;
  NVTEScalingMode scaling_mode;
  bool with_gemm_swizzled_scales = false;
};

inline bool is_tensor_scaling(const NVTEScalingMode &mode);
inline bool is_block_scaling(const NVTEScalingMode &mode);
inline bool is_mxfp8_scaling(const NVTEScalingMode &mode);

}  // namespace transformer_engine
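The scaling-mode helpers are declared above without bodies. The following sketch shows one plausible way such predicates could be written and combined. It uses a hypothetical `ScalingMode` enum in a `sketch` namespace rather than the real NVTEScalingMode, and the rule that every non-tensor mode counts as block scaling is an assumption of this example. The functions are marked constexpr here only so they can be checked at compile time.

```cpp
namespace sketch {  // hypothetical stand-in for illustration only

// Stand-in for NVTEScalingMode; the enumerator names are assumptions.
enum class ScalingMode {
  kDelayedTensor,  // one scale factor for the whole tensor
  kMxfp8Block,     // MXFP8: one scale per small block of elements
  kNvfp4Block,     // NVFP4: block scaling for 4-bit data
  kGenericBlock    // any other block-wise recipe
};

constexpr bool is_tensor_scaling(ScalingMode m) {
  return m == ScalingMode::kDelayedTensor;
}

constexpr bool is_delayed_tensor_scaling(ScalingMode m) {
  return m == ScalingMode::kDelayedTensor;
}

// Assumption: anything that is not per-tensor scaling is block scaling.
constexpr bool is_block_scaling(ScalingMode m) { return !is_tensor_scaling(m); }

constexpr bool is_mxfp8_scaling(ScalingMode m) {
  return m == ScalingMode::kMxfp8Block;
}

constexpr bool is_nvfp4_scaling(ScalingMode m) {
  return m == ScalingMode::kNvfp4Block;
}

// Example dispatch: test the most specific predicates first, then fall back.
constexpr const char *describe(ScalingMode m) {
  return is_mxfp8_scaling(m)    ? "mxfp8"
         : is_nvfp4_scaling(m)  ? "nvfp4"
         : is_tensor_scaling(m) ? "per-tensor"
                                : "block";
}

}  // namespace sketch
```

Keeping the checks as small named predicates means call sites read as intent ("is this MXFP8?") instead of comparing against raw enum values, and a new scaling mode only requires touching the predicates.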

Import

#include "common/common.h"

I/O Contract

Inputs

Name	Type	Required	Description
N/A	N/A	N/A	This is a header file defining types and utilities

Outputs

Name	Type	Description
N/A	N/A	Provides fundamental type definitions used throughout the library

Usage Examples

#include "common/common.h"

using namespace transformer_engine;

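// Wrap an existing buffer as a [batch_size, hidden_size] FP16 tensor.
// data_ptr, batch_size, and hidden_size are assumed to be defined by the caller.
SimpleTensor t(data_ptr, {batch_size, hidden_size}, DType::kFloat16);

// Branch on the scaling mode of an existing Tensor (`tensor` is assumed to be in scope).
if (is_mxfp8_scaling(tensor.scaling_mode)) {
    // Handle MXFP8 block scaling
}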
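The compile-time traits named in the Description (is_fp8, is_fp4, TypeExtrema) can be used in a similar spirit. The sketch below is a stand-alone approximation: the empty tag structs replace the CUDA numeric types so the example compiles without the CUDA toolkit, and the `saturate` helper is hypothetical. The maximum values are the largest finite magnitudes of the E4M3 (448), E5M2 (57344), and E2M1 (6) formats.

```cpp
#include <type_traits>

namespace sketch {  // hypothetical stand-in for illustration only

// Tag types standing in for the CUDA FP8/FP4 types used by the real header.
struct fp8e4m3 {};
struct fp8e5m2 {};
struct fp4e2m1 {};

// Trait: is T one of the 8-bit floating-point formats?
template <typename T> struct is_fp8 : std::false_type {};
template <> struct is_fp8<fp8e4m3> : std::true_type {};
template <> struct is_fp8<fp8e5m2> : std::true_type {};

// Trait: is T a 4-bit floating-point format?
template <typename T> struct is_fp4 : std::false_type {};
template <> struct is_fp4<fp4e2m1> : std::true_type {};

// Largest finite magnitude representable in each format.
template <typename T> struct TypeExtrema;
template <> struct TypeExtrema<fp8e4m3> { static constexpr float max = 448.0f; };
template <> struct TypeExtrema<fp8e5m2> { static constexpr float max = 57344.0f; };
template <> struct TypeExtrema<fp4e2m1> { static constexpr float max = 6.0f; };

// Hypothetical helper: clamp x into the representable range of T.
template <typename T>
constexpr float saturate(float x) {
  return x > TypeExtrema<T>::max
             ? TypeExtrema<T>::max
             : (x < -TypeExtrema<T>::max ? -TypeExtrema<T>::max : x);
}

// True for any type that needs a scale factor to cover a useful range.
template <typename T>
constexpr bool is_low_precision() {
  return is_fp8<T>::value || is_fp4<T>::value;
}

}  // namespace sketch
```

Because the traits resolve at compile time, a templated kernel or helper can select its quantization path with no runtime branching on the element type.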

Related Pages

Environment:NVIDIA_TransformerEngine_CUDA_Toolkit_Requirements