Implementation:NVIDIA TransformerEngine GEMM Config

Field	Value
Sources	TransformerEngine
Domains	Deep_Learning, Optimization
Last Updated	2026-02-07 14:00 GMT

Overview

Implements the C API for creating, configuring, and destroying GEMM configuration objects. A config object controls optional GEMM behavior: the bias tensor, fused GELU epilogues, split accumulation, and the SM count.

Description

gemm/config.cpp provides the opaque configuration API for both standard and grouped GEMM:

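nvte_create_matmul_config / nvte_destroy_matmul_config: Allocate and free the MatmulConfig struct behind an opaque NVTEMatmulConfig handle.
nvte_get_matmul_config_attribute: Read an attribute into a caller-provided buffer via memcpy after an explicit size check, and report the attribute's size through size_written. Boolean attributes are exposed as uint8_t because sizeof(bool) is implementation-defined, which keeps the ABI consistent across languages.
nvte_set_matmul_config_attribute: Write an attribute from a caller-provided buffer using the same size-safe approach.
nvte_create/get/set/destroy_grouped_matmul_config: The same operations for NVTEGroupedMatmulConfig, used for grouped (batched) GEMM and carrying average M/N/K size hints.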
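The size-checked write and the uint8_t handling of boolean attributes described above can be sketched with a self-contained mock. Everything here is hypothetical: DemoMatmulConfig, DemoAttr, demo_attr_sizes and demo_set_attr are illustrative names, the field types (for example an int SM count) are assumptions, and the mock reports errors through a return code, whereas the real nvte_set_matmul_config_attribute returns void.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the opaque config; not TransformerEngine's layout. */
typedef struct {
  bool with_gelu_epilogue; /* flag kept as bool inside the "library" */
  int sm_count;            /* assumed to be an int in this mock */
} DemoMatmulConfig;

typedef enum {
  DEMO_ATTR_WITH_GELU_EPILOGUE = 0,
  DEMO_ATTR_SM_COUNT,
  DEMO_NUM_ATTRS
} DemoAttr;

/* Expected buffer size per attribute. Flags cross the API boundary as
 * uint8_t because sizeof(bool) is implementation-defined. */
static const size_t demo_attr_sizes[DEMO_NUM_ATTRS] = {
    sizeof(uint8_t), /* DEMO_ATTR_WITH_GELU_EPILOGUE */
    sizeof(int),     /* DEMO_ATTR_SM_COUNT */
};

/* Copies the value in buf into cfg. Returns 0 on success, or -1 if buf is
 * missing or smaller than the attribute's expected size. */
int demo_set_attr(DemoMatmulConfig *cfg, DemoAttr attr, const void *buf,
                  size_t size_in_bytes) {
  assert(cfg != NULL && attr < DEMO_NUM_ATTRS);
  if (buf == NULL || size_in_bytes < demo_attr_sizes[attr]) return -1;
  switch (attr) {
    case DEMO_ATTR_WITH_GELU_EPILOGUE: {
      uint8_t val;
      memcpy(&val, buf, sizeof val);
      cfg->with_gelu_epilogue = (val != 0); /* uint8_t -> bool */
      break;
    }
    case DEMO_ATTR_SM_COUNT:
      memcpy(&cfg->sm_count, buf, sizeof cfg->sm_count);
      break;
    default:
      return -1;
  }
  return 0;
}
```

In this mock the bool never crosses the boundary: callers exchange exactly one byte per flag, and an undersized buffer is rejected before any memcpy touches the config.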

Supported attributes:

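kNVTEMatmulConfigBiasTensor: Bias tensor handle (bias added in the epilogue)
kNVTEMatmulConfigDBiasTensor: Bias-gradient (dbias) tensor handle for the backward pass
kNVTEMatmulConfigWithGELUEpilogue: Fuse a GELU activation into the GEMM epilogue
kNVTEMatmulConfigWithDGELUEpilogue: Fuse the GELU derivative (dGELU) into the epilogue for the backward pass
kNVTEMatmulConfigUseSplitAccumulator: Use split accumulation in FP8 GEMMs
kNVTEMatmulConfigSMCount: Number of SMs the GEMM may use, e.g. to leave SMs free for overlapped communication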
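To make the bias-related attributes concrete, the following CPU reference shows what a bias epilogue adds in the forward GEMM and what a dbias output holds in the backward pass. It is a sketch, not TransformerEngine code: ref_gemm_bias and ref_dbias are hypothetical names, matrices are row-major with no transposes (the real GEMM path is built on cuBLAS, which is column-major and takes transpose flags), and the GELU/dGELU epilogues are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Forward reference: D[i][j] = sum_p A[i][p] * B[p][j] + bias[j].
 * A is m x k, B is k x n, bias has n entries (one per output column),
 * D is m x n. All row-major. */
void ref_gemm_bias(const float *A, const float *B, const float *bias,
                   float *D, size_t m, size_t n, size_t k) {
  assert(A && B && bias && D);
  for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
      float acc = bias[j]; /* the "epilogue" term, folded into the sum */
      for (size_t p = 0; p < k; ++p) acc += A[i * k + p] * B[p * n + j];
      D[i * n + j] = acc;
    }
  }
}

/* Backward reference: dbias[j] = sum over the m rows of dY[i][j],
 * where dY (m x n, row-major) is the gradient with respect to D. */
void ref_dbias(const float *dY, float *dbias, size_t m, size_t n) {
  assert(dY && dbias);
  for (size_t j = 0; j < n; ++j) {
    float acc = 0.0f;
    for (size_t i = 0; i < m; ++i) acc += dY[i * n + j];
    dbias[j] = acc;
  }
}
```

Fusing the bias into the GEMM epilogue avoids writing D out and reading it back for a separate bias kernel; the reference computes the same values in one loop nest purely for clarity.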

Usage

Framework bindings use these config objects to set up a GEMM before launching it, enabling fused bias and GELU epilogues and limiting the number of SMs the GEMM occupies.
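Setting kNVTEMatmulConfigSMCount lets a caller cap how many SMs the GEMM uses, so communication kernels running concurrently keep the rest. As a hypothetical illustration (gemm_sm_budget is not a TransformerEngine function, and treating the SM count as an int is an assumption), a caller could derive the value from the device's SM count, which the CUDA runtime reports through cudaDeviceGetAttribute with cudaDevAttrMultiProcessorCount, minus the SMs it reserves for communication.

```c
#include <assert.h>

/* Hypothetical helper: SMs left for the GEMM after reserving comm_sms for
 * communication kernels that run concurrently. The result would be the
 * value passed for kNVTEMatmulConfigSMCount. Not a TransformerEngine API. */
int gemm_sm_budget(int device_sms, int comm_sms) {
  assert(device_sms > 0 && comm_sms >= 0);
  int budget = device_sms - comm_sms;
  /* Never return 0: this sketch does not assume what an SM count of 0 means. */
  return budget > 0 ? budget : 1;
}
```

On a device with 132 SMs and 16 reserved for communication, the helper returns 116.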

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/common/gemm/config.cpp
Lines: 1-231

Signature

NVTEMatmulConfig nvte_create_matmul_config();
void nvte_destroy_matmul_config(NVTEMatmulConfig config);
void nvte_get_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      void *buf, size_t size_in_bytes,
                                      size_t *size_written);
void nvte_set_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      const void *buf, size_t size_in_bytes);

NVTEGroupedMatmulConfig nvte_create_grouped_matmul_config();
void nvte_destroy_grouped_matmul_config(NVTEGroupedMatmulConfig config);
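The grouped config is described as carrying average M/N/K hints for grouped GEMM. This page does not list the attribute names or types used to set those hints, so the sketch below does not call the grouped API; it only shows how a caller might derive such averages from per-group problem sizes, for instance in a mixture-of-experts layer where each expert sees a different number of tokens. GroupShape and average_group_shape are hypothetical names, and truncating integer division is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group GEMM shape: D_g (m x n) = A_g (m x k) * B_g (k x n). */
typedef struct {
  int64_t m, n, k;
} GroupShape;

/* Per-dimension mean over all groups, truncated by integer division. */
GroupShape average_group_shape(const GroupShape *groups, size_t count) {
  assert(groups != NULL && count > 0);
  int64_t sum_m = 0, sum_n = 0, sum_k = 0;
  for (size_t i = 0; i < count; ++i) {
    sum_m += groups[i].m;
    sum_n += groups[i].n;
    sum_k += groups[i].k;
  }
  GroupShape avg = {sum_m / (int64_t)count, sum_n / (int64_t)count,
                    sum_k / (int64_t)count};
  return avg;
}
```

For three groups with M = 128, 256 and 64 and fixed N = 256, K = 64, the averages are 149, 256 and 64.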

Import

#include <transformer_engine/gemm.h>

I/O Contract

Inputs

Name	Type	Required	Description
`config`	`NVTEMatmulConfig`	Yes	Opaque GEMM config handle
`attr`	`NVTEMatmulConfigAttribute`	Yes	Attribute to get/set
`buf`	`void*`	Yes	Buffer holding the value to set, or receiving the value on get
`size_in_bytes`	`size_t`	Yes	Size of `buf` in bytes

Outputs

Name	Type	Description
`config`	`NVTEMatmulConfig`	Created config handle (for create function)
`size_written`	`size_t*`	Size of the attribute in bytes, written by the get function
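The get side of the contract (size_written reports the attribute's size, and buf must be at least that large) can be sketched as another self-contained mock. The names are hypothetical, and two behaviors are assumptions of the mock rather than documented TransformerEngine behavior: a NULL buf acts as a pure size query, and a too-small buffer yields a -1 return code (the real function returns void).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the opaque config; not TransformerEngine's layout. */
typedef struct {
  bool use_split_accumulator;
  int sm_count;
} DemoMatmulConfig;

typedef enum {
  DEMO_ATTR_USE_SPLIT_ACCUMULATOR = 0,
  DEMO_ATTR_SM_COUNT,
  DEMO_NUM_ATTRS
} DemoAttr;

static const size_t demo_attr_sizes[DEMO_NUM_ATTRS] = {sizeof(uint8_t),
                                                       sizeof(int)};

/* Always writes the attribute's size to *size_written. With buf == NULL the
 * call stops there (size query). Otherwise buf must hold at least that many
 * bytes. Returns 0 on success, -1 if buf is too small. */
int demo_get_attr(const DemoMatmulConfig *cfg, DemoAttr attr, void *buf,
                  size_t size_in_bytes, size_t *size_written) {
  assert(cfg != NULL && size_written != NULL && attr < DEMO_NUM_ATTRS);
  *size_written = demo_attr_sizes[attr];
  if (buf == NULL) return 0;
  if (size_in_bytes < demo_attr_sizes[attr]) return -1;
  switch (attr) {
    case DEMO_ATTR_USE_SPLIT_ACCUMULATOR: {
      uint8_t val = cfg->use_split_accumulator ? 1 : 0; /* bool -> uint8_t */
      memcpy(buf, &val, sizeof val);
      break;
    }
    case DEMO_ATTR_SM_COUNT:
      memcpy(buf, &cfg->sm_count, sizeof cfg->sm_count);
      break;
    default:
      return -1;
  }
  return 0;
}
```

With this mock, a caller that does not know an attribute's size in advance can query it first, then allocate a buffer of size_written bytes and call again.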

Usage Examples

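#include <stdint.h>
#include <transformer_engine/gemm.h>

// Create a config and enable the fused GELU epilogue.
// Boolean attributes are passed as uint8_t, not bool.
NVTEMatmulConfig config = nvte_create_matmul_config();

uint8_t gelu_flag = 1;
nvte_set_matmul_config_attribute(config,
    kNVTEMatmulConfigWithGELUEpilogue,
    &gelu_flag, sizeof(gelu_flag));

// Run the GEMM with this config. A, B, D, workspace (NVTETensor) and
// stream (cudaStream_t) are assumed to be created elsewhere. The argument
// list here may be abbreviated; check transformer_engine/gemm.h for the
// full nvte_cublas_gemm_v2 signature.
nvte_cublas_gemm_v2(A, B, D, workspace, config, stream);

// Release the config once no GEMM still needs it.
nvte_destroy_matmul_config(config);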

Related Pages

Environment:NVIDIA_TransformerEngine_CUDA_Toolkit_Requirements
