Implementation:NVIDIA TransformerEngine GEMM Config
| Field | Value |
|---|---|
| Sources | TransformerEngine |
| Domains | Deep_Learning, Optimization |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
Implements the C API for creating, configuring, and destroying GEMM configuration objects, which control GEMM behavior including bias, GELU epilogue fusion, split accumulator usage, and SM count.
Description
gemm/config.cpp provides the opaque configuration API for both standard and grouped GEMM:
- nvte_create_matmul_config / nvte_destroy_matmul_config: Allocate and free MatmulConfig structs.
- nvte_get_matmul_config_attribute: Read attributes from config using explicit size checking and memcpy. Uses bool-to-uint8_t conversion for cross-language ABI safety.
- nvte_set_matmul_config_attribute: Write attributes to config with the same size-safe approach.
- Grouped variants: nvte_create/get/set/destroy_grouped_matmul_config for batched GEMM with average M/N/K hints.
Supported attributes:
- kNVTEMatmulConfigBiasTensor: Bias tensor handle
- kNVTEMatmulConfigDBiasTensor: Backward bias tensor handle
- kNVTEMatmulConfigWithGELUEpilogue: Fused GELU activation
- kNVTEMatmulConfigWithDGELUEpilogue: Fused backward GELU
- kNVTEMatmulConfigUseSplitAccumulator: FP8 split accumulation
- kNVTEMatmulConfigSMCount: SM count for communication overlap
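The size-checked, memcpy-based get/set approach described above can be illustrated with a minimal standalone sketch. It does not reproduce config.cpp: `SketchConfig`, `SketchAttr`, and the `sketch_*` functions are hypothetical stand-ins, and the error handling (exceptions) is an assumption chosen to keep the example short. It shows a boolean stored internally as `bool` but exchanged through the `void*` interface as a `uint8_t`, with the buffer size checked before any copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-ins; these are NOT the TransformerEngine definitions.
enum SketchAttr { kSketchWithGELU = 0, kSketchSMCount = 1 };

struct SketchConfig {
  bool with_gelu = false;  // kept as bool internally
  int sm_count = 0;
};

// Size of each attribute as seen through the void* interface.
std::size_t sketch_attr_size(SketchAttr attr) {
  switch (attr) {
    case kSketchWithGELU: return sizeof(std::uint8_t);  // bool travels as uint8_t
    case kSketchSMCount:  return sizeof(int);
    default: throw std::invalid_argument("unknown attribute");
  }
}

void sketch_set_attr(SketchConfig *cfg, SketchAttr attr, const void *buf,
                     std::size_t size_in_bytes) {
  if (cfg == nullptr || buf == nullptr) throw std::invalid_argument("null argument");
  // Reject buffers that cannot hold the attribute before touching them.
  if (size_in_bytes < sketch_attr_size(attr)) throw std::invalid_argument("buffer too small");
  switch (attr) {
    case kSketchWithGELU: {
      std::uint8_t v = 0;
      std::memcpy(&v, buf, sizeof(v));
      cfg->with_gelu = (v != 0);  // uint8_t -> bool
      break;
    }
    case kSketchSMCount:
      std::memcpy(&cfg->sm_count, buf, sizeof(int));
      break;
  }
}

void sketch_get_attr(const SketchConfig *cfg, SketchAttr attr, void *buf,
                     std::size_t size_in_bytes, std::size_t *size_written) {
  if (cfg == nullptr || buf == nullptr || size_written == nullptr)
    throw std::invalid_argument("null argument");
  const std::size_t needed = sketch_attr_size(attr);
  if (size_in_bytes < needed) throw std::invalid_argument("buffer too small");
  switch (attr) {
    case kSketchWithGELU: {
      const std::uint8_t v = cfg->with_gelu ? 1 : 0;  // bool -> uint8_t
      std::memcpy(buf, &v, sizeof(v));
      break;
    }
    case kSketchSMCount:
      std::memcpy(buf, &cfg->sm_count, sizeof(int));
      break;
  }
  *size_written = needed;
}
```

Passing booleans as a fixed one-byte integer avoids relying on how a non-C++ caller represents `bool`, which is the ABI concern the description mentions.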
Usage
Used by framework bindings to configure GEMM operations before execution, supporting fused bias-GELU and configurable SM partitioning.
Code Reference
Source Location
- Repository: NVIDIA/TransformerEngine
- File: transformer_engine/common/gemm/config.cpp
- Lines: 1--231
Signature
NVTEMatmulConfig nvte_create_matmul_config();
void nvte_destroy_matmul_config(NVTEMatmulConfig config);
void nvte_get_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      void *buf, size_t size_in_bytes,
                                      size_t *size_written);
void nvte_set_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      const void *buf, size_t size_in_bytes);
NVTEGroupedMatmulConfig nvte_create_grouped_matmul_config();
void nvte_destroy_grouped_matmul_config(NVTEGroupedMatmulConfig config);
Import
#include <transformer_engine/gemm.h>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | NVTEMatmulConfig | Yes | Opaque GEMM config handle |
| attr | NVTEMatmulConfigAttribute | Yes | Attribute to get/set |
| buf | void* | Yes | Buffer for attribute data |
| size_in_bytes | size_t | Yes | Buffer size |
Outputs
| Name | Type | Description |
|---|---|---|
| config | NVTEMatmulConfig | Created config handle (for create function) |
| size_written | size_t* | Actual size of attribute (for get function) |
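One way a caller might use the `size_written` output is to query the attribute size first and then allocate a buffer of that size. The sketch below is hedged: `demo_get_sm_count` and `demo_read_sm_count` are hypothetical functions, and the convention that a null `buf` means "report the size only" is an assumption, not something this page confirms for nvte_get_matmul_config_attribute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical getter for a single int attribute; NOT the TransformerEngine function.
// Assumption: size_written is filled in first, and a null buffer is a size-only query.
void demo_get_sm_count(int stored, void *buf, std::size_t size_in_bytes,
                       std::size_t *size_written) {
  if (size_written == nullptr) throw std::invalid_argument("size_written is null");
  *size_written = sizeof(int);
  if (buf == nullptr) return;  // size query only, nothing copied
  if (size_in_bytes < sizeof(int)) throw std::invalid_argument("buffer too small");
  std::memcpy(buf, &stored, sizeof(int));
}

// Caller side: ask for the size, allocate exactly that much, then fetch the value.
int demo_read_sm_count(int stored) {
  std::size_t needed = 0;
  demo_get_sm_count(stored, nullptr, 0, &needed);
  std::vector<unsigned char> bytes(needed);
  demo_get_sm_count(stored, bytes.data(), bytes.size(), &needed);
  int value = 0;
  std::memcpy(&value, bytes.data(), sizeof(value));
  return value;
}
```

When the attribute type is known at compile time, a caller can skip the query and pass a correctly typed variable with `sizeof`, as the usage example on this page does.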
Usage Examples
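#include <cstdint>
#include <transformer_engine/gemm.h>
// Create a config and enable the fused GELU epilogue.
NVTEMatmulConfig config = nvte_create_matmul_config();
uint8_t gelu_flag = 1;  // boolean attributes are passed as uint8_t
nvte_set_matmul_config_attribute(config,
                                 kNVTEMatmulConfigWithGELUEpilogue,
                                 &gelu_flag, sizeof(gelu_flag));
// Run the GEMM with this config. The argument list here is schematic;
// check gemm.h for the exact signature.
nvte_cublas_gemm_v2(A, B, D, workspace, config, stream);
// Free the config when it is no longer needed.
nvte_destroy_matmul_config(config);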
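Because every created config must be paired with a destroy call, C++ callers may prefer to tie the handle's lifetime to a scope. The following is a minimal sketch of that idea using `std::unique_ptr` with a custom deleter. The `sketch_create_config` and `sketch_destroy_config` functions and the `g_live_configs` counter are hypothetical stand-ins for an opaque C handle API; the sketch assumes the handle is a pointer type and is not taken from TransformerEngine.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for an opaque create/destroy C API (NOT the NVTE functions).
using SketchHandle = void *;
static int g_live_configs = 0;  // number of live handles, for demonstration only

SketchHandle sketch_create_config() {
  ++g_live_configs;
  return new int(0);
}

void sketch_destroy_config(SketchHandle h) {
  if (h != nullptr) {
    delete static_cast<int *>(h);
    --g_live_configs;
  }
}

// Deleter that forwards to the C-style destroy function.
struct SketchConfigDeleter {
  void operator()(void *h) const { sketch_destroy_config(h); }
};

// Owning handle: destroy is called automatically when it goes out of scope.
using SketchConfigPtr = std::unique_ptr<void, SketchConfigDeleter>;

SketchConfigPtr make_sketch_config() {
  return SketchConfigPtr(sketch_create_config());
}
```

With such a wrapper, `cfg.get()` yields the raw handle to pass to C functions, and early returns or exceptions no longer leak the config.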