Implementation:NVIDIA TransformerEngine GEMM Config

From Leeroopedia


Sources: TransformerEngine
Domains: Deep_Learning, Optimization
Last Updated: 2026-02-07 14:00 GMT

Overview

Implements the C API for creating, configuring, and destroying GEMM configuration objects, which control GEMM behavior including bias, GELU epilogue fusion, split accumulator usage, and SM count.

Description

gemm/config.cpp provides the opaque configuration API for both standard and grouped GEMM:

  • nvte_create_matmul_config / nvte_destroy_matmul_config: Allocate and free MatmulConfig structs.
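  • nvte_get_matmul_config_attribute: Read an attribute from a config into a caller-provided buffer. The buffer size is checked explicitly before the value is copied with memcpy. Boolean attributes are exchanged as uint8_t so that their byte width is fixed across languages and ABIs.
  • nvte_set_matmul_config_attribute: Write an attribute from a caller-provided buffer into a config, using the same size check and memcpy copy.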
  • Grouped variants: nvte_create/get/set/destroy_grouped_matmul_config for batched GEMM with average M/N/K hints.
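The size-checked get/set pattern described above can be sketched in isolation. The following is a minimal illustration, not the code in gemm/config.cpp: DemoConfig, DemoAttr, demo_set_attr, and demo_get_attr are hypothetical names, errors are reported through a bool return value for simplicity, and treating a null buffer as a size query is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration; these are not TransformerEngine types.
enum DemoAttr { kDemoWithGelu = 0, kDemoSmCount = 1, kDemoNumAttrs = 2 };

struct DemoConfig {
  bool with_gelu = false;
  int sm_count = 0;
};

// Size of each attribute as exposed through the C-style interface.
// The bool field is exposed as uint8_t so its width is fixed for every caller.
static const std::size_t kDemoAttrSizes[kDemoNumAttrs] = {sizeof(std::uint8_t),
                                                          sizeof(int)};

// Copies an attribute value from buf into cfg. Returns false on invalid arguments.
inline bool demo_set_attr(DemoConfig *cfg, DemoAttr attr, const void *buf,
                          std::size_t size_in_bytes) {
  if (cfg == nullptr || buf == nullptr || attr >= kDemoNumAttrs) return false;
  if (size_in_bytes < kDemoAttrSizes[attr]) return false;  // explicit size check
  switch (attr) {
    case kDemoWithGelu: {
      std::uint8_t v = 0;
      std::memcpy(&v, buf, sizeof(v));
      cfg->with_gelu = (v != 0);  // uint8_t -> bool
      return true;
    }
    case kDemoSmCount:
      std::memcpy(&cfg->sm_count, buf, sizeof(cfg->sm_count));
      return true;
    default:
      return false;
  }
}

// Copies an attribute value from cfg into buf and reports its size in bytes.
// Assumption of this sketch: a null buf is treated as a size query.
inline bool demo_get_attr(const DemoConfig *cfg, DemoAttr attr, void *buf,
                          std::size_t size_in_bytes, std::size_t *size_written) {
  if (cfg == nullptr || size_written == nullptr || attr >= kDemoNumAttrs) return false;
  *size_written = kDemoAttrSizes[attr];
  if (buf == nullptr) return true;                  // size query only
  if (size_in_bytes < *size_written) return false;  // explicit size check
  switch (attr) {
    case kDemoWithGelu: {
      const std::uint8_t v = cfg->with_gelu ? 1 : 0;  // bool -> uint8_t
      std::memcpy(buf, &v, sizeof(v));
      return true;
    }
    case kDemoSmCount:
      std::memcpy(buf, &cfg->sm_count, sizeof(cfg->sm_count));
      return true;
    default:
      return false;
  }
}
```

Going through memcpy with an explicit size, rather than casting the void pointer to a typed pointer, avoids alignment assumptions about the caller's buffer and lets the callee reject a buffer that is too small before touching it.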

Supported attributes:

  • kNVTEMatmulConfigBiasTensor: Bias tensor handle
  • kNVTEMatmulConfigDBiasTensor: Backward bias tensor handle
  • kNVTEMatmulConfigWithGELUEpilogue: Fused GELU activation
  • kNVTEMatmulConfigWithDGELUEpilogue: Fused backward GELU
  • kNVTEMatmulConfigUseSplitAccumulator: FP8 split accumulation
  • kNVTEMatmulConfigSMCount: SM (streaming multiprocessor) count, used for communication overlap

Usage

Used by framework bindings to configure GEMM operations before execution, supporting fused bias-GELU and configurable SM partitioning.
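Because the config is an opaque handle with paired create and destroy calls, a C++ binding could tie its lifetime to a scope. The sketch below shows one way to do that with std::unique_ptr and a custom deleter. It is only an illustration: StubConfig, stub_create_config, stub_destroy_config, and the live-handle counter are hypothetical stand-ins so the example compiles without TransformerEngine, and nothing here is taken from the actual framework bindings.

```cpp
#include <memory>

// Hypothetical stand-ins for an opaque-handle C API (not TransformerEngine code).
struct StubConfig {
  int sm_count = 0;
};
using StubConfigHandle = void *;

// Counts live handles so the effect of the wrapper can be observed.
static int g_live_stub_configs = 0;

inline StubConfigHandle stub_create_config() {
  ++g_live_stub_configs;
  return new StubConfig;
}

inline void stub_destroy_config(StubConfigHandle handle) {
  if (handle != nullptr) {
    delete static_cast<StubConfig *>(handle);
    --g_live_stub_configs;
  }
}

// Deleter that forwards to the C-style destroy function.
struct StubConfigDeleter {
  void operator()(void *handle) const { stub_destroy_config(handle); }
};

// Owning wrapper: the handle is destroyed when the wrapper goes out of scope.
using ScopedStubConfig = std::unique_ptr<void, StubConfigDeleter>;

inline ScopedStubConfig make_scoped_stub_config() {
  return ScopedStubConfig(stub_create_config());
}
```

With a wrapper like this, the raw handle from get() would be passed to the set and GEMM calls, and an early return or exception between creation and use would not leak the config.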

Code Reference

Source Location

Repository: NVIDIA/TransformerEngine
File: transformer_engine/common/gemm/config.cpp
Lines: 1-231

Signature

NVTEMatmulConfig nvte_create_matmul_config();
void nvte_destroy_matmul_config(NVTEMatmulConfig config);
void nvte_get_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      void *buf, size_t size_in_bytes,
                                      size_t *size_written);
void nvte_set_matmul_config_attribute(NVTEMatmulConfig config,
                                      NVTEMatmulConfigAttribute attr,
                                      const void *buf, size_t size_in_bytes);

NVTEGroupedMatmulConfig nvte_create_grouped_matmul_config();
void nvte_destroy_grouped_matmul_config(NVTEGroupedMatmulConfig config);

Import

#include <transformer_engine/gemm.h>

I/O Contract

Inputs

Name | Type | Required | Description
config | NVTEMatmulConfig | Yes | Opaque GEMM config handle
attr | NVTEMatmulConfigAttribute | Yes | Attribute to get or set
buf | void* (get), const void* (set) | Yes | Buffer that receives (get) or supplies (set) the attribute value
size_in_bytes | size_t | Yes | Size of buf in bytes

Outputs

Name | Type | Description
config | NVTEMatmulConfig | Newly created config handle (returned by nvte_create_matmul_config)
size_written | size_t* | Actual size of the attribute in bytes (written by nvte_get_matmul_config_attribute)

Usage Examples

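#include <stdint.h>
#include <transformer_engine/gemm.h>

// Create a config and enable the fused GELU epilogue.
NVTEMatmulConfig config = nvte_create_matmul_config();

// Boolean attributes are passed as uint8_t (1 = enabled).
uint8_t gelu_flag = 1;
nvte_set_matmul_config_attribute(config,
    kNVTEMatmulConfigWithGELUEpilogue,
    &gelu_flag, sizeof(gelu_flag));

// Run the GEMM with this config. A, B, D, workspace, and stream are assumed
// to be set up by the caller; see gemm.h for the exact parameter list.
nvte_cublas_gemm_v2(A, B, D, workspace, config, stream);

// Release the config when it is no longer needed.
nvte_destroy_matmul_config(config);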
