Implementation:FMInference FlexLLMGen DeepSpeed Conversion Utils
| Knowledge Sources | |
|---|---|
| Domains | CUDA Programming, Numerical Computing, Type Systems |
| Last Updated | 2026-02-09 12:00 GMT |
Overview
Header-only C++ template library providing device-inline type conversion functions between all CUDA numeric types, including FP64, FP32, FP16, BF16, and integer types of various widths.
Description
This file defines the conversion namespace containing a comprehensive set of template specializations for the to<TO, FROM>() function template. It provides type-safe, device-inline conversions between every combination of CUDA numeric types: double, float, __half, __nv_bfloat16, int8_t through int64_t, and uint8_t through uint64_t. Vectorized pair types (float2, __half2, __nv_bfloat162) are also supported.
Each conversion maps to the corresponding CUDA intrinsic rather than a plain cast. For example, float to __half uses __float2half(), and float-to-integer conversions use round-to-nearest intrinsics such as __float2int_rn(). Where PTX assembly is available, some conversions use inline PTX instead (e.g., float to double via cvt.rn.f64.f32).
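The dispatch pattern described here can be sketched on the host, without a GPU. This is a minimal illustration, not the header's code: the namespace conversion_sketch is hypothetical, and std::nearbyint stands in for __float2int_rn() on the assumption that the default floating-point rounding mode is in effect.

```cpp
#include <cmath>
#include <cstdint>

namespace conversion_sketch {  // hypothetical namespace, not the library's

// Primary template: declared but not defined, so an unsupported
// (TO, FROM) pair fails to link instead of converting silently.
template <typename TO, typename FROM>
inline TO to(FROM val);

// float -> int32_t, rounding to nearest. Host stand-in for the
// device intrinsic __float2int_rn(); assumes the default rounding mode.
template <>
inline std::int32_t to(float val)
{
    return static_cast<std::int32_t>(std::nearbyint(val));
}

// float -> double is exact, so a plain cast is enough on the host.
template <>
inline double to(float val)
{
    return static_cast<double>(val);
}

}  // namespace conversion_sketch
```

A caller names only the destination type, for example conversion_sketch::to<std::int32_t>(3.5f); the source type is deduced from the argument, which is what selects the specialization.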
Identity conversions (e.g., float to float) are explicitly provided so that templated kernel code can unconditionally call conversion::to<float>(val) regardless of the input type, enabling generic kernel implementations that work across all precision levels.
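To illustrate why the identity specializations matter, the following host-side sketch writes one generic routine and instantiates it for several input types. The namespace conversion_sketch and the function scale_and_shift are hypothetical names introduced for this example; only the to<TO>(val) calling convention is taken from the header.

```cpp
#include <cstdint>

namespace conversion_sketch {  // hypothetical namespace, not the library's

template <typename TO, typename FROM>
inline TO to(FROM val);

// Identity conversion: lets generic code call to<float>() even when
// the input is already a float.
template <>
inline float to(float val)
{
    return val;
}

template <>
inline float to(double val)
{
    return static_cast<float>(val);
}

template <>
inline float to(std::int32_t val)
{
    return static_cast<float>(val);
}

}  // namespace conversion_sketch

// Hypothetical generic routine: the body is identical for every input
// type T because the conversion call hides the per-type details.
template <typename T>
inline float scale_and_shift(T val)
{
    float x = conversion_sketch::to<float>(val);
    return x * 2.0f + 1.0f;
}
```

Without the float-to-float specialization, scale_and_shift(1.5f) would have no definition of to<float, float> to link against, so the generic routine would need a special case for that type.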
BF16 support is conditionally compiled under the BF16_AVAILABLE macro, allowing the same code to compile on older CUDA architectures that lack BF16 hardware.
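The guard pattern can be sketched on the host as follows. Everything here is an assumption made for illustration: SKETCH_BF16_AVAILABLE stands in for the header's BF16_AVAILABLE macro, bf16_bits is a hypothetical stand-in for __nv_bfloat16, and the bit manipulation shows one common way to round a float to BF16 (nearest, ties to even), not necessarily what the hardware intrinsic does internally.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical feature macro for this sketch; the header's own guard is
// BF16_AVAILABLE. Defined here so that the guarded code is compiled.
#define SKETCH_BF16_AVAILABLE 1

namespace conversion_sketch {  // hypothetical namespace, not the library's

template <typename TO, typename FROM>
inline TO to(FROM val);

#ifdef SKETCH_BF16_AVAILABLE
// Hypothetical host stand-in for __nv_bfloat16: the top 16 bits of a float.
struct bf16_bits {
    std::uint16_t bits;
};

// float -> bf16, round to nearest with ties to even.
// NaN handling is omitted to keep the sketch short.
template <>
inline bf16_bits to(float val)
{
    std::uint32_t u;
    std::memcpy(&u, &val, sizeof(u));
    // Adding 0x7FFF plus the lowest kept bit makes exact ties round to even.
    u += 0x7FFFu + ((u >> 16) & 1u);
    return bf16_bits{static_cast<std::uint16_t>(u >> 16)};
}

// bf16 -> float is exact: the 16 dropped low bits come back as zeros.
template <>
inline float to(bf16_bits val)
{
    std::uint32_t u = static_cast<std::uint32_t>(val.bits) << 16;
    float out;
    std::memcpy(&out, &u, sizeof(out));
    return out;
}
#endif  // SKETCH_BF16_AVAILABLE

}  // namespace conversion_sketch
```

When the macro is left undefined, the BF16 type and both specializations disappear from the translation unit, while the primary template and any other specializations still compile.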
Usage
Include this header in any CUDA kernel that needs to convert between numeric types. It is a foundational utility used throughout the DeepSpeed CUDA kernel library, so that kernels share one set of conversions with consistent rounding behavior.
Code Reference
Source Location
- Repository: FMInference_FlexLLMGen
- File: benchmark/third_party/DeepSpeed/csrc/includes/conversion_utils.h
- Lines: 1-625
Signature
namespace conversion {
// Primary template
template <typename TO, typename FROM>
DS_D_INLINE TO to(FROM val);
// Example specializations:
template <> DS_D_INLINE float to(double val); // __double2float_rn
template <> DS_D_INLINE float to(__half val); // __half2float
template <> DS_D_INLINE __half to(float val); // __float2half
template <> DS_D_INLINE __nv_bfloat16 to(float val); // __float2bfloat16 (BF16_AVAILABLE)
template <> DS_D_INLINE int32_t to(float val); // __float2int_rn
// ... 60+ specializations total
} // namespace conversion
Import
#include "conversion_utils.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| val | FROM (template parameter) | Yes | Numeric value to convert. Supported types: double, float, __half, __nv_bfloat16, int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t, float2, __half2, __nv_bfloat162 |
Outputs
| Name | Type | Description |
|---|---|---|
| return value | TO (template parameter) | Converted value in the destination type, using round-to-nearest-even semantics for floating-point conversions |
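Round-to-nearest-even can be checked on the host with a double-to-float narrowing. This is a sketch under the assumption of an IEEE 754 platform in its default rounding mode; narrow_rne and float_ulp_at_one are hypothetical helper names, and the device-side counterpart would be an intrinsic such as __double2float_rn.

```cpp
#include <cmath>

// Narrow a double to float. On an IEEE 754 platform in the default
// rounding mode, this cast rounds to nearest with ties to even.
inline float narrow_rne(double val)
{
    return static_cast<float>(val);
}

// Spacing between adjacent float values in [1, 2): 2^-23.
inline double float_ulp_at_one()
{
    return std::ldexp(1.0, -23);
}
```

With ulp = float_ulp_at_one(), the value 1 + 0.5 * ulp lies exactly halfway between two floats and rounds down to 1.0f (even mantissa), whereas 1 + 1.5 * ulp is also a tie but rounds up to 1 + 2 * ulp, again the neighbour with the even mantissa.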
Usage Examples
#include "conversion_utils.h"
// In a CUDA kernel:
__global__ void mixed_precision_kernel(__half* input, float* output, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // Promote FP16 to FP32 for computation
        float val = conversion::to<float>(input[idx]);
        // Perform computation in FP32
        val = val * 2.0f + 1.0f;
        output[idx] = val;
    }
}
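// Converting between BF16 and FP16 (goes through FP32 intermediate).
// BF16 conversions exist only when BF16_AVAILABLE is defined, so guard the caller too.
#ifdef BF16_AVAILABLE
__device__ __half bf16_to_fp16(__nv_bfloat16 val) {
    return conversion::to<__half>(val);  // bf16 -> float -> half
}
#endif  // BF16_AVAILABLE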
// Vectorized pair conversion
__device__ __half2 float2_to_half2(float2 val) {
    return conversion::to<__half2>(val);  // __float22half2_rn
}