Implementation:FMInference FlexLLMGen DeepSpeed Conversion Utils

Knowledge Sources	FMInference_FlexLLMGen
Domains	CUDA Programming, Numerical Computing, Type Systems
Last Updated	2026-02-09 12:00 GMT

Overview

Header-only C++ template library providing device-inline type conversion functions between all CUDA numeric types, including FP64, FP32, FP16, BF16, and integer types of various widths.

Description

This file defines the conversion namespace, which contains a set of template specializations of the to<TO, FROM>() function template. It provides type-safe, device-inline conversions across the CUDA numeric types: double, float, __half, __nv_bfloat16, int8_t through int64_t, and uint8_t through uint64_t. Vectorized pair types (float2, __half2, __nv_bfloat162) are also supported.

Conversions are implemented with the matching CUDA intrinsic where one exists. For example, float to __half uses __float2half(), and conversions to and from integer types use round-to-nearest intrinsics such as __float2int_rn(). Where PTX assembly is available, some conversions use inline PTX instead (e.g., float to double via cvt.rn.f64.f32).
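To make the rounding difference concrete, the following host-only C++ sketch contrasts a truncating cast with round-to-nearest-even, the behavior that __float2int_rn() provides on the device. It is an emulation under stated assumptions, not code from the header: std::lrintf stands in for the intrinsic and relies on the default floating-point rounding mode (round to nearest, ties to even), and the names to_int_truncate, to_int_rn_emulated, and check_rounding_examples are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Truncation toward zero: what a plain cast does.
inline std::int32_t to_int_truncate(float val) {
    return static_cast<std::int32_t>(val);
}

// Host emulation of round-to-nearest-even. Assumption: the default
// FE_TONEAREST rounding mode is active, so ties go to the even integer.
inline std::int32_t to_int_rn_emulated(float val) {
    return static_cast<std::int32_t>(std::lrintf(val));
}

inline void check_rounding_examples() {
    assert(to_int_truncate(3.5f) == 3);      // fraction is dropped
    assert(to_int_rn_emulated(3.5f) == 4);   // tie goes to the even neighbor 4
    assert(to_int_rn_emulated(2.5f) == 2);   // tie goes to the even neighbor 2
    assert(to_int_rn_emulated(-1.5f) == -2); // ties to even also hold for negatives
}
```

The two functions agree on values with no fractional part and diverge on everything else, which is why a kernel that mixes casts and intrinsics can produce inconsistent integer results.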

Identity conversions (e.g., float to float) are explicitly provided so that templated kernel code can unconditionally call conversion::to<float>(val) regardless of the input type, enabling generic kernel implementations that work across all precision levels.
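The dispatch pattern described here can be sketched in host-only C++. This is an illustration rather than the header's code: the namespace conversion_sketch and the helpers scale_in_fp32 and check_generic_dispatch are hypothetical, plain casts stand in for the device intrinsics, and only three source types are covered.

```cpp
#include <cassert>
#include <cstdint>

namespace conversion_sketch {  // hypothetical name, not the real namespace

// Primary template: declared only, so in this sketch an unsupported
// type pair fails at link time instead of silently converting.
template <typename TO, typename FROM>
TO to(FROM val);

// Identity conversion: lets generic code call to<float>() on a float.
template <>
inline float to<float, float>(float val) { return val; }

// Plain casts are used here; the device header would use intrinsics.
template <>
inline float to<float, double>(double val) { return static_cast<float>(val); }

template <>
inline float to<float, std::int32_t>(std::int32_t val) { return static_cast<float>(val); }

}  // namespace conversion_sketch

// Generic routine: the same body compiles for every supported input type.
template <typename T>
float scale_in_fp32(T x) {
    const float v = conversion_sketch::to<float>(x);  // FROM is deduced from x
    return v * 2.0f + 1.0f;
}

inline void check_generic_dispatch() {
    assert(scale_in_fp32(1.5f) == 4.0f);             // float input, identity path
    assert(scale_in_fp32(1.5) == 4.0f);              // double input
    assert(scale_in_fp32(std::int32_t{3}) == 7.0f);  // integer input
}
```

Without the identity specialization, the float instantiation of scale_in_fp32 would have no definition to call, which is the case the identity conversions exist to cover.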

BF16 support is conditionally compiled under the BF16_AVAILABLE macro, allowing the same code to compile on older CUDA architectures that lack BF16 hardware.
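A minimal host-only C++ sketch of this guard follows. The macro SKETCH_BF16_AVAILABLE, the namespace conversion_sketch, and the struct bf16_bits (a stand-in for __nv_bfloat16 that holds the raw 16 bits) are all hypothetical names chosen for illustration. The widening step relies on one known property of the format: a BF16 value has the same bit layout as the upper 16 bits of an FP32 value.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical feature macro for this sketch; the real header tests BF16_AVAILABLE.
#define SKETCH_BF16_AVAILABLE 1

namespace conversion_sketch {  // hypothetical name, not the real namespace

template <typename TO, typename FROM>
TO to(FROM val);

template <>
inline float to<float, float>(float val) { return val; }

#ifdef SKETCH_BF16_AVAILABLE
// Host stand-in for __nv_bfloat16: 1 sign, 8 exponent, 7 mantissa bits.
struct bf16_bits { std::uint16_t raw; };

// BF16 is the upper half of an FP32 bit pattern, so widening is a shift.
template <>
inline float to<float, bf16_bits>(bf16_bits val) {
    const std::uint32_t wide = static_cast<std::uint32_t>(val.raw) << 16;
    float out;
    std::memcpy(&out, &wide, sizeof(out));  // bit-cast without aliasing issues
    return out;
}
#endif

}  // namespace conversion_sketch

inline void check_bf16_widening() {
#ifdef SKETCH_BF16_AVAILABLE
    using conversion_sketch::bf16_bits;
    assert(conversion_sketch::to<float>(bf16_bits{0x3F80}) == 1.0f);   // 0x3F800000
    assert(conversion_sketch::to<float>(bf16_bits{0xC000}) == -2.0f);  // 0xC0000000
#endif
}
```

Removing the macro definition drops both the type and its specialization, while the rest of the translation unit still compiles.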

Usage

Include this header in any CUDA source file whose device code needs to convert between numeric types. The functions are device-inline, so call them from kernels and other device functions. It is a foundational utility used throughout the DeepSpeed CUDA kernel library, which keeps rounding behavior and instruction selection consistent across kernels.

Code Reference

Source Location

Repository: FMInference_FlexLLMGen
File: benchmark/third_party/DeepSpeed/csrc/includes/conversion_utils.h
Lines: 1-625

Signature

namespace conversion {

// Primary template
template <typename TO, typename FROM>
DS_D_INLINE TO to(FROM val);

// Example specializations:
template <> DS_D_INLINE float to(double val);      // __double2float_rn
template <> DS_D_INLINE float to(__half val);       // __half2float
template <> DS_D_INLINE __half to(float val);       // __float2half
template <> DS_D_INLINE __nv_bfloat16 to(float val); // __float2bfloat16 (BF16_AVAILABLE)
template <> DS_D_INLINE int32_t to(float val);      // __float2int_rn
// ... 60+ specializations total

}  // namespace conversion

Import

#include "conversion_utils.h"

I/O Contract

Inputs

Name	Type	Required	Description
val	FROM (template parameter)	Yes	Numeric value to convert. Supported types: double, float, __half, __nv_bfloat16, int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t, float2, __half2, __nv_bfloat162

Outputs

Name	Type	Description
return value	TO (template parameter)	Converted value in the destination type, using round-to-nearest-even semantics for floating-point conversions
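Round-to-nearest-even can be observed on the host with a double to float narrowing, the same direction that the header handles with __double2float_rn. The sketch below is illustrative only: narrow_to_float and check_ties_to_even are hypothetical names, and it assumes an IEEE-754 host with the default rounding mode, where a static_cast from double to float rounds to nearest with ties to even.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: narrow a double to float on the host.
// Assumption: IEEE-754 arithmetic with the default rounding mode.
inline float narrow_to_float(double val) {
    return static_cast<float>(val);
}

inline void check_ties_to_even() {
    const double ulp = std::ldexp(1.0, -23);  // spacing of floats in [1, 2)

    // 1 + ulp/2 lies exactly halfway between 1.0f and 1.0f + ulp.
    // The neighbor with the even mantissa is 1.0f.
    assert(narrow_to_float(1.0 + 0.5 * ulp) == 1.0f);

    // 1 + 1.5*ulp lies halfway between 1 + ulp (odd mantissa) and
    // 1 + 2*ulp (even mantissa), so the result is 1 + 2*ulp.
    assert(narrow_to_float(1.0 + 1.5 * ulp) == 1.0f + 2.0f * std::ldexp(1.0f, -23));
}
```

Both inputs are exact ties, so the first rounds down and the second rounds up. Widening conversions such as __half to float are exact and involve no rounding at all.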

Usage Examples

#include "conversion_utils.h"

// In a CUDA kernel:
__global__ void mixed_precision_kernel(const __half* input, float* output, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        // Promote FP16 to FP32 for computation
        float val = conversion::to<float>(input[idx]);

        // Perform computation in FP32
        val = val * 2.0f + 1.0f;

        output[idx] = val;
    }
}

// Converting between BF16 and FP16 (goes through an FP32 intermediate).
// Only available when BF16_AVAILABLE is defined.
#ifdef BF16_AVAILABLE
__device__ __half bf16_to_fp16(__nv_bfloat16 val) {
    return conversion::to<__half>(val);  // bf16 -> float -> half
}
#endif

// Vectorized pair conversion
__device__ __half2 float2_to_half2(float2 val) {
    return conversion::to<__half2>(val);  // __float22half2_rn
}

Related Pages

Principle:FMInference_FlexLLMGen_CUDA_Type_Conversion
