Implementation:InternLM Lmdeploy Core FloatingPoint

Knowledge Sources	InternLM_Lmdeploy
Domains	GPU_Kernels, Quantization
Last Updated	2026-02-07 15:00 GMT

Overview

Compile-time floating-point introspection and conversion utilities for custom low-bit float formats (e.g., E2M1, E3M2, E2M3).

Description

The FloatingPoint<E, M> template struct provides compile-time constants and device-side conversion functions for arbitrary floating-point formats parameterized by exponent bits (E) and mantissa bits (M). It computes format properties at compile time: exponent_bias, max_normal, min_normal, max_denormal, min_denormal, and bit masks. Two device functions handle conversion: from_f32() converts an IEEE 754 float to the custom format with support for both round-to-nearest-even and stochastic rounding, and to_f32() converts back to float using PTX mul.f32 to avoid flush-to-zero. Static assertions verify correctness for E2M1, E3M2, and E2M3 formats.
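The compile-time properties listed above can be sketched on the host as follows. This is a minimal illustration, not the library's code: the names `pow2` and `FormatTraits` are hypothetical, and the formulas assume a bias of 2^(E-1) - 1 with no exponent codes reserved for infinity or NaN (the convention of the OCP microscaling FP4/FP6 formats). The mask layout is likewise an assumption.

```cpp
#include <cassert>

// Hypothetical helper: 2^n as a float, usable in constant expressions.
constexpr float pow2(int n)
{
    float r = 1.f;
    for (int i = 0; i < n; ++i) r *= 2.f;   // positive powers
    for (int i = 0; i > n; --i) r *= 0.5f;  // negative powers
    return r;
}

// Hypothetical stand-in for the format properties; assumes no inf/NaN codes.
template<int E, int M>
struct FormatTraits {
    static constexpr unsigned exponent_bits = E;
    static constexpr unsigned mantissa_bits = M;
    static constexpr int      exponent_bias = (1 << (E - 1)) - 1;
    // Unbiased exponent of the largest exponent code.
    static constexpr int      max_exponent  = (1 << E) - 1 - exponent_bias;
    static constexpr float    max_normal    = pow2(max_exponent) * (2.f - pow2(-M));
    static constexpr float    min_normal    = pow2(1 - exponent_bias);
    static constexpr float    max_denormal  = min_normal * (1.f - pow2(-M));
    static constexpr float    min_denormal  = pow2(1 - exponent_bias - M);
    // Assumed layout: exponent field directly above the mantissa field.
    static constexpr unsigned exponent_mask = ((1u << E) - 1) << M;
    static constexpr unsigned mantissa_mask = (1u << M) - 1;
};

// Checks in the spirit of the static assertions mentioned above.
static_assert(FormatTraits<2, 1>::max_normal == 6.f, "E2M1 max");
static_assert(FormatTraits<2, 1>::min_denormal == 0.5f, "E2M1 min denormal");
static_assert(FormatTraits<3, 2>::max_normal == 28.f, "E3M2 max");
static_assert(FormatTraits<3, 2>::min_normal == 0.25f, "E3M2 min normal");
static_assert(FormatTraits<2, 3>::max_normal == 7.5f, "E2M3 max");
static_assert(FormatTraits<2, 3>::max_denormal == 0.875f, "E2M3 max denormal");
```

Because every value is a constant expression, a wrong formula fails at compile time rather than inside a kernel.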

Usage

Use this utility when implementing quantization or dequantization kernels that operate on custom low-bit floating-point representations such as FP4 (E2M1), FP6 (E3M2 or E2M3), or FP8.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/kernels/core/floating_point.h

Signature

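template<int E, int M>
struct FloatingPoint {
    // Compile-time format properties (initializers omitted in this summary)
    static constexpr unsigned exponent_bits;
    static constexpr unsigned mantissa_bits;
    static constexpr unsigned exponent_bias;
    static constexpr float max_normal;
    static constexpr float min_normal;
    // max_denormal, min_denormal, and the bit masks are also defined (not listed)

    // float -> packed custom format; the type R selects the rounding mode
    template<class R>
    __device__ static unsigned from_f32(float x, R rbits);

    // packed custom format -> float
    __device__ static float to_f32(unsigned x);
};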

Import

#include "src/turbomind/kernels/core/floating_point.h"

I/O Contract

Inputs

Name	Type	Required	Description
E	int	Yes	Number of exponent bits in the target format
M	int	Yes	Number of mantissa bits in the target format
x (from_f32)	float	Yes	IEEE 754 float value to convert
rbits	unsigned, or any other type as a tag (e.g., std::nullptr_t)	Yes	Random bits for stochastic rounding when the type is unsigned; any non-unsigned type acts as a tag that selects round-to-nearest-even (RNE)
x (to_f32)	unsigned	Yes	Packed custom-format value to convert to float
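The rbits row implies that the rounding mode is chosen by the argument's type, not its value. A minimal sketch of such a compile-time dispatch is shown here. `Rounding` and `select_rounding` are hypothetical names, and the use of `if constexpr` with `std::is_same_v` is an assumption about how the selection might be written, not a description of the library's code.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tag describing which rounding path would be taken.
enum class Rounding { kNearestEven, kStochastic };

// Selects the rounding mode from the *type* of rbits, not its value.
template<class R>
constexpr Rounding select_rounding(R /*rbits*/)
{
    if constexpr (std::is_same_v<R, unsigned>) {
        return Rounding::kStochastic;   // unsigned carries random bits
    } else {
        return Rounding::kNearestEven;  // any other type is only a tag
    }
}

static_assert(select_rounding(0u) == Rounding::kStochastic, "unsigned literal");
static_assert(select_rounding(0) == Rounding::kNearestEven, "int literal");
static_assert(select_rounding(nullptr) == Rounding::kNearestEven, "nullptr tag");
```

Under this reading, passing `0` and passing `0u` would select different rounding modes, so the literal suffix matters at call sites.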

Outputs

Name	Type	Description
from_f32 return	unsigned	Packed representation in the custom float format
to_f32 return	float	IEEE 754 float equivalent
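One common way to produce the to_f32 output with a single multiply is to shift the custom exponent and mantissa fields into the float32 field positions and then rescale by 2^(127 - bias). The host-side sketch below illustrates that idea. Whether the library does exactly this is an assumption, `decode_by_multiply` is a hypothetical name, and sign handling is omitted because the source does not describe it. Denormal codes become float32 denormals before the multiply, which is why a multiply that does not flush denormals to zero matters on the device.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host-side reference. `code` holds only exponent and mantissa
// bits (code < 2^(E+M)); bias 2^(E-1)-1 and no inf/NaN codes are assumed.
template<int E, int M>
float decode_by_multiply(unsigned code)
{
    constexpr int bias = (1 << (E - 1)) - 1;

    // Align the custom fields with the float32 exponent/mantissa fields.
    const std::uint32_t bits = static_cast<std::uint32_t>(code) << (23 - M);
    float aligned;
    std::memcpy(&aligned, &bits, sizeof(aligned));

    // float32 uses bias 127, so rebias by multiplying with 2^(127 - bias).
    // The float32 exponent field for that scale is 127 + (127 - bias).
    const std::uint32_t scale_bits = static_cast<std::uint32_t>(254 - bias) << 23;
    float scale;
    std::memcpy(&scale, &scale_bits, sizeof(scale));

    // Denormal inputs are normalized by the multiply itself.
    return aligned * scale;
}
```

For E2M1, code 1 (the only nonzero denormal) maps to a float32 denormal equal to 2^-127, and multiplying by 2^126 yields 0.5. Compiling this sketch with flags that enable flush-to-zero on the host (for example fast-math modes) would defeat the same property it demonstrates.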

Usage Examples

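#include <curand_kernel.h>
#include "src/turbomind/kernels/core/floating_point.h"

using FP4 = turbomind::FloatingPoint<2, 1>;  // E2M1 format

// from_f32/to_f32 are __device__ functions, so they must be called from device code.
// Illustrative kernel: `states` is assumed to be initialized with curand_init.
__global__ void fp4_example(float* out, curandState* states)
{
    // Convert float to FP4 with round-to-nearest-even (int literal 0 is a non-unsigned tag)
    unsigned packed = FP4::from_f32(3.5f, 0);

    // Convert FP4 back to float
    out[0] = FP4::to_f32(packed);

    // Stochastic rounding: an unsigned argument supplies the random bits
    curandState state = states[0];
    unsigned rbits = curand(&state);
    unsigned packed_sr = FP4::from_f32(3.5f, rbits);
    out[1] = FP4::to_f32(packed_sr);
}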
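To make the difference between the two rounding modes concrete, here is a host-side reference that rounds a non-negative float to the nearest codes of a small format by searching the code table. It is a sketch under stated assumptions, not the library's algorithm: `RefFormat` and its members are hypothetical, codes carry no sign bit, no inf/NaN codes are reserved, out-of-range inputs saturate, and the way `rbits` is turned into a probability is one simple choice among several.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical host-side reference for non-negative inputs.
template<int E, int M>
struct RefFormat {
    static constexpr int      bias      = (1 << (E - 1)) - 1;
    static constexpr unsigned num_codes = 1u << (E + M);

    // Value of a magnitude code (exponent field above mantissa field).
    static float decode(unsigned code)
    {
        const int exp  = static_cast<int>(code >> M);
        const int mant = static_cast<int>(code & ((1u << M) - 1));
        if (exp == 0) {  // denormal: no implicit leading one
            return std::ldexp(static_cast<float>(mant), 1 - bias - M);
        }
        return std::ldexp(static_cast<float>((1 << M) + mant), exp - bias - M);
    }

    // Largest code whose value is <= x (codes are ordered by magnitude).
    static unsigned floor_code(float x)
    {
        unsigned lo = 0;
        while (lo + 1 < num_codes && decode(lo + 1) <= x) ++lo;
        return lo;
    }

    // Round to nearest; ties go to the code with an even mantissa LSB.
    static unsigned round_nearest_even(float x)
    {
        const unsigned lo = floor_code(x);
        if (lo + 1 == num_codes) return lo;  // saturate at the largest code
        const float below = x - decode(lo);
        const float above = decode(lo + 1) - x;
        if (below < above) return lo;
        if (below > above) return lo + 1;
        return (lo & 1u) ? lo + 1 : lo;
    }

    // Round up with probability equal to the fractional position of x
    // between its two neighbours, using rbits as a uniform 32-bit sample.
    static unsigned round_stochastic(float x, std::uint32_t rbits)
    {
        const unsigned lo = floor_code(x);
        if (lo + 1 == num_codes) return lo;  // saturate at the largest code
        const double frac = (x - decode(lo)) / (decode(lo + 1) - decode(lo));
        const double u    = rbits / 4294967296.0;  // in [0, 1)
        return u < frac ? lo + 1 : lo;
    }
};
```

In E2M1 the value 3.5 lies exactly halfway between 3.0 (code 5) and 4.0 (code 6). In this reference, round-to-nearest-even always picks code 6, while stochastic rounding picks code 5 or code 6 depending on the random bits, so the average over many samples stays close to 3.5.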

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
