Implementation:InternLM Lmdeploy Core Math
| Knowledge Sources | |
|---|---|
| Domains | GPU_Kernels, Math_Utilities |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Constexpr math utilities: ceiling division, rounding up to a multiple, integer log2, lowest-set-bit extraction, and a FastDivMod helper for branchless integer division and modulo on the GPU.
Description
This header provides host-and-device constexpr math functions used throughout the TurboMind kernel library. ceil_div() / cdiv() compute ceiling division; round_up() rounds a value up to the nearest multiple of its second argument; log2() computes the integer base-2 logarithm; lowbit() extracts the lowest set bit. The FastDivMod<uint16_t> specialization implements the multiply-and-shift technique from "Faster Remainder by Direct Computation" (arXiv:1902.01961): the divisor is supplied once at construction, and each subsequent division or modulo uses only multiplications and bit shifts, with no branches. This avoids comparatively expensive integer division on the GPU.
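As a rough illustration of what the four helper functions compute, the standalone C++ sketch below re-implements them under assumed preconditions: non-negative a, positive b, and x >= 1 for the logarithm. The namespace sketch, the name log2_floor, and the wrapper blocks_for are hypothetical, chosen so that nothing collides with the real header or the standard library; the actual definitions in math.h may differ.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins only; not the TurboMind definitions.
namespace sketch {

// Smallest q with q * b >= a (assumes a >= 0, b > 0, no overflow in a + b - 1).
template<class T>
constexpr T ceil_div(T a, T b)
{
    return (a + b - 1) / b;
}

// Smallest multiple of b that is >= a.
template<class T>
constexpr T round_up(T a, T b)
{
    return ceil_div(a, b) * b;
}

// floor(log2(x)) for x >= 1, counting how often x can be halved.
template<class T>
constexpr T log2_floor(T x)
{
    return x < 2 ? 0 : 1 + log2_floor<T>(x >> 1);
}

// Isolate the lowest set bit: in two's complement, x & -x keeps only that bit.
template<class T>
constexpr T lowbit(T x)
{
    return x & -x;
}

}  // namespace sketch

// Compile-time checks of a few hand-computed values.
static_assert(sketch::ceil_div(10, 3) == 4, "10 / 3 rounds up to 4");
static_assert(sketch::round_up(10, 4) == 12, "next multiple of 4 after 10");
static_assert(sketch::log2_floor(40) == 5, "32 <= 40 < 64");
static_assert(sketch::lowbit(40) == 8, "40 = 0b101000");

// Hypothetical runtime wrapper that makes the assumed preconditions explicit.
inline int blocks_for(int n, int block_size)
{
    assert(n >= 0 && block_size > 0);
    return sketch::ceil_div(n, block_size);
}
```

Because every function is constexpr, the same expressions can size arrays or template arguments at compile time as well as compute launch dimensions at run time.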
Usage
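Use ceil_div and round_up for grid and block dimension calculations. Use FastDivMod in inner loops or index calculations where the divisor is known only at run time but stays fixed across many divisions, and where that division is a performance bottleneck. The divisor is a uint16_t and both operators return uint16_t, so the quotient and remainder must fit in 16 bits.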
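The index-calculation case can be sketched in standalone C++ as follows. DivMod16, RowCol, split_index, and matches_builtin are hypothetical names introduced for illustration, and the arithmetic is a minimal rendering of the multiply-and-shift idea under the assumption that both the dividend and the divisor fit in 16 bits; it is not the TurboMind implementation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for illustration; not the TurboMind definition.
struct DivMod16 {
    uint64_t c;  // ceil(2^32 / d); held in 64 bits so that d == 1 does not wrap
    uint32_t d;  // the divisor itself, needed to recover the remainder

    constexpr explicit DivMod16(uint16_t divisor)
        : c(0xFFFFFFFFull / divisor + 1), d(divisor) {}

    // Quotient: the high 32 bits of a * c equal a / d for any 16-bit a.
    constexpr uint16_t div(uint16_t a) const
    {
        return static_cast<uint16_t>((a * c) >> 32);
    }

    // Remainder: the low 32 bits of a * c hold the fractional part of a / d;
    // scaling that fraction by d and shifting yields a % d.
    constexpr uint16_t mod(uint16_t a) const
    {
        return static_cast<uint16_t>((((a * c) & 0xFFFFFFFFull) * d) >> 32);
    }
};

// Split a flat index into (row, col) for a row width fixed before the loop.
struct RowCol {
    uint16_t row;
    uint16_t col;
};

inline RowCol split_index(uint16_t idx, const DivMod16& width)
{
    return RowCol{width.div(idx), width.mod(idx)};
}

// Compare against the built-in operators for every 16-bit dividend.
inline bool matches_builtin(uint16_t divisor)
{
    assert(divisor != 0);  // division by zero is outside the sketch's contract
    const DivMod16 fast(divisor);
    for (uint32_t a = 0; a <= 0xFFFF; ++a) {
        const uint16_t v = static_cast<uint16_t>(a);
        if (fast.div(v) != v / divisor || fast.mod(v) != v % divisor) {
            return false;
        }
    }
    return true;
}
```

The constant c is computed once per divisor, so the cost of the one real division in the constructor is spread over every later div or mod call. Storing c in 64 bits is a choice made for this sketch so that a divisor of 1 is handled without overflow.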
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/kernels/core/math.h
Signature
template<class T> TM_HOST_DEVICE constexpr T ceil_div(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T cdiv(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T round_up(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T log2(T x);
template<class T> TM_HOST_DEVICE constexpr T lowbit(T x);
template<>
struct FastDivMod<uint16_t> {
    TM_HOST_DEVICE constexpr FastDivMod(uint16_t d);
    template<class T> TM_HOST_DEVICE friend constexpr uint16_t operator/(T a, FastDivMod b);
    template<class T> TM_HOST_DEVICE friend constexpr uint16_t operator%(T a, FastDivMod b);
};
Import
#include "src/turbomind/kernels/core/math.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| a | T | Yes | Dividend or value to round |
| b | T | Yes | Divisor or alignment boundary |
| d | uint16_t | Yes | Divisor for FastDivMod construction |
Outputs
| Name | Type | Description |
|---|---|---|
| ceil_div return | T | Ceiling of a / b |
| round_up return | T | a rounded up to the nearest multiple of b |
| operator/ return | uint16_t | Fast quotient a / d |
| operator% return | uint16_t | Fast remainder a % d |
Usage Examples
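using namespace turbomind;
// Grid dimension calculation: enough blocks to cover every token.
// Both arguments must have the same type T for template deduction to succeed.
int grid_x = ceil_div(num_tokens, block_size);
// Fast division in a kernel: construct once, then reuse for every index.
// head_dim must be nonzero and fit in uint16_t.
FastDivMod<uint16_t> fast_div(head_dim);
uint16_t q = idx / fast_div;  // quotient:  idx / head_dim
uint16_t r = idx % fast_div;  // remainder: idx % head_dim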