Implementation:InternLM Lmdeploy Core Math

From Leeroopedia
Revision as of 15:14, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_Core_Math.md)


Knowledge Sources
Domains GPU_Kernels, Math_Utilities
Last Updated 2026-02-07 15:00 GMT

Overview

Constexpr math utilities: ceiling division, rounding up to a multiple, integer log2, lowest-set-bit extraction, and FastDivMod, a helper for branchless integer division and modulo on the GPU.

Description

This header provides constexpr math functions, callable from both host and device code, that are used throughout the TurboMind kernel library. ceil_div() and cdiv() compute ceiling division; round_up() rounds a value up to the nearest multiple of its second argument; log2() computes the integer base-2 logarithm; lowbit() extracts the lowest set bit. The FastDivMod<uint16_t> specialization implements the algorithm from arXiv:1902.01961 ("Faster Remainder by Direct Computation", which builds on the classic "Division by Invariant Integers using Multiplication" technique). It performs branchless division and modulo by a runtime divisor using only multiplications and bit shifts, avoiding expensive integer division instructions on the GPU.
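As a rough illustration of what the scalar helpers compute, the following host-only C++ sketch gives one plausible formulation. The namespace `sketch`, the name `ilog2`, the floor behaviour of the logarithm, and the restriction to non-negative inputs are assumptions made for this example; the actual header may differ, and it additionally marks its functions with TM_HOST_DEVICE.

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {  // hypothetical namespace, not the TurboMind one

// Ceiling division, assuming a >= 0 and b > 0.
template<class T>
constexpr T ceil_div(T a, T b)
{
    return (a + b - 1) / b;
}

// Round a up to the nearest multiple of b (same assumptions as ceil_div).
template<class T>
constexpr T round_up(T a, T b)
{
    return ceil_div(a, b) * b;
}

// Floor of log2(x), assuming x >= 1 (assumed semantics; one halving per level).
template<class T>
constexpr T ilog2(T x)
{
    return x > 1 ? static_cast<T>(1 + ilog2(static_cast<T>(x >> 1))) : T(0);
}

// Isolate the lowest set bit: x & (two's-complement negation of x).
template<class T>
constexpr T lowbit(T x)
{
    return x & (~x + 1);
}

}  // namespace sketch

// Compile-time checks of the sketch.
static_assert(sketch::ceil_div(10, 3) == 4, "10 / 3 rounded up");
static_assert(sketch::round_up(10, 8) == 16, "next multiple of 8");
static_assert(sketch::ilog2(8) == 3, "2^3 == 8");
static_assert(sketch::lowbit(12) == 4, "12 == 0b1100");
```

Because every function is constexpr, the same expressions can be evaluated at compile time (as in the static_assert lines) or at run time with ordinary arguments.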

Usage

Use ceil_div and round_up for grid and block dimension calculations. Use FastDivMod in inner loops or index calculations where division by a value that is only known at run time, but stays fixed for the whole kernel, is a performance bottleneck.
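A minimal sketch of the grid-dimension use case follows. The namespace `launch_sketch`, the `LaunchShape` struct, and `make_launch_shape` are hypothetical names introduced for this example, and the two helpers are local stand-ins rather than the header's own definitions.

```cpp
#include <cassert>

namespace launch_sketch {  // hypothetical namespace for this example

// Local stand-ins for the header's helpers (assume a >= 0 and b > 0).
constexpr int ceil_div(int a, int b) { return (a + b - 1) / b; }
constexpr int round_up(int a, int b) { return ceil_div(a, b) * b; }

// Hypothetical result type describing a 1-D launch.
struct LaunchShape {
    int grid_x;         // number of thread blocks to launch
    int padded_tokens;  // token count padded to a whole number of blocks
};

// Cover num_tokens items with blocks of block_size threads each.
constexpr LaunchShape make_launch_shape(int num_tokens, int block_size)
{
    return LaunchShape{ceil_div(num_tokens, block_size),
                       round_up(num_tokens, block_size)};
}

}  // namespace launch_sketch

// 1000 tokens with 256 threads per block need 4 blocks covering 1024 slots.
static_assert(launch_sketch::make_launch_shape(1000, 256).grid_x == 4, "");
static_assert(launch_sketch::make_launch_shape(1000, 256).padded_tokens == 1024, "");
```

When the token count is not a multiple of the block size, the last block is only partially used, so a kernel launched with such a shape would typically guard its work with a bounds check.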

Code Reference

Source Location

Signature

template<class T> TM_HOST_DEVICE constexpr T ceil_div(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T cdiv(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T round_up(T a, T b);
template<class T> TM_HOST_DEVICE constexpr T log2(T x);
template<class T> TM_HOST_DEVICE constexpr T lowbit(T x);

template<>
struct FastDivMod<uint16_t> {
    TM_HOST_DEVICE constexpr FastDivMod(uint16_t d);
    template<class T> TM_HOST_DEVICE friend constexpr uint16_t operator/(T a, FastDivMod b);
    template<class T> TM_HOST_DEVICE friend constexpr uint16_t operator%(T a, FastDivMod b);
};

Import

#include "src/turbomind/kernels/core/math.h"

I/O Contract

Inputs

Name  Type      Required  Description
a     T         Yes       Dividend or value to round
b     T         Yes       Divisor or alignment boundary
d     uint16_t  Yes       Divisor for FastDivMod construction

Outputs

Name              Type      Description
ceil_div return   T         Ceiling of a / b
round_up return   T         a rounded up to the nearest multiple of b
operator/ return  uint16_t  Fast quotient a / d
operator% return  uint16_t  Fast remainder a % d
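To make the quotient and remainder contract concrete, here is a hedged host-only sketch of a 16-bit multiply-and-shift divider in the style of arXiv:1902.01961. The type name `FastDivMod16`, its `div`/`mod` member functions, the 64-bit storage of the magic constant, and the `matches_native` helper are all assumptions of this example and are not taken from the TurboMind source, which exposes operator/ and operator% instead.

```cpp
#include <cassert>
#include <cstdint>

namespace fdm_sketch {  // hypothetical namespace for this example

struct FastDivMod16 {  // hypothetical name, not the TurboMind type
    uint64_t c;  // ceil(2^32 / d); kept in 64 bits so d == 1 (c == 2^32) fits
    uint32_t d;  // the divisor itself, needed for the remainder

    // divisor must be nonzero.
    constexpr explicit FastDivMod16(uint16_t divisor)
        : c(0xFFFFFFFFull / divisor + 1), d(divisor)
    {
    }

    // Quotient: high part of the product c * a.
    constexpr uint16_t div(uint16_t a) const
    {
        return static_cast<uint16_t>((c * a) >> 32);
    }

    // Remainder: scale the low 32 bits of c * a by d and keep the high part.
    constexpr uint16_t mod(uint16_t a) const
    {
        return static_cast<uint16_t>((((c * a) & 0xFFFFFFFFull) * d) >> 32);
    }
};

// Exhaustively compare against native / and % for every 16-bit dividend.
inline bool matches_native(uint16_t divisor)
{
    const FastDivMod16 f(divisor);
    for (uint32_t a = 0; a <= 0xFFFFu; ++a) {
        const uint16_t x = static_cast<uint16_t>(a);
        if (f.div(x) != x / divisor || f.mod(x) != x % divisor) {
            return false;
        }
    }
    return true;
}

}  // namespace fdm_sketch
```

The constant is computed once per divisor, so the cost of the single real division in the constructor is amortised over every later quotient or remainder, each of which needs only multiplications, a mask, and shifts.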

Usage Examples

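#include "src/turbomind/kernels/core/math.h"

using namespace turbomind;

// Grid dimension calculation: number of blocks needed to cover num_tokens.
// Both arguments must have the same type T (here int).
int grid_x = ceil_div(num_tokens, block_size);

// Fast division in a kernel: construct the divider once, then reuse it.
// head_dim must be nonzero and fit in uint16_t.
FastDivMod<uint16_t> fast_div(head_dim);
uint16_t q = idx / fast_div;  // quotient:  idx / head_dim
uint16_t r = idx % fast_div;  // remainder: idx % head_dim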
