Implementation:InternLM Lmdeploy CudaUtils

Knowledge Sources	InternLM_Lmdeploy
Domains	Infrastructure, CUDA_Runtime
Last Updated	2026-02-07 15:00 GMT

Overview

CUDA error-checking macros, device query utilities, debug printing functions, and a RAII device guard for safe multi-GPU programming.

Description

This is the central CUDA utility header for TurboMind. It provides error-checking functions and macros (check_cuda_error, sync_check_cuda_error, CUDRVCHECK) that convert CUDA runtime and cuBLAS errors into human-readable messages and abort on failure, along with assertion macros (FT_CHECK, FT_CHECK_WITH_INFO, FT_THROW) for runtime validation. It also provides device query functions (getSMVersion(), getSMCount(), getDeviceName(), getDevice(), getDeviceCount()), debug matrix printing for various element types via printMatrix(), and a div_up() ceiling-division helper. CudaDeviceGuard is a RAII class that saves the current CUDA device on construction and restores it on destruction, which makes device switching safe in multi-GPU scenarios. Finally, trim_default_mempool() releases unused memory from the default CUDA memory pool.
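As an illustration of the ceiling-division helper, the sketch below shows one common way such a function is written and how it is typically used to size a kernel grid. The names div_up_sketch and blocks_for are hypothetical and are not taken from the header; the formula assumes a non-negative a and a positive n, and the actual div_up() may be implemented differently.

```cpp
#include <cassert>

// Hypothetical stand-in for div_up(): returns the smallest q with q * n >= a,
// assuming a >= 0 and n > 0.
template<class T>
inline T div_up_sketch(T a, T n)
{
    return (a + n - 1) / n;
}

// Number of thread blocks needed to cover `num_elements` items when each
// block handles `block_size` of them (hypothetical helper).
inline int blocks_for(int num_elements, int block_size)
{
    assert(block_size > 0);
    return div_up_sketch(num_elements, block_size);
}
```

With 1000 elements and 256 threads per block, blocks_for returns 4, so the last block is only partially filled and the kernel needs a bounds check.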

Usage

Include this header in any TurboMind source file that calls CUDA APIs. Wrap each CUDA runtime call in check_cuda_error so that its return value is checked, use CudaDeviceGuard when temporarily switching devices, and use the SM query functions for architecture-specific dispatch.

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/utils/cuda_utils.h

Signature

#define check_cuda_error(val) check((val), #val, __FILE__, __LINE__)
#define sync_check_cuda_error() syncAndCheck(__FILE__, __LINE__)
#define FT_CHECK(val) myAssert(bool(val), __FILE__, __LINE__)
#define FT_CHECK_WITH_INFO(val, info)

int getSMVersion();
int getSMCount();
std::string getDeviceName();
int getDevice();
int getDeviceCount();

template<class T>
inline T div_up(T a, T n);

class CudaDeviceGuard {
public:
    CudaDeviceGuard(int device);
    ~CudaDeviceGuard();
};

void trim_default_mempool(int device_id);

template<typename T>
void printMatrix(T* ptr, int m, int k, int stride, bool is_device_ptr);
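The check_cuda_error signature above forwards the value, the stringified expression (#val), and the call site (__FILE__, __LINE__) to a checking function. The following host-only sketch shows that pattern in isolation. FakeStatus, check_sketch, CHECK_SKETCH, and fake_api are all hypothetical names, and the sketch throws an exception so that it can be exercised without a GPU, whereas the description above says the real helpers abort on failure.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for cudaError_t; 0 means success.
enum class FakeStatus : int { kSuccess = 0, kInvalidValue = 1 };

// Builds a message from the stringified expression and the call site, then throws.
// Assumption: throwing replaces the failure action of the real check().
inline void check_sketch(FakeStatus result, const char* expr, const char* file, int line)
{
    if (result != FakeStatus::kSuccess) {
        std::ostringstream oss;
        oss << "error " << static_cast<int>(result) << " from `" << expr << "` at " << file << ":" << line;
        throw std::runtime_error(oss.str());
    }
}

// #val turns the macro argument into a string literal; __FILE__ and __LINE__
// expand at the call site, so the message points at the caller.
#define CHECK_SKETCH(val) check_sketch((val), #val, __FILE__, __LINE__)

// Hypothetical API call used only to exercise the macro.
inline FakeStatus fake_api(bool ok)
{
    return ok ? FakeStatus::kSuccess : FakeStatus::kInvalidValue;
}
```

Because the macro captures the expression text, a failing CHECK_SKETCH(fake_api(false)) produces a message that names the exact call that failed, which is the main reason to use a macro here rather than a plain function.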

Import

#include "src/turbomind/utils/cuda_utils.h"

I/O Contract

Inputs

Name	Type	Required	Description
val (check_cuda_error)	cudaError_t / cublasStatus_t	Yes	CUDA API return value to check
device (CudaDeviceGuard)	int	Yes	Target CUDA device ID to switch to
ptr (printMatrix)	T*	Yes	Matrix data pointer (host or device)
is_device_ptr (printMatrix)	bool	Yes	Whether ptr refers to device (GPU) memory rather than host memory
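To make the m, k, and stride parameters of printMatrix concrete, here is a host-only sketch of walking a strided row-major matrix. It assumes that stride is the distance between consecutive rows in elements and that only the first k columns of each row are shown; the page does not state this, so treat it as an assumption. format_matrix_sketch is a hypothetical helper and does not reproduce the output format of printMatrix.

```cpp
#include <sstream>
#include <string>

// Formats an m x k row-major view whose rows are `stride` elements apart.
// Hypothetical host-only helper; element (i, j) is read from ptr[i * stride + j].
template<typename T>
std::string format_matrix_sketch(const T* ptr, int m, int k, int stride)
{
    std::ostringstream oss;
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < k; ++j) {
            if (j > 0) {
                oss << ' ';
            }
            oss << ptr[i * stride + j];
        }
        oss << '\n';
    }
    return oss.str();
}
```

For a buffer holding {1, 2, 3, 99, 4, 5, 6, 99} with m = 2, k = 3, and stride = 4, the padding value 99 at the end of each row is skipped.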

Outputs

Name	Type	Description
getSMVersion return	int	GPU SM version (e.g., 80 for an A100-class Ampere GPU)
getSMCount return	int	Number of streaming multiprocessors
getDeviceName return	std::string	Human-readable GPU name
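The SM version value lends itself to threshold-based dispatch. The sketch below assumes the value packs the compute capability as major * 10 + minor, which is consistent with the example value 80 but is not confirmed by this page. sm_version_sketch and select_kernel_family are hypothetical names, and the family labels are placeholders rather than kernel names from TurboMind.

```cpp
#include <string>

// Assumption: the SM version packs compute capability as major * 10 + minor.
inline int sm_version_sketch(int major, int minor)
{
    return major * 10 + minor;
}

// Hypothetical dispatch: picks the newest kernel family the device supports.
// Thresholds are checked from newest to oldest so the first match wins.
inline std::string select_kernel_family(int sm)
{
    if (sm >= 90) {
        return "sm90";
    }
    if (sm >= 80) {
        return "sm80";
    }
    if (sm >= 75) {
        return "sm75";
    }
    return "sm70";
}
```

Comparing with >= rather than == lets a device with a minor revision, such as compute capability 8.6, fall back to the 8.0 code path.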

Usage Examples

using namespace turbomind;

// Error-checked CUDA call
check_cuda_error(cudaMalloc(&ptr, size));

// Device guard for multi-GPU
{
    CudaDeviceGuard guard(target_device);
    // operations on target_device...
}  // automatically restores previous device

// Check SM version for dispatch
int sm = getSMVersion();
if (sm >= 80) {
    // Use Ampere-optimized path
}

// Debug print a device matrix
printMatrix(dev_ptr, rows, cols, stride, true);
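The save-and-restore behaviour of the device guard used above can be sketched without a GPU. In the sketch below, the mock namespace is a hypothetical stand-in for the CUDA runtime's device get/set calls, and DeviceGuardSketch is an illustrative class, not the CudaDeviceGuard implementation from the header.

```cpp
// Hypothetical in-process stand-ins for the CUDA runtime's device get/set calls.
namespace mock {
inline int& current_device()
{
    static int device = 0;
    return device;
}
inline int get_device()
{
    return current_device();
}
inline void set_device(int d)
{
    current_device() = d;
}
}  // namespace mock

// Saves the current device on construction, switches to `device`,
// and restores the saved device on destruction.
class DeviceGuardSketch {
public:
    explicit DeviceGuardSketch(int device): previous_(mock::get_device())
    {
        if (device != previous_) {
            mock::set_device(device);
        }
    }
    ~DeviceGuardSketch()
    {
        if (mock::get_device() != previous_) {
            mock::set_device(previous_);
        }
    }
    // Non-copyable, so each saved device is restored exactly once.
    DeviceGuardSketch(const DeviceGuardSketch&) = delete;
    DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

private:
    int previous_;
};
```

Because restoration happens in the destructor, nested guards unwind in reverse order, and the previous device is restored even when the scope is left early through a return or an exception.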

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
