Implementation:InternLM Lmdeploy CudaUtils
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, CUDA_Runtime |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
CUDA error-checking macros, device query utilities, debug printing functions, and an RAII device guard for safe multi-GPU programming.
Description
This header is the central CUDA utility header for TurboMind. It provides:
- Error-checking functions and macros (check_cuda_error, sync_check_cuda_error, CUDRVCHECK) that convert CUDA runtime and cuBLAS errors into human-readable messages, report the file and line of the failing call, and abort on failure.
- Assertion macros (FT_CHECK, FT_CHECK_WITH_INFO, FT_THROW) for runtime validation.
- Device query functions: getSMVersion(), getSMCount(), getDeviceName(), getDevice(), and getDeviceCount().
- printMatrix(), a debug helper that prints host or device matrices of various element types.
- div_up(), a ceiling-division helper.
- CudaDeviceGuard, an RAII class that saves the current CUDA device, switches to a target device, and restores the saved device when it goes out of scope, so device switching stays safe in multi-GPU code.
- trim_default_mempool(), which releases unused memory held by a device's default CUDA memory pool.
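The error-checking macros rely on a standard preprocessor pattern: #val turns the wrapped call into a string literal, while __FILE__ and __LINE__ record the call site, so a failure can be reported with its location. The sketch below is a minimal, CUDA-free illustration of that pattern, not the header's implementation. MockStatus, mockStatusName, formatCheckError, mockCheck and MOCK_CHECK are hypothetical names, and the enum stands in for cudaError_t so the code builds without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for cudaError_t so this sketch builds without CUDA.
enum MockStatus { kMockSuccess = 0, kMockOutOfMemory = 2 };

// Maps a status code to a readable name (real code would use CUDA's error-string helpers).
inline const char* mockStatusName(MockStatus s)
{
    switch (s) {
        case kMockSuccess:
            return "kMockSuccess";
        case kMockOutOfMemory:
            return "kMockOutOfMemory";
    }
    return "unknown";
}

// Builds the diagnostic line reported for a failed call.
inline std::string formatCheckError(MockStatus s, const char* expr, const char* file, int line)
{
    return std::string("[ERROR] ") + mockStatusName(s) + " from '" + expr + "' at " + file + ":"
           + std::to_string(line);
}

// Does nothing on success; otherwise prints the diagnostic and aborts the process.
inline void mockCheck(MockStatus s, const char* expr, const char* file, int line)
{
    if (s != kMockSuccess) {
        std::fprintf(stderr, "%s\n", formatCheckError(s, expr, file, line).c_str());
        std::abort();
    }
}

// #val stringizes the wrapped expression; __FILE__/__LINE__ capture the call site.
#define MOCK_CHECK(val) mockCheck((val), #val, __FILE__, __LINE__)
```

Because the macro captures the location at the call site rather than inside the checking function, every wrapped call reports where it was written, not where the checker lives.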
Usage
Include this header in any TurboMind source file that calls CUDA APIs. Wrap each CUDA runtime call in check_cuda_error, use CudaDeviceGuard when temporarily switching devices, and use the SM query functions for architecture-specific dispatch.
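For architecture-specific dispatch, the SM version is typically compared against thresholds from newest to oldest, so each device takes the most specialized path it supports. A minimal sketch, assuming the two-digit SM encoding returned by getSMVersion() (e.g., 80 for SM 8.0); KernelPath and selectKernelPath are hypothetical, and the thresholds are illustrative rather than TurboMind's actual dispatch table.

```cpp
#include <cassert>

// Hypothetical kernel variants; real code would choose among its own implementations.
enum class KernelPath { kGeneric, kTuring, kAmpere, kHopper };

// Returns the newest path the device supports. Checks run from the highest
// threshold down, so e.g. SM 8.6 falls through to the Ampere path.
inline KernelPath selectKernelPath(int sm)
{
    if (sm >= 90) {
        return KernelPath::kHopper;
    }
    if (sm >= 80) {
        return KernelPath::kAmpere;
    }
    if (sm >= 75) {
        return KernelPath::kTuring;
    }
    return KernelPath::kGeneric;
}
```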
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/utils/cuda_utils.h
Signature
```cpp
#define check_cuda_error(val) check((val), #val, __FILE__, __LINE__)
#define sync_check_cuda_error() syncAndCheck(__FILE__, __LINE__)
#define FT_CHECK(val) myAssert(bool(val), __FILE__, __LINE__)
#define FT_CHECK_WITH_INFO(val, info) /* expansion omitted */

int getSMVersion();
int getSMCount();
std::string getDeviceName();
int getDevice();
int getDeviceCount();

template<class T>
inline T div_up(T a, T n);

class CudaDeviceGuard {
public:
    CudaDeviceGuard(int device);
    ~CudaDeviceGuard();
};

void trim_default_mempool(int device_id);

template<typename T>
void printMatrix(T* ptr, int m, int k, int stride, bool is_device_ptr);
```
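div_up() computes ceil(a / n) in integer arithmetic, most often to size a kernel grid. A minimal sketch assuming the common (a + n - 1) / n formulation; ceil_div and numBlocks are hypothetical names, and the header's own definition may differ in detail.

```cpp
#include <cassert>

// Ceiling division for non-negative a and positive n: the smallest q with q * n >= a.
template<class T>
inline T ceil_div(T a, T n)
{
    assert(n > 0);
    return (a + n - 1) / n;
}

// Hypothetical helper: thread blocks needed so that every element gets a thread.
inline int numBlocks(int num_elements, int threads_per_block)
{
    return ceil_div(num_elements, threads_per_block);
}
```

With this formulation, a + n - 1 can overflow when a is close to the maximum value of T, so very large sizes may need a wider integer type.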
Import
```cpp
#include "src/turbomind/utils/cuda_utils.h"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| val (check_cuda_error) | cudaError_t / cublasStatus_t | Yes | CUDA API return value to check |
| device (CudaDeviceGuard) | int | Yes | Target CUDA device ID to switch to |
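| ptr (printMatrix) | T* | Yes | Matrix data pointer (host or device) |
| m, k (printMatrix) | int | Yes | Number of rows and columns to print |
| stride (printMatrix) | int | Yes | Distance, in elements, between the starts of consecutive rows |
| is_device_ptr (printMatrix) | bool | Yes | Whether ptr points to GPU memory |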
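Given a row count m, a column count k and a row stride, printMatrix() can presumably print a k-column view of rows padded to stride elements. The sketch below shows only host-side formatting, assuming row-major storage where element (i, j) sits at ptr[i * stride + j]; formatMatrix is a hypothetical name, and it returns a string instead of printing. For a device pointer (is_device_ptr = true), the data would first have to be copied to host memory, for example with cudaMemcpy and cudaMemcpyDeviceToHost.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>

// Host-only sketch: formats an m x k row-major matrix whose rows start `stride`
// elements apart. Padding columns (k <= j < stride) are skipped.
template<typename T>
std::string formatMatrix(const T* ptr, int m, int k, int stride)
{
    assert(k <= stride);
    std::ostringstream out;
    char buf[32];
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < k; ++j) {
            // Convert to float so integer and floating-point element types print alike.
            std::snprintf(buf, sizeof(buf), "%7.3f ", static_cast<float>(ptr[i * stride + j]));
            out << buf;
        }
        out << '\n';
    }
    return out.str();
}
```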
Outputs
| Name | Type | Description |
|---|---|---|
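| getSMVersion return | int | SM version of the current device as a two-digit integer (e.g., 80 for SM 8.0, as on A100) |
| getSMCount return | int | Number of streaming multiprocessors on the current device |
| getDeviceName return | std::string | Human-readable name of the current GPU |
| getDevice return | int | ID of the current CUDA device |
| getDeviceCount return | int | Number of available CUDA devices |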
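The SM version folds the compute-capability major and minor numbers into one integer, so 8.0 becomes 80 and 8.6 becomes 86, which makes ordering comparisons such as sm >= 80 straightforward. A minimal sketch of that encoding, assuming a major * 10 + minor scheme; encodeSmVersion is hypothetical. On a real device the two inputs would come from cudaDeviceGetAttribute with cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor.

```cpp
#include <cassert>

// Combines a compute-capability pair into a single comparable integer,
// e.g. (8, 0) -> 80 and (8, 6) -> 86. Assumes the minor number is a single digit.
inline int encodeSmVersion(int major, int minor)
{
    assert(major >= 0 && minor >= 0 && minor < 10);
    return major * 10 + minor;
}
```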
Usage Examples
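```cpp
#include <cuda_runtime.h>
#include "src/turbomind/utils/cuda_utils.h"

using namespace turbomind;

void example(int target_device, int rows, int cols, int stride)
{
    // Error-checked CUDA calls: abort with file and line on failure
    float* dev_ptr = nullptr;
    check_cuda_error(cudaMalloc(&dev_ptr, sizeof(float) * rows * stride));
    check_cuda_error(cudaMemset(dev_ptr, 0, sizeof(float) * rows * stride));

    // Device guard for multi-GPU code
    {
        CudaDeviceGuard guard(target_device);
        // operations on target_device...
    }  // previous device is restored here

    // Check the SM version for architecture-specific dispatch
    int sm = getSMVersion();
    if (sm >= 80) {
        // SM 8.0 (Ampere) or newer path
    }

    // Debug-print the rows x cols device matrix (stride elements per row)
    printMatrix(dev_ptr, rows, cols, stride, true);
    check_cuda_error(cudaFree(dev_ptr));
}
```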
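CudaDeviceGuard follows the usual RAII save-and-restore pattern: the constructor records the current device and switches to the target, and the destructor switches back, so the previous device is restored even on an early return or during exception unwinding. The sketch below illustrates the pattern without a GPU; mock::getDevice and mock::setDevice are hypothetical stand-ins for cudaGetDevice and cudaSetDevice, and DeviceGuardSketch is not the header's class.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice/cudaSetDevice so the sketch runs without a GPU.
namespace mock {
inline int& currentDevice()
{
    static int device = 0;  // process-wide here; CUDA's current device is per host thread
    return device;
}
inline void getDevice(int* device) { *device = currentDevice(); }
inline void setDevice(int device) { currentDevice() = device; }
}  // namespace mock

// RAII guard: remembers the current device, switches to `device`, restores on scope exit.
class DeviceGuardSketch {
public:
    explicit DeviceGuardSketch(int device)
    {
        mock::getDevice(&previous_);
        if (device != previous_) {
            mock::setDevice(device);
        }
    }
    ~DeviceGuardSketch() { mock::setDevice(previous_); }

    // Non-copyable: a copy would perform a second, possibly stale, restore.
    DeviceGuardSketch(const DeviceGuardSketch&) = delete;
    DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

private:
    int previous_{-1};
};
```

Nested guards unwind in reverse order, so each scope sees the device its enclosing scope selected once the inner guard is destroyed.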