Implementation:InternLM Lmdeploy CudaUtils

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, CUDA_Runtime
Last Updated: 2026-02-07 15:00 GMT

Overview

CUDA error-checking macros, device query utilities, debug printing functions, and a RAII device guard for safe multi-GPU programming.

Description

This header is the central CUDA utility header for TurboMind. It provides error-checking functions and macros (check_cuda_error, sync_check_cuda_error, CUDRVCHECK) that convert CUDA runtime and cuBLAS errors into human-readable messages and abort on failure, along with assertion macros (FT_CHECK, FT_CHECK_WITH_INFO, FT_THROW) for runtime validation. It also provides device query functions (getSMVersion(), getSMCount(), getDeviceName(), getDevice(), getDeviceCount()), debug matrix printing via printMatrix() for various element types, and a div_up() ceiling-division helper. CudaDeviceGuard is a RAII class that saves the current CUDA device and restores it when the guard is destroyed, which makes temporary device switching safe in multi-GPU scenarios. trim_default_mempool() releases unused memory from the default CUDA memory pool.
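To illustrate the ceiling-division helper mentioned above, here is a minimal sketch. It is not the TurboMind source. It assumes div_up(a, n) rounds a / n up for non-negative a and positive n, in line with the signature listed on this page. The sketch namespace and the blocks_for helper are hypothetical names introduced only for this example.

```cpp
#include <cassert>

namespace sketch {  // hypothetical namespace, not TurboMind's

// Ceiling division for non-negative a and positive n.
// Assumption: same contract as the div_up(T a, T n) listed under Signature.
template<class T>
constexpr T div_up(T a, T n)
{
    return (a + n - 1) / n;
}

// Hypothetical helper: number of thread blocks needed to cover `elements`
// items when each block processes `block_size` of them.
inline int blocks_for(int elements, int block_size)
{
    assert(block_size > 0);
    return div_up(elements, block_size);
}

}  // namespace sketch
```

A typical use of such a helper is sizing a kernel launch grid so that the last, partially filled block is still launched.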

Usage

Include this header in any TurboMind source file that calls CUDA APIs. Wrap CUDA runtime and cuBLAS calls in check_cuda_error, use CudaDeviceGuard when temporarily switching devices, and use the SM query functions for architecture-specific dispatch.
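The error-checking macros follow a common pattern: the macro captures the text of the checked expression together with the file and line, and forwards them to a checking function. The sketch below shows that pattern in a form that compiles without the CUDA toolkit. FakeStatus, check_sketch, CHECK_SKETCH, and fake_alloc are hypothetical stand-ins for cudaError_t, check, check_cuda_error, and a CUDA API call. The sketch also throws an exception instead of aborting, so that the failure can be observed in a test.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for cudaError_t so the sketch builds without CUDA.
enum class FakeStatus { kSuccess = 0, kOutOfMemory = 2 };

inline const char* to_string(FakeStatus s)
{
    switch (s) {
        case FakeStatus::kSuccess: return "success";
        case FakeStatus::kOutOfMemory: return "out of memory";
    }
    return "unknown";
}

// Builds a message from the failing expression and its source location.
// Assumption: the real check() aborts; this sketch throws instead.
inline void check_sketch(FakeStatus status, const char* expr, const char* file, int line)
{
    assert(expr != nullptr && file != nullptr);
    if (status != FakeStatus::kSuccess) {
        std::ostringstream msg;
        msg << "error: " << to_string(status) << " in `" << expr << "` at " << file << ":" << line;
        throw std::runtime_error(msg.str());
    }
}

// #val stringifies the checked expression, mirroring check_cuda_error(val).
#define CHECK_SKETCH(val) check_sketch((val), #val, __FILE__, __LINE__)

// Hypothetical API call that fails when asked for more than 1024 bytes.
inline FakeStatus fake_alloc(std::size_t bytes)
{
    return bytes > 1024 ? FakeStatus::kOutOfMemory : FakeStatus::kSuccess;
}
```

Because the macro expands at the call site, the reported file and line point to the failing call rather than to the checking function.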

Code Reference

Source Location

Signature

#define check_cuda_error(val) check((val), #val, __FILE__, __LINE__)
#define sync_check_cuda_error() syncAndCheck(__FILE__, __LINE__)
#define FT_CHECK(val) myAssert(bool(val), __FILE__, __LINE__)
#define FT_CHECK_WITH_INFO(val, info)

int getSMVersion();
int getSMCount();
std::string getDeviceName();
int getDevice();
int getDeviceCount();

template<class T>
inline T div_up(T a, T n);

class CudaDeviceGuard {
public:
    CudaDeviceGuard(int device);
    ~CudaDeviceGuard();
};

void trim_default_mempool(int device_id);

template<typename T>
void printMatrix(T* ptr, int m, int k, int stride, bool is_device_ptr);

Import

#include "src/turbomind/utils/cuda_utils.h"

I/O Contract

Inputs

Name | Type | Required | Description
val (check_cuda_error) | cudaError_t / cublasStatus_t | Yes | CUDA API return value to check
device (CudaDeviceGuard) | int | Yes | Target CUDA device ID to switch to
ptr (printMatrix) | T* | Yes | Matrix data pointer (host or device)
is_device_ptr | bool | Yes | Whether the pointer is on the GPU

Outputs

Name | Type | Description
getSMVersion return | int | GPU SM version (e.g., 80 for Ampere)
getSMCount return | int | Number of streaming multiprocessors
getDeviceName return | std::string | Human-readable GPU name
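The SM version can drive architecture-specific dispatch. The following sketch assumes the version is encoded as the compute capability major * 10 + minor, which is consistent with the value 80 for Ampere given above but is not confirmed by this page. sm_from_capability, KernelPath, and select_path are hypothetical names, and the thresholds are illustrative only.

```cpp
#include <cassert>

// Assumption: SM version = compute capability major * 10 + minor (8.0 -> 80).
constexpr int sm_from_capability(int major, int minor)
{
    return major * 10 + minor;
}

// Hypothetical set of code paths a caller might choose between.
enum class KernelPath { kGeneric, kTuring, kAmpereOrNewer };

// Picks the most specialized path supported by the given SM version.
inline KernelPath select_path(int sm)
{
    assert(sm > 0);
    if (sm >= 80) {
        return KernelPath::kAmpereOrNewer;
    }
    if (sm >= 75) {
        return KernelPath::kTuring;
    }
    return KernelPath::kGeneric;
}
```

In real code the argument to select_path would come from getSMVersion(). Checking thresholds from newest to oldest keeps each branch a simple lower bound.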

Usage Examples

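using namespace turbomind;

// Placeholder values for illustration
void*  ptr  = nullptr;
size_t size = 1 << 20;  // bytes

// Error-checked CUDA call
check_cuda_error(cudaMalloc(&ptr, size));

// Device guard for multi-GPU
int target_device = 1;  // placeholder device ID
{
    CudaDeviceGuard guard(target_device);
    // operations on target_device...
}  // automatically restores previous device

// Check SM version for dispatch
int sm = getSMVersion();
if (sm >= 80) {
    // Use Ampere-optimized path
}

// Debug print a device matrix
// (dev_ptr, rows, cols, and stride are assumed to be defined by the caller)
printMatrix(dev_ptr, rows, cols, stride, /*is_device_ptr=*/true);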
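The save-and-restore behavior of a device guard can be sketched without a GPU. In the example below, current_device_slot, fake_get_device, and fake_set_device are hypothetical stand-ins for the state behind cudaGetDevice and cudaSetDevice. DeviceGuardSketch is not the TurboMind class, whose implementation is not shown on this page. It only illustrates the RAII idea under those assumptions.

```cpp
#include <cassert>

// Hypothetical stand-ins for cudaGetDevice / cudaSetDevice.
inline int& current_device_slot()
{
    thread_local int device = 0;  // CUDA tracks the current device per host thread
    return device;
}

inline int fake_get_device()
{
    return current_device_slot();
}

inline void fake_set_device(int device)
{
    assert(device >= 0);
    current_device_slot() = device;
}

// RAII guard: remembers the current device, switches to the target,
// and switches back when the guard leaves scope.
class DeviceGuardSketch {
public:
    explicit DeviceGuardSketch(int device): previous_(fake_get_device())
    {
        if (device != previous_) {
            fake_set_device(device);
        }
    }

    ~DeviceGuardSketch()
    {
        fake_set_device(previous_);
    }

    // Copying would restore the device twice, so it is disabled.
    DeviceGuardSketch(const DeviceGuardSketch&) = delete;
    DeviceGuardSketch& operator=(const DeviceGuardSketch&) = delete;

private:
    int previous_;
};
```

Because restoration happens in the destructor, nested guards unwind in reverse order, and the previous device is restored even when a scope exits early.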
