Implementation:InternLM Lmdeploy AnomalyHandler
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Debugging |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Singleton utility for detecting and optionally fixing NaN/Inf anomalies in GPU tensors during inference, with configurable severity levels and summarization.
Description
The AnomalyHandler class is a singleton that provides runtime NaN/Inf detection for debugging numerical issues in transformer inference. Init() configures it with a rank, vocabulary size, fallback token ID, maximum batch size, and CUDA stream. The static level() method returns the current anomaly detection level, which is controlled externally. The class supports up to 65536 entries (max_entries).
- CountAndFix() scans a typed GPU buffer for NaN and Inf values and optionally replaces them. Each call is keyed by a string identifier and gated by a severity level.
- FixLogits() is a specialized variant for logit tensors.
- Summarize() reports anomaly counts through a callback.
- Reset() clears accumulated state.
- The convenience macros TM_DEBUG_RAW and TM_DEBUG_TENSOR provide level-gated anomaly checking with zero overhead when the level is below the threshold.
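To make the scan-and-replace behaviour concrete, here is a minimal host-side sketch in plain C++. It is not the library's implementation: the real handler works on GPU buffers, and the names `AnomalyCounts` and `count_and_fix_host`, the `fix` flag, and the zero replacement value are all assumptions made for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type; the real handler's bookkeeping is not shown in this page.
struct AnomalyCounts {
    int64_t nan_count = 0;
    int64_t inf_count = 0;
};

// Hypothetical CPU analogue of CountAndFix(): scan `data`, count NaN and
// Inf elements, and overwrite each one with `replacement` when `fix` is true.
// The replacement policy (zero by default) is an assumption.
template<class T>
AnomalyCounts count_and_fix_host(T* data, int64_t size, bool fix, T replacement = T(0))
{
    AnomalyCounts counts;
    for (int64_t i = 0; i < size; ++i) {
        if (std::isnan(data[i])) {
            ++counts.nan_count;
        }
        else if (std::isinf(data[i])) {
            ++counts.inf_count;
        }
        else {
            continue;  // finite value, nothing to do
        }
        if (fix) {
            data[i] = replacement;
        }
    }
    return counts;
}
```

Calling `count_and_fix_host(buf.data(), buf.size(), true)` on a `std::vector<float>` returns the two counts and leaves finite elements untouched.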
Usage
Use this handler during development or debugging to detect where NaN/Inf values first appear in the inference pipeline. Enable it by setting the anomaly detection level; at level 0 it is a no-op.
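The level gate can be pictured with a small macro sketch. Everything here is hypothetical: `g_level` stands in for AnomalyHandler::level(), `run_check` stands in for the actual scan, and the comparison direction (a check runs when the configured level is at least the checkpoint's level) is an assumption, not something this page confirms about TM_DEBUG_RAW or TM_DEBUG_TENSOR.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for AnomalyHandler::level(); 0 disables all checks.
static int g_level = 0;

// Records which checkpoints actually ran, so the gate's effect is observable.
static std::vector<std::string> g_checked;

// Hypothetical stand-in for the real anomaly scan.
inline void run_check(const std::string& key)
{
    g_checked.push_back(key);
}

// Run the check only when the configured level reaches the checkpoint's level.
// When the gate is closed, `key` is never evaluated and no check is issued.
#define SKETCH_DEBUG_CHECK(key, lvl) \
    do {                             \
        if (g_level >= (lvl)) {      \
            run_check(key);          \
        }                            \
    } while (0)
```

With `g_level` at 0, every `SKETCH_DEBUG_CHECK` reduces to a single integer comparison, which matches the no-op behaviour described above.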
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/utils/anomaly_handler.h
Signature
class AnomalyHandler {
public:
    static constexpr size_t max_entries = 65536;
    static AnomalyHandler& instance();
    static int level() noexcept;
    void Init(int rank, int vocab_size, int fallback, int max_batch_size,
              cudaStream_t stream) noexcept;
    template<class T>
    void CountAndFix(T* data, int64_t size, std::string key, int level);
    template<class T>
    void FixLogits(T* logits, int batch_size, int level);
    void Summarize(std::function<void(const int*, int)> handler);
    void Reset();
};
// Convenience free functions
template<class T>
void count_and_fix(T* data, size_t size, std::string key, int level);
void DebugTensor(Tensor& tensor, const std::string& key, int level);
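The `instance()` accessor suggests a conventional singleton. A minimal sketch of that pattern follows, using a function-local static. The class `CounterRegistry` and its members are hypothetical and only illustrate how one shared object can accumulate per-key counts and be reset; how AnomalyHandler stores its state internally is not documented here.

```cpp
#include <map>
#include <string>

// Hypothetical singleton that accumulates counts per checkpoint key.
class CounterRegistry {
public:
    // Function-local static: constructed on first use, and initialization
    // is thread-safe since C++11.
    static CounterRegistry& instance()
    {
        static CounterRegistry inst;
        return inst;
    }

    void add(const std::string& key, int n)
    {
        counts_[key] += n;
    }

    int get(const std::string& key) const
    {
        auto it = counts_.find(key);
        return it == counts_.end() ? 0 : it->second;
    }

    // Clears all accumulated counts.
    void reset()
    {
        counts_.clear();
    }

private:
    CounterRegistry() = default;
    CounterRegistry(const CounterRegistry&) = delete;
    CounterRegistry& operator=(const CounterRegistry&) = delete;

    std::map<std::string, int> counts_;
};
```

Every call to `CounterRegistry::instance()` returns a reference to the same object, so counts added at one call site are visible at another.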
Import
#include "src/turbomind/utils/anomaly_handler.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data | T* | Yes | GPU buffer to scan for anomalies |
| size | int64_t | Yes | Number of elements in the buffer |
| key | std::string | Yes | Identifier for this check point (e.g., layer name) |
| level | int | Yes | Severity level threshold for this check |
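| rank | int | Yes (Init) | GPU/process rank for multi-GPU setups |
| vocab_size | int | Yes (Init) | Vocabulary size |
| fallback | int | Yes (Init) | Fallback token ID |
| max_batch_size | int | Yes (Init) | Maximum batch size |
| stream | cudaStream_t | Yes (Init) | CUDA stream |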
Outputs
| Name | Type | Description |
|---|---|---|
| data | T* | Input buffer with anomalous values optionally replaced (in-place) |
| Summarize callback | function | Receives anomaly counts for reporting |
Usage Examples
using namespace turbomind;
// Initialize once
AnomalyHandler::instance().Init(rank, vocab_size, eos_id, max_batch, stream);
// Check hidden states after each layer
TM_DEBUG_TENSOR(hidden_states, "layer_3_output", 1);
// Check raw pointer
TM_DEBUG_RAW(logits_ptr, batch_size * vocab_size, "final_logits", 2);
// Summarize at end of request
AnomalyHandler::instance().Summarize([](const int* counts, int n) {
    // report anomaly counts
});
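As a sketch of what a Summarize() callback might do, the helper below builds a callable with the same `void(const int*, int)` signature and sums the counts it receives. The helper name `make_total_callback` is hypothetical, and the reading of `counts` as an array of `n` per-entry anomaly counts is an assumption; this page does not specify the array layout.

```cpp
#include <functional>
#include <numeric>

// Hypothetical helper: returns a callback matching the signature that
// Summarize() accepts. Each invocation adds the sum of counts[0..n) to `total`.
// `total` must outlive the returned callback, which captures it by reference.
inline std::function<void(const int*, int)> make_total_callback(long long& total)
{
    return [&total](const int* counts, int n) {
        total += std::accumulate(counts, counts + n, 0LL);
    };
}
```

A callback built this way could be passed directly to `AnomalyHandler::instance().Summarize(...)`, with a nonzero total signalling that at least one anomaly was recorded.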