Implementation:InternLM Lmdeploy AnomalyHandler

Knowledge Sources	InternLM_Lmdeploy
Domains	Infrastructure, Debugging
Last Updated	2026-02-07 15:00 GMT

Overview

Singleton utility for detecting and optionally fixing NaN/Inf anomalies in GPU tensors during inference, with configurable severity levels and summarization.

Description

The AnomalyHandler class is a singleton that provides runtime NaN/Inf detection for debugging numerical issues in transformer inference. Init() configures it with the rank, vocabulary size, a fallback token ID, the maximum batch size, and a CUDA stream. CountAndFix() scans a typed GPU buffer for anomalous values (NaN, Inf) and optionally replaces them; each call is keyed by a string identifier and gated by a severity level. FixLogits() is a specialized variant for logit tensors. Summarize() reports the accumulated anomaly counts through a callback, and Reset() clears the accumulated state. The static level() method returns the current anomaly detection level, which is controlled externally. The convenience macros TM_DEBUG_RAW and TM_DEBUG_TENSOR wrap these checks behind a level gate, so a check adds no overhead when the current level is below its threshold. The class supports up to 65536 entries (max_entries).
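To make the count-and-fix idea concrete, here is a minimal host-side sketch. It is not the lmdeploy implementation, which operates on GPU buffers on a CUDA stream. The names count_and_fix_host and AnomalyCount are hypothetical, and replacing anomalies with zero is an assumption, since this page does not state which replacement value the handler writes.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch; not part of the lmdeploy API.
struct AnomalyCount {
    int64_t nan_count = 0;
    int64_t inf_count = 0;
};

// CPU analogue of a count-and-fix pass over a buffer of float or double.
// Counts NaN and +/-Inf separately and overwrites each one in place.
// Assumption: the replacement value defaults to 0.
template<class T>
AnomalyCount count_and_fix_host(T* data, int64_t size, T replacement = T(0))
{
    assert(data != nullptr || size == 0);
    AnomalyCount result;
    for (int64_t i = 0; i < size; ++i) {
        if (std::isnan(data[i])) {
            ++result.nan_count;
            data[i] = replacement;
        }
        else if (std::isinf(data[i])) {
            ++result.inf_count;
            data[i] = replacement;
        }
    }
    return result;
}
```

Calling count_and_fix_host(v.data(), v.size()) on a std::vector<float> leaves finite values untouched and returns how many NaN and Inf elements were overwritten.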

Usage

Use this handler during development or debugging to detect where NaN/Inf values first appear in the inference pipeline. Enable it by setting the anomaly detection level; at level 0 it is a no-op.
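The level gating described above can be sketched as a macro that compares a per-call threshold with the current level before doing any work. This is an illustration under stated assumptions, not the definition of TM_DEBUG_RAW: the sketch namespace, current_level(), checks_run(), check(), and SKETCH_DEBUG_RAW are all hypothetical names, and the comparison direction (run when the current level is at or above the threshold) is inferred from the description rather than taken from the source.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical stand-in for AnomalyHandler::level(); 0 disables all checks.
inline int& current_level() noexcept
{
    static int level = 0;
    return level;
}

// Counts how many checks actually ran, so the gating is observable.
inline int& checks_run() noexcept
{
    static int n = 0;
    return n;
}

// Placeholder for the real scan; it only records that it was called.
inline void check(const float* data, long size, const std::string& key)
{
    assert(size >= 0);
    (void)data;
    (void)key;
    ++checks_run();
}

}  // namespace sketch

// Runs the check only when the current level reaches the threshold.
// When gated off, the arguments (including the key string) are not evaluated.
#define SKETCH_DEBUG_RAW(ptr, size, key, lvl)            \
    do {                                                 \
        if (sketch::current_level() >= (lvl)) {          \
            sketch::check((ptr), (size), (key));         \
        }                                                \
    } while (0)
```

With the level at 0, every SKETCH_DEBUG_RAW call reduces to one integer comparison, which matches the no-op behavior described for level 0.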

Code Reference

Source Location

Repository: InternLM_Lmdeploy
File: src/turbomind/utils/anomaly_handler.h

Signature

class AnomalyHandler {
public:
    static constexpr size_t max_entries = 65536;

    static AnomalyHandler& instance();
    static int level() noexcept;

    void Init(int rank, int vocab_size, int fallback, int max_batch_size,
              cudaStream_t stream) noexcept;

    template<class T>
    void CountAndFix(T* data, int64_t size, std::string key, int level);

    template<class T>
    void FixLogits(T* logits, int batch_size, int level);

    void Summarize(std::function<void(const int*, int)> handler);
    void Reset();
};

// Convenience free functions
template<class T>
void count_and_fix(T* data, size_t size, std::string key, int level);

void DebugTensor(Tensor& tensor, const std::string& key, int level);

Import

#include "src/turbomind/utils/anomaly_handler.h"

I/O Contract

Inputs

Name	Type	Required	Description
data	T*	Yes	GPU buffer to scan for anomalies
size	int64_t	Yes	Number of elements in the buffer
key	std::string	Yes	Identifier for this check point (e.g., layer name)
level	int	Yes	Severity level threshold for this check
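rank	int	Yes (Init)	GPU/process rank for multi-GPU setups
vocab_size	int	Yes (Init)	Vocabulary size
fallback	int	Yes (Init)	Fallback token ID
max_batch_size	int	Yes (Init)	Maximum batch size
stream	cudaStream_t	Yes (Init)	CUDA stream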

Outputs

Name	Type	Description
data	T*	Input buffer with anomalous values optionally replaced (in-place)
Summarize callback	std::function<void(const int*, int)>	Receives anomaly counts for reporting

Usage Examples

using namespace turbomind;

// Initialize once
AnomalyHandler::instance().Init(rank, vocab_size, eos_id, max_batch, stream);

// Check hidden states after each layer
TM_DEBUG_TENSOR(hidden_states, "layer_3_output", 1);

// Check raw pointer
TM_DEBUG_RAW(logits_ptr, batch_size * vocab_size, "final_logits", 2);

// Summarize at end of request
AnomalyHandler::instance().Summarize([](const int* counts, int n) {
    // report anomaly counts
});
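The Summarize callback in the example above is left empty. One way to fill it in is sketched here, assuming the two arguments are a pointer to n integer counts; this page does not document what each count represents, so that reading is an assumption. summarize_stub and make_total_collector are hypothetical helpers that stand in for the real handler so the callback can be exercised without a GPU.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Same callback shape as the Summarize() signature on this page.
using SummaryFn = std::function<void(const int*, int)>;

// Hypothetical stand-in for Summarize(): forwards a host-side count array.
inline void summarize_stub(const std::vector<int>& counts, const SummaryFn& handler)
{
    assert(static_cast<bool>(handler));
    handler(counts.data(), static_cast<int>(counts.size()));
}

// Builds a callback that stores the total count and the number of
// non-zero entries into caller-owned variables.
inline SummaryFn make_total_collector(int& total, int& nonzero)
{
    return [&total, &nonzero](const int* counts, int n) {
        total   = std::accumulate(counts, counts + n, 0);
        nonzero = 0;
        for (int i = 0; i < n; ++i) {
            if (counts[i] != 0) {
                ++nonzero;
            }
        }
    };
}
```

The variables captured by reference must outlive the callback, which holds when it is invoked synchronously as in this sketch.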

Related Pages

Environment:InternLM_Lmdeploy_CUDA_GPU_Runtime
