Implementation:InternLM Lmdeploy Metrics
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Monitoring |
| Last Updated | 2026-02-07 15:00 GMT |
Overview
Data structures for tracking scheduler and request-level performance metrics including sequence counts, KV block usage, and request timing.
Description
This header defines two metric structs used for monitoring the TurboMind inference engine. ScheduleMetrics captures a snapshot of the scheduler state: total_seqs (received sequences), active_seqs (currently running), waiting_seqs (queued), and KV cache block counts (total_blocks, active_blocks, cached_blocks, free_blocks). RequestMetrics tracks per-request timing using atomic timestamps: enqueue_time (when the request was received) and scheduled_time (when inference began), both in microseconds since Unix epoch via std::chrono::system_clock. Both structs provide operator<< overloads for human-readable stream output.
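The microsecond timestamp described above can be sketched with std::chrono as follows. This is a minimal illustration, not the definition in metrics.h: `now_micros` is a hypothetical stand-in for RequestMetrics::timestamp(), and the real body may differ.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for RequestMetrics::timestamp().
// Returns wall-clock time as microseconds since the system_clock epoch
// (the Unix epoch on mainstream platforms, and guaranteed so from C++20).
inline int64_t now_micros()
{
    using namespace std::chrono;
    return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
}
```

Because system_clock follows wall-clock time, it can jump when the host clock is adjusted, so differences between two such timestamps are best treated as approximate.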
Usage
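Use ScheduleMetrics for monitoring scheduler health and KV cache utilization. Use RequestMetrics for measuring how long a request waits before inference begins (queue wait = scheduled_time - enqueue_time). This interval ends when inference starts, so it is the scheduling delay rather than the full time-to-first-token, which would also include the time needed to produce the first token.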
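The two derived quantities mentioned above, the interval scheduled_time - enqueue_time and KV cache utilization, could be computed along these lines. This is a sketch under stated assumptions: `RequestTimes`, `queue_wait_us`, and `kv_utilization` are hypothetical names introduced here so the example compiles on its own, and treating a zero timestamp as "not yet recorded" is an assumption based on the fields being value-initialized.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical local mirror of the two RequestMetrics timestamp fields.
struct RequestTimes {
    std::atomic<int64_t> enqueue_time{};
    std::atomic<int64_t> scheduled_time{};
};

// Writes scheduled_time - enqueue_time (microseconds) to `out`.
// Returns false if either timestamp is still zero (assumed to mean unset).
inline bool queue_wait_us(const RequestTimes& t, int64_t& out)
{
    const int64_t enq = t.enqueue_time.load();
    const int64_t sch = t.scheduled_time.load();
    if (enq == 0 || sch == 0) {
        return false;
    }
    out = sch - enq;
    return true;
}

// Fraction of KV cache blocks currently in use, in [0, 1] for consistent inputs.
// Returns 0.0 when total_blocks is not positive to avoid dividing by zero.
inline double kv_utilization(int active_blocks, int total_blocks)
{
    if (total_blocks <= 0) {
        return 0.0;
    }
    return static_cast<double>(active_blocks) / total_blocks;
}
```

Checking for unset timestamps before subtracting avoids reporting a large negative interval for a request that has been enqueued but not yet scheduled.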
Code Reference
Source Location
- Repository: InternLM_Lmdeploy
- File: src/turbomind/utils/metrics.h
Signature
struct ScheduleMetrics {
int total_seqs;
int active_seqs;
int waiting_seqs;
int total_blocks;
int active_blocks;
int cached_blocks;
int free_blocks;
};
struct RequestMetrics {
std::atomic<int64_t> enqueue_time{};
std::atomic<int64_t> scheduled_time{};
static int64_t timestamp(); // microseconds since Unix epoch
};
std::ostream& operator<<(std::ostream& os, const ScheduleMetrics& m);
std::ostream& operator<<(std::ostream& os, const RequestMetrics& m);
Import
#include "src/turbomind/utils/metrics.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
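| ScheduleMetrics fields | int | Yes | Sequence and KV block counts populated by the scheduler |
| RequestMetrics::enqueue_time, RequestMetrics::scheduled_time | std::atomic<int64_t> | Yes | Timestamps in microseconds since the Unix epoch, stored when the request is received and when inference begins |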
Outputs
| Name | Type | Description |
|---|---|---|
| ScheduleMetrics fields | int | Current scheduler state snapshot |
| RequestMetrics::timestamp() | int64_t | Current wall-clock time in microseconds |
| operator<< output | std::ostream | Human-readable metric string |
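For readers unfamiliar with stream-output overloads, the following shows roughly how such an operator can be written. It is an illustration only: `ScheduleSnapshot` and `format_snapshot` are hypothetical names, and the exact text produced by the real operator<< in metrics.h is not documented here and may differ.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical local mirror of the ScheduleMetrics fields.
struct ScheduleSnapshot {
    int total_seqs;
    int active_seqs;
    int waiting_seqs;
    int total_blocks;
    int active_blocks;
    int cached_blocks;
    int free_blocks;
};

// Writes every field as name=value; the layout is an assumption for illustration.
inline std::ostream& operator<<(std::ostream& os, const ScheduleSnapshot& m)
{
    return os << "ScheduleSnapshot { "
              << "total_seqs=" << m.total_seqs << ", "
              << "active_seqs=" << m.active_seqs << ", "
              << "waiting_seqs=" << m.waiting_seqs << ", "
              << "total_blocks=" << m.total_blocks << ", "
              << "active_blocks=" << m.active_blocks << ", "
              << "cached_blocks=" << m.cached_blocks << ", "
              << "free_blocks=" << m.free_blocks << " }";
}

// Convenience helper: renders the snapshot to a string, e.g. for a log line.
inline std::string format_snapshot(const ScheduleSnapshot& m)
{
    std::ostringstream ss;
    ss << m;
    return ss.str();
}
```

Returning the stream from the operator allows the usual chaining, such as `std::cout << snapshot << '\n'`.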
Usage Examples
#include <cstdint>
#include <iostream>
#include "src/turbomind/utils/metrics.h"
using namespace turbomind;
// Record request timing
RequestMetrics metrics;
metrics.enqueue_time.store(RequestMetrics::timestamp());
// ... later when scheduled ...
metrics.scheduled_time.store(RequestMetrics::timestamp());
// Log scheduler state
ScheduleMetrics sched{100, 32, 8, 1024, 512, 256, 256};
std::cout << sched << std::endl;
// Output: ScheduleMetrics { total_seqs=100, active_seqs=32, ... }
// Compute queue wait (enqueue to start of inference); this excludes the time to produce the first token
int64_t queue_wait_us = metrics.scheduled_time.load() - metrics.enqueue_time.load();