Implementation:InternLM Lmdeploy Metrics

From Leeroopedia
Revision as of 15:15, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/InternLM_Lmdeploy_Metrics.md)


Knowledge Sources
Domains Infrastructure, Monitoring
Last Updated 2026-02-07 15:00 GMT

Overview

Data structures for tracking scheduler and request-level performance metrics including sequence counts, KV block usage, and request timing.

Description

This header defines two metric structs used for monitoring the TurboMind inference engine. ScheduleMetrics captures a snapshot of the scheduler state: total_seqs (received sequences), active_seqs (currently running), waiting_seqs (queued), and KV cache block counts (total_blocks, active_blocks, cached_blocks, free_blocks). RequestMetrics tracks per-request timing using atomic timestamps: enqueue_time (when the request was received) and scheduled_time (when inference began), both in microseconds since Unix epoch via std::chrono::system_clock. Both structs provide operator<< overloads for human-readable stream output.
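The timestamp behaviour described above can be sketched roughly as follows. This is a minimal stand-in, not the actual header: the name RequestMetricsSketch, the helper to_string, and the exact output format of operator<< are assumptions made for illustration. It relies on std::chrono::system_clock counting from the Unix epoch, which is guaranteed from C++20 and holds in practice on mainstream implementations before that.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in mirroring the RequestMetrics fields described above.
struct RequestMetricsSketch {
    std::atomic<std::int64_t> enqueue_time{};    // zero until the request is received
    std::atomic<std::int64_t> scheduled_time{};  // zero until inference begins

    // Wall-clock time in microseconds since the Unix epoch.
    static std::int64_t timestamp()
    {
        using namespace std::chrono;
        return duration_cast<microseconds>(system_clock::now().time_since_epoch()).count();
    }
};

// Assumed formatting; the real operator<< may print the fields differently.
inline std::ostream& operator<<(std::ostream& os, const RequestMetricsSketch& m)
{
    return os << "RequestMetrics { enqueue_time=" << m.enqueue_time.load()
              << ", scheduled_time=" << m.scheduled_time.load() << " }";
}

// Illustrative helper: render the metrics into a string for logging.
inline std::string to_string(const RequestMetricsSketch& m)
{
    std::ostringstream ss;
    ss << m;
    return ss.str();
}
```

Because system_clock is a wall clock, it can be adjusted while the process runs, so differences between two timestamps are best treated as approximate.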

Usage

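Use ScheduleMetrics to monitor scheduler health and KV cache utilization. Use RequestMetrics to measure how long a request waits before it is scheduled: scheduled_time - enqueue_time is the queueing delay in microseconds. This delay is one component of time-to-first-token, not the whole of it, because the first token is produced only after inference has begun.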
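As one way to turn a scheduler snapshot into health indicators, the sketch below derives two ratios from the fields listed above. ScheduleSnapshot is a hypothetical stand-in with the same fields as ScheduleMetrics, and the helper names kv_active_fraction and waiting_fraction are assumptions for illustration, not part of the header.

```cpp
// Hypothetical stand-in with the same fields as ScheduleMetrics.
struct ScheduleSnapshot {
    int total_seqs;
    int active_seqs;
    int waiting_seqs;
    int total_blocks;
    int active_blocks;
    int cached_blocks;
    int free_blocks;
};

// Fraction of KV blocks held by running sequences; 0 when there are no blocks.
inline double kv_active_fraction(const ScheduleSnapshot& m)
{
    return m.total_blocks > 0 ? static_cast<double>(m.active_blocks) / m.total_blocks : 0.0;
}

// Fraction of sequences that are queued, among those active or waiting.
inline double waiting_fraction(const ScheduleSnapshot& m)
{
    const int pending = m.active_seqs + m.waiting_seqs;
    return pending > 0 ? static_cast<double>(m.waiting_seqs) / pending : 0.0;
}
```

With the sample values used elsewhere on this page (32 active and 8 waiting sequences, 512 of 1024 blocks active), these helpers give a KV active fraction of 0.5 and a waiting fraction of 0.2. Both guard against division by zero so that an idle or uninitialized snapshot reports 0.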

Code Reference

Source Location

Signature

struct ScheduleMetrics {
    int total_seqs;
    int active_seqs;
    int waiting_seqs;
    int total_blocks;
    int active_blocks;
    int cached_blocks;
    int free_blocks;
};

struct RequestMetrics {
    std::atomic<int64_t> enqueue_time{};
    std::atomic<int64_t> scheduled_time{};

    static int64_t timestamp();  // microseconds since Unix epoch
};

std::ostream& operator<<(std::ostream& os, const ScheduleMetrics& m);
std::ostream& operator<<(std::ostream& os, const RequestMetrics& m);

Import

#include "src/turbomind/utils/metrics.h"

I/O Contract

Inputs

Name | Type | Required | Description
(fields set by scheduler) | int / atomic<int64_t> | Yes | Metric values populated by the scheduler and request handler

Outputs

Name | Type | Description
ScheduleMetrics fields | int | Current scheduler state snapshot
RequestMetrics::timestamp() | int64_t | Current wall-clock time in microseconds since Unix epoch
operator<< output | std::ostream | Human-readable metric string

Usage Examples

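#include <cstdint>
#include <iostream>

#include "src/turbomind/utils/metrics.h"

using namespace turbomind;

int main()
{
    // Record request timing
    RequestMetrics metrics;
    metrics.enqueue_time.store(RequestMetrics::timestamp());
    // ... later, when the request is scheduled ...
    metrics.scheduled_time.store(RequestMetrics::timestamp());

    // Queueing delay in microseconds (one component of time-to-first-token)
    int64_t queue_wait_us = metrics.scheduled_time.load() - metrics.enqueue_time.load();
    std::cout << "queue wait (us): " << queue_wait_us << std::endl;

    // Log scheduler state. Initializer order follows the field declarations:
    // total_seqs, active_seqs, waiting_seqs,
    // total_blocks, active_blocks, cached_blocks, free_blocks
    ScheduleMetrics sched{100, 32, 8, 1024, 512, 256, 256};
    std::cout << sched << std::endl;
    // Output: ScheduleMetrics { total_seqs=100, active_seqs=32, ... }
    return 0;
}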
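The subtraction of the two timestamps is only meaningful once both have been stored, and a monitoring thread may read them before that. The sketch below shows one defensive way to compute the difference. It assumes that a zero value means "not yet recorded" (both fields are value-initialized) and requires C++17 for std::optional. RequestTimes and queue_wait_us are hypothetical names, not part of the header.

```cpp
#include <algorithm>
#include <atomic>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the two atomic timestamps in RequestMetrics.
struct RequestTimes {
    std::atomic<std::int64_t> enqueue_time{};
    std::atomic<std::int64_t> scheduled_time{};
};

// Returns the queueing delay in microseconds, or nullopt while either
// timestamp is still unset (assumption: zero means "not recorded yet").
inline std::optional<std::int64_t> queue_wait_us(const RequestTimes& m)
{
    const std::int64_t enqueued  = m.enqueue_time.load();
    const std::int64_t scheduled = m.scheduled_time.load();
    if (enqueued == 0 || scheduled == 0) {
        return std::nullopt;
    }
    // Clamp at zero: a wall clock can step backwards between the two stores.
    return std::max<std::int64_t>(0, scheduled - enqueued);
}
```

Returning an empty optional keeps unscheduled requests out of latency aggregates instead of reporting a large negative or misleading value.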

Related Pages
