Implementation:LMCache LMCache Observability

From Leeroopedia
Revision as of 15:25, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/LMCache_LMCache_Observability.md)


Knowledge Sources
Domains Observability, Prometheus Metrics
Last Updated 2026-02-09 00:00 GMT

Overview

This module implements the complete observability stack for LMCache, including in-memory statistics tracking, Prometheus metric export, and a background stats logging thread.

Description

The observability.py module is the core observability layer for LMCache. It defines several dataclass-based statistics containers (LMCacheStats, LookupRequestStats, RetrieveRequestStats, StoreRequestStats, P2PTransferRequestStats) that capture per-request timing and throughput metrics. The LMCStatsMonitor singleton collects metrics in a thread-safe manner via the @thread_safe decorator; its get_stats_and_clear() method returns the accumulated interval statistics as an LMCacheStats snapshot and resets them. The PrometheusLogger singleton registers and updates a comprehensive set of Prometheus counters, gauges, and histograms. LMCacheStatsLogger ties the two together: a daemon thread reads the stats once every log_interval seconds and pushes them to Prometheus and the usage context tracker.
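The collect-then-clear pattern described above can be illustrated with a minimal, self-contained sketch. It is not the LMCache implementation: IntervalStats, SketchMonitor, record_retrieve, and get_or_create are hypothetical names, a plain threading.Lock stands in for the @thread_safe decorator, and the hit-rate formula (hit tokens divided by requested tokens) is an assumption.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class IntervalStats:
    """Hypothetical, much-reduced stand-in for LMCacheStats."""
    interval_retrieve_requests: int = 0
    interval_requested_tokens: int = 0
    interval_hit_tokens: int = 0
    time_to_retrieve: List[float] = field(default_factory=list)

    @property
    def retrieve_hit_rate(self) -> float:
        # Assumption: hit rate = hit tokens / requested tokens in the interval.
        if self.interval_requested_tokens == 0:
            return 0.0
        return self.interval_hit_tokens / self.interval_requested_tokens


class SketchMonitor:
    """Illustrative singleton monitor; names and behavior are assumptions."""

    _instance: Optional["SketchMonitor"] = None
    _instance_lock = threading.Lock()

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._stats = IntervalStats()

    @classmethod
    def get_or_create(cls) -> "SketchMonitor":
        # The class-level lock ensures only one instance is ever created.
        with cls._instance_lock:
            if cls._instance is None:
                cls._instance = cls()
            return cls._instance

    def record_retrieve(self, num_tokens: int, hit_tokens: int, seconds: float) -> None:
        # All mutations happen under the instance lock.
        with self._lock:
            self._stats.interval_retrieve_requests += 1
            self._stats.interval_requested_tokens += num_tokens
            self._stats.interval_hit_tokens += hit_tokens
            self._stats.time_to_retrieve.append(seconds)

    def get_stats_and_clear(self) -> IntervalStats:
        # Swap in a fresh container so the snapshot and the reset are atomic.
        with self._lock:
            snapshot, self._stats = self._stats, IntervalStats()
            return snapshot
```

Swapping the whole container under the lock, rather than zeroing fields one by one, means a concurrent writer can never land an update between the read and the reset.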

Usage

The observability module is automatically initialized when an LMCache engine starts. Use LMCStatsMonitor.GetOrCreate() to access the singleton monitor from any component that needs to record metrics. Use PrometheusLogger.GetOrCreate(metadata) to initialize Prometheus metric export.

Code Reference

Source Location

Signature

@dataclass
class LMCacheStats:
    interval_retrieve_requests: int
    interval_store_requests: int
    interval_lookup_requests: int
    interval_requested_tokens: int
    interval_hit_tokens: int
    interval_stored_tokens: int
    # ... (many more fields)
    retrieve_hit_rate: float
    lookup_hit_rate: float
    time_to_retrieve: List[float]
    time_to_store: List[float]
    # ...

@dataclass
class LookupRequestStats:
    request_id: int
    num_tokens: int
    hit_tokens: int
    is_finished: bool

@dataclass
class RetrieveRequestStats:
    request_id: int
    num_tokens: int
    local_hit_tokens: int
    remote_hit_tokens: int
    start_time: float
    end_time: float

@dataclass
class StoreRequestStats:
    request_id: int
    num_tokens: int
    start_time: float
    end_time: float

@dataclass
class P2PTransferRequestStats:
    num_tokens: int
    start_time: float
    end_time: float

class LMCStatsMonitor:
    def __init__(self): ...
    def on_lookup_request(self, num_tokens: int) -> LookupRequestStats: ...
    def on_lookup_finished(self, stats: LookupRequestStats, num_hit_tokens: int): ...
    def on_retrieve_request(self, num_tokens: int) -> RetrieveRequestStats: ...
    def on_retrieve_finished(self, retrieve_stats: RetrieveRequestStats, num_retrieved_tokens: int): ...
    def on_store_request(self, num_tokens: int) -> StoreRequestStats: ...
    def on_store_finished(self, store_stats: StoreRequestStats, num_stored_tokens: int = -1): ...
    def get_stats_and_clear(self) -> LMCacheStats: ...
    @staticmethod
    def GetOrCreate() -> "LMCStatsMonitor": ...

class PrometheusLogger:
    def __init__(self, metadata: LMCacheMetadata): ...
    def log_prometheus(self, stats: LMCacheStats): ...
    @staticmethod
    def GetOrCreate(metadata: LMCacheMetadata) -> "PrometheusLogger": ...

class LMCacheStatsLogger:
    def __init__(self, metadata: LMCacheMetadata, log_interval: int): ...
    def log_worker(self): ...
    def shutdown(self): ...

Import

from lmcache.observability import (
    LMCStatsMonitor,
    LMCacheStats,
    LookupRequestStats,
    RetrieveRequestStats,
    StoreRequestStats,
    PrometheusLogger,
    LMCacheStatsLogger,
)

I/O Contract

Inputs

Name | Type | Required | Description
metadata | LMCacheMetadata | Yes (for PrometheusLogger, LMCacheStatsLogger) | Metadata containing model_name, worker_id, role for Prometheus labels
log_interval | int | Yes (for LMCacheStatsLogger) | Interval in seconds between stats collection and Prometheus push
num_tokens | int | Yes (for on_*_request methods) | Number of tokens involved in the operation being tracked
num_hit_tokens / num_retrieved_tokens | int | Yes (for on_*_finished methods) | Number of tokens that were cache hits
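To make the role of log_interval concrete, the following sketch shows one common way to run a periodic collection loop in a daemon thread with a clean shutdown. It is an illustration only: PeriodicStatsLogger, collect, and sink are hypothetical names, and the actual LMCacheStatsLogger may structure its loop differently.

```python
import threading
from typing import Callable


class PeriodicStatsLogger:
    """Hypothetical background logger; not the LMCacheStatsLogger class."""

    def __init__(
        self,
        collect: Callable[[], dict],
        sink: Callable[[dict], None],
        log_interval: float,
    ) -> None:
        self._collect = collect      # e.g. a monitor's get-and-clear call
        self._sink = sink            # e.g. a function that updates metrics
        self._log_interval = log_interval
        self._stop = threading.Event()
        # daemon=True so the thread never blocks interpreter exit.
        self._thread = threading.Thread(target=self._log_worker, daemon=True)
        self._thread.start()

    def _log_worker(self) -> None:
        # Event.wait returns False on timeout (keep looping) and True once
        # shutdown() sets the event, which ends the loop promptly.
        while not self._stop.wait(self._log_interval):
            self._sink(self._collect())

    def shutdown(self) -> None:
        self._stop.set()
        self._thread.join()
```

Waiting on an Event instead of calling time.sleep lets shutdown() interrupt the wait immediately rather than after a full interval.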

Outputs

Name | Type | Description
LMCacheStats | dataclass | Snapshot of all interval metrics, hit rates, timing distributions, and real-time measurements
RetrieveRequestStats | dataclass | Per-request timing with context managers for profiling sub-phases (process_tokens, broadcast, to_gpu)
StoreRequestStats | dataclass | Per-request timing with context managers for profiling sub-phases (process_tokens, from_gpu, put)
Prometheus metrics | Gauges/Counters/Histograms | Over 60 Prometheus metrics covering hit rates, latency distributions, throughput, cache usage, P2P transfers, etc.
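The per-request stats objects are described as exposing context managers for profiling sub-phases. A rough sketch of how such a container could look is given below. PhaseTimedStats, profile, phase_seconds, and finish are hypothetical names chosen for illustration; the real RetrieveRequestStats and StoreRequestStats may expose one context manager per phase instead.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator


@dataclass
class PhaseTimedStats:
    """Hypothetical per-request stats with sub-phase timing."""
    num_tokens: int
    start_time: float = field(default_factory=time.perf_counter)
    end_time: float = 0.0
    phase_seconds: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def profile(self, phase: str) -> Iterator[None]:
        begin = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a phase entered more than once is summed,
            # and record the time even if the body raised.
            elapsed = time.perf_counter() - begin
            self.phase_seconds[phase] = self.phase_seconds.get(phase, 0.0) + elapsed

    def finish(self) -> float:
        """Mark the request as done and return its total duration in seconds."""
        self.end_time = time.perf_counter()
        return self.end_time - self.start_time


# Usage: time two sub-phases of a simulated retrieve.
stats = PhaseTimedStats(num_tokens=1024)
with stats.profile("process_tokens"):
    sum(range(1000))
with stats.profile("to_gpu"):
    sum(range(1000))
total_seconds = stats.finish()
```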

Usage Examples

from lmcache.observability import LMCStatsMonitor

# Get or create the singleton stats monitor
monitor = LMCStatsMonitor.GetOrCreate()

# Record a retrieve request
retrieve_stats = monitor.on_retrieve_request(num_tokens=1024)

# ... perform retrieval ...

# Record completion with actual hit count
monitor.on_retrieve_finished(retrieve_stats, num_retrieved_tokens=768)

# Collect and clear all interval metrics
stats = monitor.get_stats_and_clear()
print(f"Retrieve hit rate: {stats.retrieve_hit_rate:.2%}")
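# Guard against an interval in which no retrieve timing was recorded
if stats.time_to_retrieve:
    mean_retrieve = sum(stats.time_to_retrieve) / len(stats.time_to_retrieve)
    print(f"Mean retrieve time: {mean_retrieve:.4f}s")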
