Implementation:LMCache LMCache Observability

Knowledge Sources	LMCache
Domains	Observability, Prometheus Metrics
Last Updated	2026-02-09 00:00 GMT

Overview

This module implements the complete observability stack for LMCache, including in-memory statistics tracking, Prometheus metric export, and a background stats logging thread.

Description

The observability.py module is the core observability layer for LMCache. It defines several dataclass-based statistics containers (LMCacheStats, LookupRequestStats, RetrieveRequestStats, StoreRequestStats, P2PTransferRequestStats) that capture per-request timing and throughput metrics. The LMCStatsMonitor singleton collects metrics in a thread-safe manner via the @thread_safe decorator and periodically flushes them through get_stats_and_clear(). The PrometheusLogger singleton registers and updates a comprehensive set of Prometheus counters, gauges, and histograms. The LMCacheStatsLogger ties them together in a daemon thread that periodically reads stats and pushes them to Prometheus and the usage context tracker.
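The collect-and-clear pattern described above can be illustrated with a minimal, self-contained sketch. It does not reproduce the LMCache source: `SketchMonitor`, `IntervalStats`, `record_retrieve`, and this `thread_safe` decorator are hypothetical stand-ins, assuming only that the real monitor serializes access with a lock and resets its counters after each snapshot.

```python
import threading
from dataclasses import dataclass, field
from functools import wraps
from typing import List


def thread_safe(method):
    """Serialize calls to the decorated method using the instance's lock."""
    @wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


@dataclass
class IntervalStats:  # hypothetical; far smaller than LMCacheStats
    requested_tokens: int = 0
    hit_tokens: int = 0
    hit_rate: float = 0.0
    time_to_retrieve: List[float] = field(default_factory=list)


class SketchMonitor:  # hypothetical stand-in for LMCStatsMonitor
    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self._lock = threading.Lock()
        self._requested = 0
        self._hit = 0
        self._times = []

    @thread_safe
    def record_retrieve(self, num_tokens, hit_tokens, seconds):
        # Accumulate counters for the current interval.
        self._requested += num_tokens
        self._hit += hit_tokens
        self._times.append(seconds)

    @thread_safe
    def get_stats_and_clear(self):
        # Build the snapshot and reset state under the same lock, so no
        # update can land between the read and the reset.
        rate = self._hit / self._requested if self._requested else 0.0
        snapshot = IntervalStats(
            self._requested, self._hit, rate, list(self._times)
        )
        self._requested = 0
        self._hit = 0
        self._times.clear()
        return snapshot

    @staticmethod
    def GetOrCreate():
        # Lazily create the single shared instance.
        with SketchMonitor._instance_lock:
            if SketchMonitor._instance is None:
                SketchMonitor._instance = SketchMonitor()
            return SketchMonitor._instance
```

Taking the snapshot and clearing the counters inside one locked method is what makes each reported interval self-consistent: a request recorded concurrently lands either in this interval or in the next one, never in both.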

Usage

The observability module is automatically initialized when an LMCache engine starts. Use LMCStatsMonitor.GetOrCreate() to access the singleton monitor from any component that needs to record metrics. Use PrometheusLogger.GetOrCreate(metadata) to initialize Prometheus metric export.

Code Reference

Source Location

Repository: LMCache
File: lmcache/observability.py
Lines: 1-1839

Signature

@dataclass
class LMCacheStats:
    interval_retrieve_requests: int
    interval_store_requests: int
    interval_lookup_requests: int
    interval_requested_tokens: int
    interval_hit_tokens: int
    interval_stored_tokens: int
    # ... (many more fields)
    retrieve_hit_rate: float
    lookup_hit_rate: float
    time_to_retrieve: List[float]
    time_to_store: List[float]
    # ...

@dataclass
class LookupRequestStats:
    request_id: int
    num_tokens: int
    hit_tokens: int
    is_finished: bool

@dataclass
class RetrieveRequestStats:
    request_id: int
    num_tokens: int
    local_hit_tokens: int
    remote_hit_tokens: int
    start_time: float
    end_time: float

@dataclass
class StoreRequestStats:
    request_id: int
    num_tokens: int
    start_time: float
    end_time: float

@dataclass
class P2PTransferRequestStats:
    num_tokens: int
    start_time: float
    end_time: float

class LMCStatsMonitor:
    def __init__(self): ...
    def on_lookup_request(self, num_tokens: int) -> LookupRequestStats: ...
    def on_lookup_finished(self, stats: LookupRequestStats, num_hit_tokens: int): ...
    def on_retrieve_request(self, num_tokens: int) -> RetrieveRequestStats: ...
    def on_retrieve_finished(self, retrieve_stats: RetrieveRequestStats, num_retrieved_tokens: int): ...
    def on_store_request(self, num_tokens: int) -> StoreRequestStats: ...
    def on_store_finished(self, store_stats: StoreRequestStats, num_stored_tokens: int = -1): ...
    def get_stats_and_clear(self) -> LMCacheStats: ...
    @staticmethod
    def GetOrCreate() -> "LMCStatsMonitor": ...

class PrometheusLogger:
    def __init__(self, metadata: LMCacheMetadata): ...
    def log_prometheus(self, stats: LMCacheStats): ...
    @staticmethod
    def GetOrCreate(metadata: LMCacheMetadata) -> "PrometheusLogger": ...

class LMCacheStatsLogger:
    def __init__(self, metadata: LMCacheMetadata, log_interval: int): ...
    def log_worker(self): ...
    def shutdown(self): ...

Import

from lmcache.observability import (
    LMCStatsMonitor,
    LMCacheStats,
    LookupRequestStats,
    RetrieveRequestStats,
    StoreRequestStats,
    P2PTransferRequestStats,
    PrometheusLogger,
    LMCacheStatsLogger,
)

I/O Contract

Inputs

Name	Type	Required	Description
metadata	LMCacheMetadata	Yes (for PrometheusLogger, LMCacheStatsLogger)	Metadata containing model_name, worker_id, role for Prometheus labels
log_interval	int	Yes (for LMCacheStatsLogger)	Interval in seconds between stats collection and Prometheus push
num_tokens	int	Yes (for on_*_request methods)	Number of tokens involved in the operation being tracked
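num_hit_tokens / num_retrieved_tokens / num_stored_tokens	int	Yes for on_lookup_finished and on_retrieve_finished; optional for on_store_finished (default -1)	Number of tokens that were cache hits (lookup, retrieve) or that were stored (store)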
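The role of log_interval can be sketched with a small daemon-thread loop. This is an illustration under assumptions, not the LMCacheStatsLogger implementation: `PeriodicLogger`, `collect`, and `publish` are hypothetical names, and the interval is accepted as a float here only so the sketch can be exercised quickly.

```python
import threading


class PeriodicLogger:  # hypothetical stand-in for LMCacheStatsLogger
    def __init__(self, collect, publish, log_interval: float):
        self._collect = collect      # e.g. a monitor's get_stats_and_clear
        self._publish = publish      # e.g. a Prometheus logging callback
        self._log_interval = log_interval
        self._stop = threading.Event()
        # A daemon thread does not keep the process alive at exit.
        self._thread = threading.Thread(target=self.log_worker, daemon=True)
        self._thread.start()

    def log_worker(self):
        # Event.wait returns False on timeout and True once shutdown()
        # sets the event, so the loop sleeps and checks for exit in one call.
        while not self._stop.wait(self._log_interval):
            self._publish(self._collect())

    def shutdown(self):
        self._stop.set()
        self._thread.join()
```

Waiting on an event rather than calling time.sleep lets shutdown() interrupt the wait immediately instead of blocking for up to a full interval.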

Outputs

Name	Type	Description
LMCacheStats	dataclass	Snapshot of all interval metrics, hit rates, timing distributions, and real-time measurements
RetrieveRequestStats	dataclass	Per-request timing with context managers for profiling sub-phases (process_tokens, broadcast, to_gpu)
StoreRequestStats	dataclass	Per-request timing with context managers for profiling sub-phases (process_tokens, from_gpu, put)
Prometheus metrics	Gauges/Counters/Histograms	Over 60 Prometheus metrics covering hit rates, latency distributions, throughput, cache usage, P2P transfers, etc.
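The sub-phase profiling mentioned for RetrieveRequestStats and StoreRequestStats can be pictured with the sketch below. The class `PhaseTimedStats`, its `profile` method, and the `phase_seconds` field are hypothetical; the actual dataclasses may expose differently named context managers and fields.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PhaseTimedStats:  # hypothetical; not the real RetrieveRequestStats
    num_tokens: int
    phase_seconds: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def profile(self, phase: str):
        """Add the wall-clock time spent in the with-block to `phase`."""
        start = time.perf_counter()
        try:
            yield
        finally:
            # Recorded even if the profiled block raises.
            elapsed = time.perf_counter() - start
            self.phase_seconds[phase] = (
                self.phase_seconds.get(phase, 0.0) + elapsed
            )
```

A caller would wrap each phase in its own block, for example `with stats.profile("process_tokens"): ...` followed by `with stats.profile("to_gpu"): ...`, and read the accumulated durations from `phase_seconds` afterwards.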

Usage Examples

from lmcache.observability import LMCStatsMonitor

# Get or create the singleton stats monitor
monitor = LMCStatsMonitor.GetOrCreate()

# Record a retrieve request
retrieve_stats = monitor.on_retrieve_request(num_tokens=1024)

# ... perform retrieval ...

# Record completion with actual hit count
monitor.on_retrieve_finished(retrieve_stats, num_retrieved_tokens=768)

# Collect and clear all interval metrics
stats = monitor.get_stats_and_clear()
print(f"Retrieve hit rate: {stats.retrieve_hit_rate:.2%}")
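# time_to_retrieve is empty when no retrieve completed in the interval,
# so guard the division
if stats.time_to_retrieve:
    mean_s = sum(stats.time_to_retrieve) / len(stats.time_to_retrieve)
    print(f"Mean retrieve time: {mean_s:.4f}s")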
