Implementation:LMCache LMCache Observability
| Knowledge Sources | |
|---|---|
| Domains | Observability, Prometheus Metrics |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
This module implements the complete observability stack for LMCache, including in-memory statistics tracking, Prometheus metric export, and a background stats logging thread.
Description
The observability.py module is the core observability layer for LMCache. It has four parts:
- Statistics containers: the dataclasses LMCacheStats, LookupRequestStats, RetrieveRequestStats, StoreRequestStats, and P2PTransferRequestStats capture per-request timing and throughput metrics.
- LMCStatsMonitor: a singleton that collects metrics in a thread-safe manner via the @thread_safe decorator. Its get_stats_and_clear() method returns the accumulated interval statistics and resets them.
- PrometheusLogger: a singleton that registers and updates a comprehensive set of Prometheus counters, gauges, and histograms.
- LMCacheStatsLogger: ties the two together in a daemon thread that periodically reads stats from the monitor and pushes them to Prometheus and the usage context tracker.
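The locking and collect-then-reset behaviour described above can be illustrated with a minimal, standard-library sketch. It is not the LMCache implementation: SketchMonitor, IntervalSnapshot, and record() are hypothetical names, and this thread_safe decorator is only an assumption about how such a decorator could work.

```python
import functools
import threading
from dataclasses import dataclass


def thread_safe(method):
    """Hypothetical decorator: run the method while holding the instance lock."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        with self._lock:
            return method(self, *args, **kwargs)
    return wrapper


@dataclass
class IntervalSnapshot:
    # Hypothetical, much smaller stand-in for LMCacheStats.
    requested_tokens: int
    hit_tokens: int
    hit_rate: float


class SketchMonitor:
    def __init__(self):
        self._lock = threading.Lock()
        self._requested = 0
        self._hit = 0

    @thread_safe
    def record(self, num_tokens, num_hit_tokens):
        # Accumulate counters for the current interval.
        self._requested += num_tokens
        self._hit += num_hit_tokens

    @thread_safe
    def get_stats_and_clear(self):
        # Build the snapshot and reset the counters under the same lock,
        # so no update can land between the read and the reset.
        rate = self._hit / self._requested if self._requested else 0.0
        snapshot = IntervalSnapshot(self._requested, self._hit, rate)
        self._requested = 0
        self._hit = 0
        return snapshot
```

Because the snapshot and the reset happen under one lock acquisition, every recorded token is counted in exactly one interval.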
Usage
The observability module is automatically initialized when an LMCache engine starts. Use LMCStatsMonitor.GetOrCreate() to access the singleton monitor from any component that needs to record metrics. Use PrometheusLogger.GetOrCreate(metadata) to initialize Prometheus metric export.
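For readers unfamiliar with the GetOrCreate() convention, the following is a generic sketch of a lazily created, lock-guarded singleton. SketchSingleton and its events field are hypothetical; how LMCache actually stores and guards its instances is not shown on this page.

```python
import threading
from typing import Optional


class SketchSingleton:
    # Class-level slot for the single shared instance (hypothetical layout).
    _instance: Optional["SketchSingleton"] = None
    _instance_lock = threading.Lock()

    def __init__(self):
        self.events = 0

    @staticmethod
    def GetOrCreate() -> "SketchSingleton":
        # The lock keeps two threads from both creating an instance.
        with SketchSingleton._instance_lock:
            if SketchSingleton._instance is None:
                SketchSingleton._instance = SketchSingleton()
            return SketchSingleton._instance
```

Every caller receives the same object, which is what lets unrelated components record into one shared set of counters.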
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/observability.py
- Lines: 1-1839
Signature
@dataclass
class LMCacheStats:
    interval_retrieve_requests: int
    interval_store_requests: int
    interval_lookup_requests: int
    interval_requested_tokens: int
    interval_hit_tokens: int
    interval_stored_tokens: int
    # ... (many more fields)
    retrieve_hit_rate: float
    lookup_hit_rate: float
    time_to_retrieve: List[float]
    time_to_store: List[float]
    # ...

@dataclass
class LookupRequestStats:
    request_id: int
    num_tokens: int
    hit_tokens: int
    is_finished: bool

@dataclass
class RetrieveRequestStats:
    request_id: int
    num_tokens: int
    local_hit_tokens: int
    remote_hit_tokens: int
    start_time: float
    end_time: float

@dataclass
class StoreRequestStats:
    request_id: int
    num_tokens: int
    start_time: float
    end_time: float

@dataclass
class P2PTransferRequestStats:
    num_tokens: int
    start_time: float
    end_time: float

class LMCStatsMonitor:
    def __init__(self): ...
    def on_lookup_request(self, num_tokens: int) -> LookupRequestStats: ...
    def on_lookup_finished(self, stats: LookupRequestStats, num_hit_tokens: int): ...
    def on_retrieve_request(self, num_tokens: int) -> RetrieveRequestStats: ...
    def on_retrieve_finished(self, retrieve_stats: RetrieveRequestStats, num_retrieved_tokens: int): ...
    def on_store_request(self, num_tokens: int) -> StoreRequestStats: ...
    def on_store_finished(self, store_stats: StoreRequestStats, num_stored_tokens: int = -1): ...
    def get_stats_and_clear(self) -> LMCacheStats: ...

    @staticmethod
    def GetOrCreate() -> "LMCStatsMonitor": ...

class PrometheusLogger:
    def __init__(self, metadata: LMCacheMetadata): ...
    def log_prometheus(self, stats: LMCacheStats): ...

    @staticmethod
    def GetOrCreate(metadata: LMCacheMetadata) -> "PrometheusLogger": ...

class LMCacheStatsLogger:
    def __init__(self, metadata: LMCacheMetadata, log_interval: int): ...
    def log_worker(self): ...
    def shutdown(self): ...
Import
from lmcache.observability import (
    LMCStatsMonitor,
    LMCacheStats,
    LookupRequestStats,
    RetrieveRequestStats,
    StoreRequestStats,
    PrometheusLogger,
    LMCacheStatsLogger,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| metadata | LMCacheMetadata | Yes (for PrometheusLogger, LMCacheStatsLogger) | Metadata containing model_name, worker_id, role for Prometheus labels |
| log_interval | int | Yes (for LMCacheStatsLogger) | Interval in seconds between stats collection and Prometheus push |
| num_tokens | int | Yes (for on_*_request methods) | Number of tokens involved in the operation being tracked |
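| num_hit_tokens / num_retrieved_tokens | int | Yes (for on_lookup_finished and on_retrieve_finished) | Number of tokens that were cache hits |
| num_stored_tokens | int | No (for on_store_finished; defaults to -1) | Number of tokens stored |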
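The role of log_interval can be sketched as a daemon thread that wakes up on a fixed period, collects a snapshot, and hands it to a publisher. This is an illustration under stated assumptions, not the LMCacheStatsLogger source: SketchStatsLogger and its collect and publish callables are hypothetical stand-ins for the monitor read and the Prometheus push, and using threading.Event for shutdown is a design choice of this sketch.

```python
import threading


class SketchStatsLogger:
    def __init__(self, collect, publish, log_interval: float):
        self._collect = collect        # e.g. a monitor's get_stats_and_clear
        self._publish = publish        # e.g. a metrics exporter's log method
        self._interval = log_interval  # seconds between collections
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self.log_worker, daemon=True)
        self._thread.start()

    def log_worker(self):
        # Event.wait() returns False on timeout and True once shutdown()
        # sets the event, so the loop exits promptly instead of sleeping
        # out a full interval.
        while not self._stop.wait(self._interval):
            self._publish(self._collect())

    def shutdown(self):
        self._stop.set()
        self._thread.join()
```

Marking the thread as a daemon means it cannot keep the process alive on exit, while the explicit shutdown() still allows a clean stop.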
Outputs
| Name | Type | Description |
|---|---|---|
| LMCacheStats | dataclass | Snapshot of all interval metrics, hit rates, timing distributions, and real-time measurements |
| RetrieveRequestStats | dataclass | Per-request timing with context managers for profiling sub-phases (process_tokens, broadcast, to_gpu) |
| StoreRequestStats | dataclass | Per-request timing with context managers for profiling sub-phases (process_tokens, from_gpu, put) |
| Prometheus metrics | Gauges/Counters/Histograms | Over 60 Prometheus metrics covering hit rates, latency distributions, throughput, cache usage, P2P transfers, etc. |
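The "context managers for profiling sub-phases" mentioned in the table can be pictured with the sketch below. It is hypothetical: SketchRequestStats, profile(), and phase_seconds are illustrative names, and only the phase labels (such as process_tokens and to_gpu) come from the table above.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SketchRequestStats:
    num_tokens: int
    start_time: float = field(default_factory=time.perf_counter)
    end_time: float = 0.0
    # Accumulated wall-clock seconds per named sub-phase.
    phase_seconds: Dict[str, float] = field(default_factory=dict)

    @contextmanager
    def profile(self, phase: str):
        begin = time.perf_counter()
        try:
            yield
        finally:
            # Record the elapsed time even if the profiled block raises.
            elapsed = time.perf_counter() - begin
            self.phase_seconds[phase] = self.phase_seconds.get(phase, 0.0) + elapsed


# Usage: time one sub-phase of a request.
stats = SketchRequestStats(num_tokens=1024)
with stats.profile("to_gpu"):
    time.sleep(0.01)  # placeholder for the real work
stats.end_time = time.perf_counter()
```

Accumulating per phase, rather than overwriting, lets a phase that runs several times within one request report its total.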
Usage Examples
from lmcache.observability import LMCStatsMonitor

# Get or create the singleton stats monitor
monitor = LMCStatsMonitor.GetOrCreate()

# Record a retrieve request
retrieve_stats = monitor.on_retrieve_request(num_tokens=1024)
# ... perform retrieval ...
# Record completion with actual hit count
monitor.on_retrieve_finished(retrieve_stats, num_retrieved_tokens=768)

# Collect and clear all interval metrics
stats = monitor.get_stats_and_clear()
print(f"Retrieve hit rate: {stats.retrieve_hit_rate:.2%}")

# Guard against an empty list (no retrieve finished in this interval)
if stats.time_to_retrieve:
    mean_time = sum(stats.time_to_retrieve) / len(stats.time_to_retrieve)
    print(f"Mean retrieve time: {mean_time:.4f}s")