Implementation:LMCache LMCache Controller Observability
| Knowledge Sources | |
|---|---|
| Domains | Observability, Cache Controller |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Provides Prometheus-based metrics collection and socket message counting for monitoring the LMCache cache controller.
Description
This module defines two classes for cache controller observability. PrometheusLogger is a singleton that initializes and manages Prometheus Gauge metrics for the cache controller: KV pool key counts, registered worker counts, socket message counts and pending status for both the PULL and REPLY sockets, active request counts, sequence number discontinuity counts, and full sync progress. Metrics are created with configurable labels and support the livemostrecent multiprocess mode of prometheus_client. SocketMetricsContext is a context manager that tracks message counts and active requests for a given socket type. It increments the counters on entry, decrements the active request count on exit, and logs an error if an exception was raised inside the block.
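The entry/exit behaviour described for SocketMetricsContext can be illustrated with a minimal standard-library sketch. Everything here is hypothetical and is not the LMCache implementation: DemoManager, DemoSocketMetricsContext, and the attribute names message_count and active_requests are stand-ins, and returning False from __exit__ (so exceptions propagate after being logged) is an assumption, since the source only states that errors are logged.

```python
import logging
from enum import Enum

logger = logging.getLogger(__name__)


class SocketType(Enum):
    PULL = "pull"
    REPLY = "reply"


class DemoManager:
    """Hypothetical stand-in for the object that owns the socket counters."""

    def __init__(self) -> None:
        self.message_count = {t: 0 for t in SocketType}
        self.active_requests = {t: 0 for t in SocketType}


class DemoSocketMetricsContext:
    """Illustrative context manager; attribute names are assumptions."""

    def __init__(self, manager, socket_type: SocketType, message_count: int = 1) -> None:
        self.manager = manager
        self.socket_type = socket_type
        self.count = message_count

    def __enter__(self) -> "DemoSocketMetricsContext":
        # On entry: count the messages and mark one request as in flight.
        self.manager.message_count[self.socket_type] += self.count
        self.manager.active_requests[self.socket_type] += 1
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        # On exit: the request is no longer active, whether or not it failed.
        self.manager.active_requests[self.socket_type] -= 1
        if exc_type is not None:
            logger.error(
                "Error while handling %s message: %s",
                self.socket_type.value,
                exc_val,
            )
        # Assumption: do not suppress the exception.
        return False
```

Because the decrement sits in __exit__, the active request count returns to its previous value even when the handler raises, which is the main reason to use a context manager here rather than paired manual calls.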
Usage
Call PrometheusLogger.GetOrCreate at controller startup to initialize the metrics with the appropriate labels. Retrieve the singleton later via GetInstance or GetInstanceOrNone. Wrap socket message processing in SocketMetricsContext to track throughput and active request counts per socket type.
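The accessor pattern above (create once at startup, look up later) can be sketched with the standard library alone. DemoLogger is a hypothetical class that mirrors only the method names from the signature. Two behaviours are assumptions not confirmed by the source: GetInstance raises when no instance exists, and later GetOrCreate calls ignore their labels argument.

```python
from typing import Optional


class DemoLogger:
    """Hypothetical singleton mirroring the accessor names in the signature."""

    _instance: Optional["DemoLogger"] = None

    def __init__(self, labels: dict) -> None:
        self.labels = dict(labels)

    @staticmethod
    def GetOrCreate(labels: dict) -> "DemoLogger":
        # Create on first call; assumption: later calls return the
        # existing instance and ignore the labels they are given.
        if DemoLogger._instance is None:
            DemoLogger._instance = DemoLogger(labels)
        return DemoLogger._instance

    @staticmethod
    def GetInstance() -> "DemoLogger":
        # Assumption: fail loudly if startup initialization was skipped.
        if DemoLogger._instance is None:
            raise RuntimeError("DemoLogger has not been created yet")
        return DemoLogger._instance

    @staticmethod
    def GetInstanceOrNone() -> Optional["DemoLogger"]:
        # Non-raising lookup for code paths where metrics are optional.
        return DemoLogger._instance

    @staticmethod
    def DestroyInstance() -> None:
        DemoLogger._instance = None


first = DemoLogger.GetOrCreate({"controller_id": "ctrl-01"})
second = DemoLogger.GetOrCreate({"controller_id": "ignored"})
same_object = first is second
```

GetInstanceOrNone suits call sites that should keep working when metrics were never initialized, while GetInstance suits call sites where a missing logger indicates a startup bug.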
Code Reference
Source Location
- Repository: LMCache
- File: lmcache/v1/cache_controller/observability.py
- Lines: 1-208
Signature
class SocketType(Enum):
    PULL = "pull"
    REPLY = "reply"

class PrometheusLogger:
    def __init__(self, labels: dict) -> None: ...
    @staticmethod
    def GetOrCreate(labels: dict) -> "PrometheusLogger": ...
    @staticmethod
    def GetInstance() -> "PrometheusLogger": ...
    @staticmethod
    def GetInstanceOrNone() -> Optional["PrometheusLogger"]: ...
    @staticmethod
    def DestroyInstance() -> None: ...
    @staticmethod
    def unregister_all_metrics() -> None: ...

class SocketMetricsContext:
    def __init__(self, manager, socket_type: SocketType, message_count: int = 1) -> None: ...
    def __enter__(self) -> "SocketMetricsContext": ...
    def __exit__(self, exc_type, exc_val, exc_tb) -> bool: ...
Import
from lmcache.v1.cache_controller.observability import (
    PrometheusLogger,
    SocketType,
    SocketMetricsContext,
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| labels | dict | Yes | Dictionary of label key-value pairs for Prometheus metric dimensions |
| manager | object | Yes | Object whose attributes will be updated for socket counting (SocketMetricsContext) |
| socket_type | SocketType | Yes | Which socket's metrics to update (PULL or REPLY) |
| message_count | int | No | Number of messages to count per context entry (default: 1) |
Outputs
| Name | Type | Description |
|---|---|---|
| PrometheusLogger | PrometheusLogger | Singleton instance providing Prometheus gauge metrics for the controller |
| Prometheus Gauges | prometheus_client.Gauge | Individual metrics: kv_pool_keys_count, registered_workers_count, socket message/pending/active metrics, full sync metrics |
Usage Examples
from lmcache.v1.cache_controller.observability import (
    PrometheusLogger,
    SocketMetricsContext,
    SocketType,
)

# Note: kv_pool, worker_count, manager_instance, handle_pull_message and msg
# are placeholders for objects owned by the surrounding controller code.

# Initialize at controller startup
prom = PrometheusLogger.GetOrCreate(labels={"controller_id": "ctrl-01"})

# Set dynamic metric functions (evaluated each time the gauge is read)
prom.kv_pool_keys_count.set_function(lambda: len(kv_pool))
prom.registered_workers_count.set_function(lambda: worker_count)

# Track socket processing metrics
with SocketMetricsContext(manager_instance, SocketType.PULL):
    # Process a PULL socket message
    handle_pull_message(msg)