
Heuristic:LMCache Health Monitor Thresholds

From Leeroopedia



Knowledge Sources
Domains: Reliability, Monitoring, Production
Last Updated: 2026-02-09 00:00 GMT

Overview

LMCache's production health-monitoring thresholds include a 95% memory-usage limit, 30-second ping intervals, a 5-minute recovery wait, and two fallback policies (recompute vs. local CPU) for graceful degradation.

Description

LMCache's health monitor framework uses a set of tuned thresholds to detect unhealthy backends and trigger fallback behavior. The system pings remote backends every 30 seconds with a 5-second timeout. If a backend exceeds 95% memory usage, it is marked unhealthy. After 10 consecutive `get_blocking` failures, the system triggers fallback. Recovery is attempted after a 5-minute waiting period. Two fallback policies are available: `RECOMPUTE` (skip all cache operations entirely) and `LOCAL_CPU` (fall back to local CPU backend only). The retrieve operation also monitors performance, logging warnings when a single retrieve exceeds 1 second.
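The state machine described above can be sketched in a few lines. This is a minimal illustration of the threshold logic, not LMCache's actual implementation; the class and method names here are hypothetical.

```python
from dataclasses import dataclass

# Thresholds from the description above (values match LMCache's defaults;
# the surrounding class is an illustrative sketch, not LMCache's API).
PING_TIMEOUT = 5.0        # seconds before a ping counts as a failure
MEMORY_THRESHOLD = 95.0   # percent; above this the backend is unhealthy
FAILURE_THRESHOLD = 10    # consecutive get_blocking failures before fallback
RECOVERY_WAIT = 300.0     # seconds before re-probing an unhealthy backend

@dataclass
class BackendHealth:
    healthy: bool = True
    consecutive_failures: int = 0
    unhealthy_since: float = 0.0

    def record_ping(self, latency: float, memory_percent: float, now: float) -> None:
        """Mark unhealthy on ping timeout or memory pressure."""
        if latency > PING_TIMEOUT or memory_percent > MEMORY_THRESHOLD:
            self._mark_unhealthy(now)

    def record_get(self, ok: bool, now: float) -> None:
        """Track consecutive get_blocking failures; trip fallback at the threshold."""
        if ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= FAILURE_THRESHOLD:
                self._mark_unhealthy(now)

    def may_retry(self, now: float) -> bool:
        """Allow re-probing only after the recovery wait has elapsed."""
        return self.healthy or (now - self.unhealthy_since) >= RECOVERY_WAIT

    def _mark_unhealthy(self, now: float) -> None:
        if self.healthy:
            self.healthy = False
            self.unhealthy_since = now
```

Note that a single successful `get_blocking` resets the failure counter, which is what makes the 10-failure threshold robust against isolated network hiccups.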

Usage

Tune these thresholds when deploying LMCache in production with remote backends (Redis, Infinistore, etc.). Lower `ping_interval` for faster failure detection at the cost of more network traffic. Increase `waiting_time_for_recovery` in environments where backend restarts are slow. The `LOCAL_CPU` fallback policy is preferable when local CPU cache is populated and network issues are transient; `RECOMPUTE` is safer when the entire cache infrastructure is unreliable.
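The tuning advice above can be expressed as a small configuration helper. The key names below are illustrative, not LMCache's actual config schema; only the default values come from the article.

```python
def production_health_config(fast_detection: bool = False,
                             slow_restarts: bool = False,
                             cpu_cache_populated: bool = False) -> dict:
    """Build health-monitor settings for a remote-backend deployment.

    Key names are hypothetical; values mirror the documented defaults.
    """
    cfg = {
        "ping_interval": 30.0,
        "ping_timeout": 5.0,
        "memory_threshold_percent": 95.0,
        "get_blocking_failed_threshold": 10,
        "waiting_time_for_recovery": 300.0,
        "fallback_policy": "recompute",
    }
    if fast_detection:
        # Lower interval detects failures sooner at the cost of more pings.
        cfg["ping_interval"] = 10.0
    if slow_restarts:
        # Give slow backend restarts more headroom before re-probing.
        cfg["waiting_time_for_recovery"] = 600.0
    if cpu_cache_populated:
        # Transient network issues: keep serving from the local CPU cache.
        cfg["fallback_policy"] = "local_cpu"
    return cfg
```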

The Insight (Rule of Thumb)

  • Ping interval: 30 seconds. Balances detection speed vs. network overhead.
  • Ping timeout: 5 seconds. Generous enough for loaded backends but catches true failures.
  • Memory threshold: 95%. Triggers unhealthy status before OOM kills the backend.
  • Failure threshold: 10 consecutive `get_blocking` failures before fallback.
  • Recovery wait: 300 seconds (5 minutes). Allows backend restarts without premature re-probing.
  • Retrieve warning: 1 second. Logs slow retrieves that may indicate backend degradation.
  • Fallback policy: `RECOMPUTE` (default). Safest option, falls back to full recomputation.

Reasoning

The 30-second ping interval represents roughly 1% overhead on a typical request-response cycle while ensuring failures are detected within a minute. The 95% memory threshold provides a 5% buffer before Linux OOM killer intervenes. The 10-failure threshold prevents false positives from transient network hiccups while still catching sustained failures quickly (10 failures at 30s interval = ~5 min detection). The 5-minute recovery wait aligns with typical container restart times in Kubernetes deployments.
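The arithmetic behind that reasoning checks out directly, assuming (as this sketch does) roughly one failed attempt per 30-second cycle:

```python
# Worst-case detection window for sustained failures, assuming one
# get_blocking failure per 30 s ping cycle (a simplifying assumption;
# actual failure cadence depends on request traffic).
ping_interval = 30.0        # seconds
failure_threshold = 10      # consecutive failures
detection_window = ping_interval * failure_threshold  # 300 s, ~5 minutes

recovery_wait = 300.0       # seconds
memory_buffer = 100.0 - 95.0  # 5% headroom before the OOM killer fires
```

The detection window and the recovery wait both land at about five minutes, so a backend that fails and is restarted by its orchestrator gets one full restart cycle before the monitor re-probes it.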

Code Evidence

Health monitor constants from `lmcache/v1/health_monitor/constants.py:28-36`:

DEFAULT_PING_TIMEOUT = 5.0
DEFAULT_PING_INTERVAL = 30.0
DEFAULT_FALLBACK_POLICY = FallbackPolicy.RECOMPUTE
DEFAULT_GET_BLOCKING_FAILED_THRESHOLD = 10
DEFAULT_WAITING_TIME_FOR_RECOVERY = 300.0

# Memory thresholds
DEFAULT_MEMORY_THRESHOLD_PERCENT = 95.0  # Unhealthy if memory usage > 95%
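A check against this constant might look like the following sketch; the function name and the source of `used_bytes`/`total_bytes` (the backend's memory stats) are assumptions, not LMCache's API.

```python
DEFAULT_MEMORY_THRESHOLD_PERCENT = 95.0

def is_memory_unhealthy(used_bytes: int, total_bytes: int,
                        threshold: float = DEFAULT_MEMORY_THRESHOLD_PERCENT) -> bool:
    """Return True when memory usage strictly exceeds the unhealthy threshold."""
    return 100.0 * used_bytes / total_bytes > threshold
```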

Fallback policy enum from `lmcache/v1/health_monitor/constants.py:10-14`:

class FallbackPolicy(str, Enum):
    RECOMPUTE = "recompute"  # Skip all cache operations, fall back to recomputation
    LOCAL_CPU = "local_cpu"  # Fall back to local CPU backend only
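What each policy implies at lookup time can be sketched as a dispatch function. The enum matches the quoted source; the dispatch function and its parameter names are illustrative, not LMCache's API.

```python
from enum import Enum

class FallbackPolicy(str, Enum):
    RECOMPUTE = "recompute"  # Skip all cache operations, fall back to recomputation
    LOCAL_CPU = "local_cpu"  # Fall back to local CPU backend only

def lookup_with_fallback(key, remote_healthy, policy,
                         remote_get, local_cpu_get, recompute):
    """Illustrative dispatch: route a lookup according to the fallback policy."""
    if remote_healthy:
        return remote_get(key)
    if policy is FallbackPolicy.LOCAL_CPU:
        # Remote is down: consult only the local CPU backend.
        return local_cpu_get(key)
    # RECOMPUTE: bypass the cache entirely and recompute from scratch.
    return recompute(key)
```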

Performance monitoring threshold from `lmcache/observability.py:304-305`:

self.retrieve_time_threshold: float = 1e9   # 1 billion ns (1 second)
self.retrieve_token_speed_threshold: float = -1.0  # Disabled by default
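A wrapper applying the 1-second threshold could look like this; the wrapper itself is illustrative (LMCache does this inside its observability layer), and only the nanosecond threshold value comes from the quoted source.

```python
import logging
import time

logger = logging.getLogger("lmcache.retrieve")

RETRIEVE_TIME_THRESHOLD_NS = 1e9  # 1 billion ns = 1 second

def timed_retrieve(fn, *args):
    """Run a retrieve and log a warning when it exceeds the 1 s threshold."""
    start = time.monotonic_ns()
    result = fn(*args)
    elapsed = time.monotonic_ns() - start
    if elapsed > RETRIEVE_TIME_THRESHOLD_NS:
        logger.warning("slow retrieve: %.3f s", elapsed / 1e9)
    return result
```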

ZMQ socket timeouts from `lmcache/v1/rpc_utils.py:18-20`:

DEFAULT_SOCKET_RECV_TIMEOUT_MS = 30000  # 30 seconds
DEFAULT_SOCKET_SEND_TIMEOUT_MS = 10000  # 10 seconds
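With pyzmq, these timeouts would be applied via the standard `RCVTIMEO`/`SNDTIMEO` socket options; this is a generic pyzmq sketch using the quoted values, not LMCache's `rpc_utils` code.

```python
import zmq

DEFAULT_SOCKET_RECV_TIMEOUT_MS = 30000  # 30 seconds
DEFAULT_SOCKET_SEND_TIMEOUT_MS = 10000  # 10 seconds

ctx = zmq.Context.instance()
sock = ctx.socket(zmq.REQ)
# With these set, a hung peer raises zmq.Again instead of blocking forever.
sock.setsockopt(zmq.RCVTIMEO, DEFAULT_SOCKET_RECV_TIMEOUT_MS)
sock.setsockopt(zmq.SNDTIMEO, DEFAULT_SOCKET_SEND_TIMEOUT_MS)
```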
