Implementation:Predibase Lorax Health Check Generation
| Knowledge Sources | |
|---|---|
| Domains | Observability, Model_Serving |
| Last Updated | 2026-02-08 02:00 GMT |
Overview
A concrete tool for verifying inference-server liveness, provided by the Health struct in the LoRAX Rust router.
Description
The Health struct in router/src/health.rs implements end-to-end health verification. Its check_generation() method constructs a minimal inference batch (a single "liveness" token request), sends it through the gRPC client to all shards for prefill and decode, and returns true only if the full pipeline succeeds. It also checks an atomic boolean flag that tracks inference loop health.
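The control flow described above can be sketched without any gRPC dependency. This is a simplified, synchronous illustration, not the LoRAX source: the ShardBackend trait, HealthSketch, and FakeBackend are hypothetical names introduced here, and the way the atomic flag is combined with the probe result (a logical AND) is an assumption, since the real method is async and its exact flag handling is not shown on this page.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the gRPC `ShardedClient`; not the LoRAX API.
pub trait ShardBackend {
    /// Prefill a one-request batch on every shard.
    fn prefill(&mut self) -> Result<(), String>;
    /// Run a single decode step for that batch on every shard.
    fn decode(&mut self) -> Result<(), String>;
}

/// Simplified, synchronous analogue of `Health` (the real method is async).
pub struct HealthSketch<B: ShardBackend> {
    backend: B,
    inference_health: Arc<AtomicBool>,
}

impl<B: ShardBackend> HealthSketch<B> {
    pub fn new(backend: B, inference_health: Arc<AtomicBool>) -> Self {
        Self { backend, inference_health }
    }

    pub fn check_generation(&mut self) -> bool {
        // Probe the full pipeline: prefill first, decode only if prefill succeeded.
        let pipeline_ok = self.backend.prefill().is_ok() && self.backend.decode().is_ok();
        // ASSUMPTION: the probe result is AND-ed with the inference-loop flag.
        let loop_ok = self.inference_health.load(Ordering::SeqCst);
        pipeline_ok && loop_ok
    }
}

/// Test double whose prefill/decode outcomes are fixed up front.
pub struct FakeBackend {
    pub prefill_ok: bool,
    pub decode_ok: bool,
}

impl ShardBackend for FakeBackend {
    fn prefill(&mut self) -> Result<(), String> {
        if self.prefill_ok { Ok(()) } else { Err("prefill failed".to_string()) }
    }

    fn decode(&mut self) -> Result<(), String> {
        if self.decode_ok { Ok(()) } else { Err("decode failed".to_string()) }
    }
}
```

Constructing HealthSketch with a FakeBackend whose decode step fails, or clearing the shared flag, makes check_generation() return false. This mirrors the "true only if the full pipeline succeeds" contract under the stated assumption.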
Usage
Used internally by the router's /health HTTP endpoint. Kubernetes liveness/readiness probes poll that endpoint periodically, and it is also queried during server startup to confirm the model is loaded and functional.
Code Reference
Source Location
- Repository: LoRAX
- File: router/src/health.rs
- Lines: 14-179
Signature
pub(crate) struct Health {
    client: ShardedClient,
    inference_health: Arc<AtomicBool>,
    shard_info: ShardInfo,
}

impl Health {
    pub(crate) fn new(
        client: ShardedClient,
        inference_health: Arc<AtomicBool>,
        shard_info: ShardInfo,
    ) -> Self;

    pub(crate) async fn check_generation(&mut self) -> bool;
}
Import
// Internal module, not a public API
use crate::health::Health;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| client | ShardedClient | Yes | gRPC client connected to inference shards |
| inference_health | `Arc<AtomicBool>` | Yes | Shared health flag from inference loop |
| shard_info | ShardInfo | Yes | Shard metadata (dtype, model info) |
Outputs
| Name | Type | Description |
|---|---|---|
| healthy | bool | true if generation succeeds on all shards; false otherwise |
Usage Examples
Router Health Endpoint
// In router/src/server.rs - health endpoint handler
async fn health(mut health: Extension<Health>) -> impl IntoResponse {
    match health.check_generation().await {
        true => StatusCode::OK,
        false => StatusCode::SERVICE_UNAVAILABLE,
    }
}
Kubernetes Probe Configuration
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 120
  periodSeconds: 10
  failureThreshold: 24
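To see how much time this probe configuration gives a slow-loading model, the arithmetic can be written out. The sketch below is a rough estimate only: ProbeTiming and its methods are hypothetical helpers, and the calculation ignores probe timeouts and scheduling jitter. It relies on the standard Kubernetes behaviour that a container is restarted after failureThreshold consecutive liveness failures.

```rust
/// Mirrors the three timing fields of the livenessProbe shown above.
pub struct ProbeTiming {
    pub initial_delay_seconds: u64,
    pub period_seconds: u64,
    pub failure_threshold: u64,
}

impl ProbeTiming {
    /// Approximate span of consecutive failures tolerated before a restart.
    pub fn failure_window_seconds(&self) -> u64 {
        self.period_seconds * self.failure_threshold
    }

    /// Rough upper bound from container start to restart if /health never succeeds.
    pub fn startup_budget_seconds(&self) -> u64 {
        self.initial_delay_seconds + self.failure_window_seconds()
    }
}

/// The values from the probe configuration in this section.
pub fn documented_probe() -> ProbeTiming {
    ProbeTiming {
        initial_delay_seconds: 120,
        period_seconds: 10,
        failure_threshold: 24,
    }
}
```

With the values above, the failure window is 10 × 24 = 240 seconds, so a server that never becomes healthy has roughly 120 + 240 = 360 seconds before it is restarted.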