Heuristic:Protectai Llm guard Lazy Scanner Loading
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Infrastructure |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A startup optimization that defers scanner model loading until the first API request, reducing boot time at the cost of higher latency on that first request.
Description
The LLM Guard API server supports a lazy_load configuration option. When enabled, scanner models are not loaded into memory during application startup. Instead, models are loaded on-demand when the first request arrives that requires them. After initial loading, scanners are cached in memory for subsequent requests via a closure-based caching pattern.
Usage
Use this heuristic when fast startup is more important than first-request latency. This is useful in auto-scaling environments (e.g., Kubernetes, AWS ECS) where new instances need to pass health checks quickly, or during development when rapid iteration on configuration is needed.
The Insight (Rule of Thumb)
- Action: Set `lazy_load: true` in the `app` section of `scanners.yml`.
- Value: Boolean flag. Default is `false` (eager loading).
- Trade-off: Startup is fast, but the first API request incurs model loading overhead (potentially several seconds per scanner model). Health check endpoints (`/healthz`, `/readyz`) will pass immediately, but the first scan request will be slow.
Reasoning
Loading multiple transformer models at startup can take significant time and memory. The API server may have 10+ scanner models, each requiring Hugging Face model downloads and initialization. In orchestrated environments, load balancers may time out waiting for readiness. Lazy loading allows the server to become "ready" immediately and shifts the model loading cost onto the first requests that need the scanners.
```python
# From llm_guard_api/app/app.py:126-141
def _get_input_scanners_function(config: Config, vault: Vault) -> Callable:
    scanners = []
    if not config.app.lazy_load:
        LOGGER.debug("Loading input scanners")
        scanners = get_input_scanners(config.input_scanners, vault)

    def get_cached_scanners() -> List[InputScanner]:
        nonlocal scanners
        if not scanners and config.app.lazy_load:
            LOGGER.debug("Lazy loading input scanners")
            scanners = get_input_scanners(config.input_scanners, vault)
        return scanners

    return get_cached_scanners
```