Heuristic:Protectai Llm guard Lazy Scanner Loading

Knowledge Sources	LLM Guard API
Domains	Optimization, Infrastructure
Last Updated	2026-02-14 12:00 GMT

Overview

Startup optimization technique that defers scanner model loading until the first API request, reducing initial boot time at the cost of higher first-request latency.

Description

The LLM Guard API server supports a lazy_load configuration option. When enabled, scanner models are not loaded into memory during application startup. Instead, the configured scanners are loaded on demand when the first request that needs them arrives. Once loaded, the scanners are held in memory by a closure and reused for all subsequent requests.

Usage

Use this heuristic when fast startup is more important than first-request latency. This is useful in auto-scaling environments (e.g., Kubernetes, AWS ECS) where new instances need to pass health checks quickly, or during development when rapid iteration on configuration is needed.

The Insight (Rule of Thumb)

Action: Set lazy_load: true in the app section of scanners.yml.
Value: Boolean flag. Default is false (eager loading).
Trade-off: Startup is fast, but the first API request incurs model loading overhead (potentially several seconds per scanner model). Health check endpoints (/healthz, /readyz) will pass immediately, but the first scan request will be slow.
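The trade-off above can be made concrete with a small timing sketch. This is not LLM Guard code: make_scanner_getter, measure, and the scanner names are hypothetical, and time.sleep stands in for real model loading. It only illustrates where the loading delay lands in each mode.

```python
import time
from typing import Callable, Dict, List


def make_scanner_getter(lazy_load: bool, load_seconds: float = 0.2) -> Callable[[], List[str]]:
    """Return a getter that mimics eager or lazy scanner loading (hypothetical helper)."""

    def load_scanners() -> List[str]:
        # Stand-in for real model loading; the sleep simulates the delay.
        time.sleep(load_seconds)
        return ["scanner_a", "scanner_b"]

    # Eager mode pays the loading cost here, at "startup".
    scanners: List[str] = [] if lazy_load else load_scanners()

    def get_scanners() -> List[str]:
        nonlocal scanners
        if not scanners:
            # Lazy mode pays the loading cost here, on the first call.
            scanners = load_scanners()
        return scanners

    return get_scanners


def timed(fn: Callable[[], object]) -> float:
    """Return the wall-clock seconds taken by one call to fn."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure(lazy_load: bool, load_seconds: float = 0.2) -> Dict[str, float]:
    """Time simulated startup, the first request, and the second request."""
    start = time.perf_counter()
    getter = make_scanner_getter(lazy_load, load_seconds)
    startup = time.perf_counter() - start
    return {
        "startup": startup,
        "first_request": timed(getter),
        "second_request": timed(getter),
    }


if __name__ == "__main__":
    for lazy in (False, True):
        print(f"lazy_load={lazy}: {measure(lazy)}")
```

In eager mode the delay shows up under startup; in lazy mode it moves to first_request, and second_request is fast in both cases.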

Reasoning

Loading multiple transformer models at startup can take significant time and memory. The API server may have 10+ scanner models, each requiring a Hugging Face model download and initialization. In orchestrated environments, load balancers may time out waiting for readiness. Lazy loading allows the server to become "ready" immediately and shifts the model loading cost to the first request that needs the scanners. As the excerpt below shows, the configured input scanners are loaded together on the first call and then reused.

# From llm_guard_api/app/app.py:126-141
def _get_input_scanners_function(config: Config, vault: Vault) -> Callable:
    scanners = []
    if not config.app.lazy_load:
        LOGGER.debug("Loading input scanners")
        scanners = get_input_scanners(config.input_scanners, vault)

    def get_cached_scanners() -> List[InputScanner]:
        nonlocal scanners
        if not scanners and config.app.lazy_load:
            LOGGER.debug("Lazy loading input scanners")
            scanners = get_input_scanners(config.input_scanners, vault)
        return scanners

    return get_cached_scanners
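One detail of the excerpt above is that the cache check relies on list truthiness (not scanners), so an empty scanner list looks the same as "not loaded yet". The sketch below is a simplified, self-contained illustration of that behaviour, not the project's code: CountingLoader, make_getter_truthiness, and make_getter_sentinel are hypothetical names, and the None-sentinel variant is only one possible alternative.

```python
from typing import Callable, List, Optional


class CountingLoader:
    """Hypothetical loader that records how many times it was invoked."""

    def __init__(self, result: List[str]) -> None:
        self.result = result
        self.calls = 0

    def __call__(self) -> List[str]:
        self.calls += 1
        return list(self.result)


def make_getter_truthiness(loader: Callable[[], List[str]]) -> Callable[[], List[str]]:
    """Cache check in the style of the excerpt: an empty list means 'not loaded'."""
    scanners: List[str] = []

    def get_cached() -> List[str]:
        nonlocal scanners
        if not scanners:
            scanners = loader()
        return scanners

    return get_cached


def make_getter_sentinel(loader: Callable[[], List[str]]) -> Callable[[], List[str]]:
    """Variant that uses None as the 'not loaded' marker (an assumption, not the source)."""
    scanners: Optional[List[str]] = None

    def get_cached() -> List[str]:
        nonlocal scanners
        if scanners is None:
            scanners = loader()
        return scanners

    return get_cached


if __name__ == "__main__":
    populated = CountingLoader(["scanner_a"])
    getter = make_getter_truthiness(populated)
    getter()
    getter()
    print(populated.calls)  # 1: loaded once, then served from the closure

    empty = CountingLoader([])
    getter = make_getter_truthiness(empty)
    getter()
    getter()
    print(empty.calls)  # 2: an empty result is never treated as cached

    empty_sentinel = CountingLoader([])
    getter = make_getter_sentinel(empty_sentinel)
    getter()
    getter()
    print(empty_sentinel.calls)  # 1: None distinguishes 'not loaded' from 'empty'
```

With at least one scanner configured, both variants load exactly once. The difference only matters when the configured list is empty, where the repeated call is likely cheap anyway.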
