Heuristic:Protectai Llm guard Fail Fast Early Exit
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Security |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Latency optimization using the fail_fast flag to abort the scanner pipeline as soon as the first scanner marks the input as invalid.
Description
The fail_fast mode is a configuration option for both the core library (scan_prompt/scan_output functions) and the API server. When enabled, the scanning pipeline stops execution immediately after the first scanner returns an invalid result (is_valid=False). This prevents unnecessary computation from running subsequent scanners when the input has already been determined to be unsafe.
Usage
Use this heuristic when response latency is critical and early rejection of invalid inputs is acceptable. It is especially valuable when the scanner pipeline contains expensive ML-based scanners later in the chain. Enable it with the fail_fast=True parameter in the library, or with scan_fail_fast: true in the API config.
The Insight (Rule of Thumb)
- Action: Enable fail_fast=True in scan_prompt()/scan_output() calls, or set scan_fail_fast: true in the API server YAML config.
- Value: Boolean flag; no tuning required.
- Trade-off: Scanning stops at the first failing scanner. Subsequent scanners are skipped, so you lose visibility into other potential issues, and the results dictionary contains no risk scores for the scanners that never ran.
- Combination: Order your scanners with cheap/fast scanners first (e.g., TokenLimit, BanSubstrings, Regex) before expensive ones (PromptInjection, Toxicity) to maximize the benefit.
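The ordering advice above can be illustrated with a minimal, self-contained sketch. It does not use LLM Guard itself: run_pipeline, token_limit, and expensive_model are hypothetical stand-ins that only mimic the sequential loop with a fail_fast break, under the assumption that each scanner returns an (is_valid, risk_score) pair.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical scanner shape for illustration: (name, function -> (is_valid, risk_score)).
Scanner = Tuple[str, Callable[[str], Tuple[bool, float]]]


def run_pipeline(
    scanners: List[Scanner], prompt: str, fail_fast: bool = False
) -> Tuple[Dict[str, bool], Dict[str, float]]:
    """Run scanners in order; stop at the first invalid result if fail_fast is set."""
    results_valid: Dict[str, bool] = {}
    results_score: Dict[str, float] = {}
    for name, scan in scanners:
        is_valid, risk_score = scan(prompt)
        results_valid[name] = is_valid
        results_score[name] = risk_score
        if fail_fast and not is_valid:
            break
    return results_valid, results_score


calls: List[str] = []  # records which scanners actually ran


def token_limit(prompt: str) -> Tuple[bool, float]:
    """Cheap check: reject prompts longer than five whitespace-separated words."""
    calls.append("token_limit")
    ok = len(prompt.split()) <= 5
    return ok, 0.0 if ok else 1.0


def expensive_model(prompt: str) -> Tuple[bool, float]:
    """Placeholder for a slow ML-based scanner; always passes in this sketch."""
    calls.append("expensive_model")
    return True, 0.0


scanners: List[Scanner] = [
    ("token_limit", token_limit),          # cheap scanner first
    ("expensive_model", expensive_model),  # expensive scanner last
]

long_prompt = "one two three four five six seven"
valid, scores = run_pipeline(scanners, long_prompt, fail_fast=True)
print(valid)  # only the cheap scanner appears in the results
print(calls)  # the expensive scanner was never invoked
```

Because the cheap check sits first, the placeholder for the expensive scanner is never called for the over-long prompt, and the results contain an entry only for the scanner that ran.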
Reasoning
LLM Guard runs scanners sequentially in the order they are configured. Without fail_fast, every scanner runs regardless of earlier results. For a pipeline with 5+ scanners where the first scanner already detects an issue, the remaining scanners represent wasted computation. In the API server's parallel scan endpoints (/scan/prompt, /scan/output), enabling scan_fail_fast sets return_exceptions=False on asyncio.gather, so the first exception raised by a scanner task propagates immediately instead of being collected alongside the other results.
# From llm_guard/evaluate.py:63-64
# Sequential fail_fast in scan_prompt
if fail_fast and not is_valid:
    break

# From llm_guard_api/app/app.py:300
# Parallel fail_fast via asyncio.gather exception propagation
results = await asyncio.wait_for(
    asyncio.gather(*tasks, return_exceptions=not config.app.scan_fail_fast),
    config.app.scan_output_timeout,
)
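To make the role of return_exceptions concrete, here is a small standalone sketch of the same asyncio.gather pattern. It is not the API server's code: ScanRejected, fast_reject, slow_pass, and run are hypothetical names, and the sketch assumes a rejected scan surfaces as a raised exception.

```python
import asyncio
from typing import Any


class ScanRejected(Exception):
    """Hypothetical exception standing in for a scanner rejecting the input."""


async def fast_reject() -> str:
    # A quick scanner that rejects the input almost immediately.
    await asyncio.sleep(0.01)
    raise ScanRejected("fast scanner rejected the input")


async def slow_pass() -> str:
    # A slower scanner that would eventually pass.
    await asyncio.sleep(0.2)
    return "slow scanner passed"


async def run(fail_fast: bool) -> Any:
    tasks = [fast_reject(), slow_pass()]
    try:
        # With fail_fast=True, return_exceptions is False and the first
        # exception propagates out of gather as soon as it is raised.
        return await asyncio.wait_for(
            asyncio.gather(*tasks, return_exceptions=not fail_fast),
            timeout=1.0,
        )
    except ScanRejected as exc:
        return f"aborted: {exc}"


fail_fast_result = asyncio.run(run(fail_fast=True))
collect_all_result = asyncio.run(run(fail_fast=False))
print(fail_fast_result)
print(collect_all_result)
```

With fail_fast=True the caller gets the rejection without waiting for the slow task; with fail_fast=False both outcomes come back in one list, with the exception stored as a value. Note that asyncio.gather does not cancel the remaining awaitables when it propagates an exception, so any early-exit saving here is in response latency rather than in background work, unless the caller cancels those tasks itself.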