# Environment:Protectai Llm guard ONNX Runtime Acceleration
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Infrastructure |
| Last Updated | 2026-02-14 12:00 GMT |
## Overview
Optional ONNX Runtime environment with HuggingFace Optimum for accelerated inference of LLM Guard scanner models on CPU or CUDA GPU.
## Description
This environment extends the base Python runtime with ONNX Runtime support via the HuggingFace Optimum library. It provides hardware-accelerated inference for transformer-based scanner models. The system auto-detects whether to use CPUExecutionProvider or CUDAExecutionProvider based on the available hardware. Most LLM Guard scanners ship with pre-exported ONNX model variants alongside their PyTorch counterparts.
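The auto-detection described above can be sketched as a small pure function. This is a hedged illustration, not the library's code: `select_onnx_provider` and its `device_type` argument are invented names standing in for the internal `device().type` check.

```python
def select_onnx_provider(device_type: str) -> tuple[str, str]:
    """Map a detected device type ("cpu" or "cuda") to an ONNX
    execution provider and the pip extra that supplies it."""
    if device_type == "cuda":
        return "CUDAExecutionProvider", "optimum[onnxruntime-gpu]"
    return "CPUExecutionProvider", "optimum[onnxruntime]"


print(select_onnx_provider("cuda"))  # ('CUDAExecutionProvider', 'optimum[onnxruntime-gpu]')
print(select_onnx_provider("cpu"))   # ('CPUExecutionProvider', 'optimum[onnxruntime]')
```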
## Usage
Use this environment when performance optimization is required. Enable ONNX by passing use_onnx=True when initializing any scanner that supports it. The API server (llm_guard_api) automatically enables ONNX for all supported scanners. This is especially beneficial for CPU-only deployments where ONNX Runtime provides significant speedups.
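A minimal usage sketch, assuming llm-guard is installed; `PromptInjection` is one of the scanners that accepts `use_onnx`. The import guard lets the snippet degrade gracefully when the package is absent.

```python
import importlib.util

HAS_LLM_GUARD = importlib.util.find_spec("llm_guard") is not None

if HAS_LLM_GUARD:
    from llm_guard.input_scanners import PromptInjection

    # use_onnx=True switches this scanner to ONNX Runtime inference
    scanner = PromptInjection(use_onnx=True)
    sanitized, is_valid, risk_score = scanner.scan("Summarize this document.")
    print(is_valid, risk_score)
else:
    print("llm-guard is not installed; see Quick Install")
```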
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Same as base Python environment |
| Python | 3.10-3.12 | Same as base |
| Hardware (CPU) | Any modern x86_64 CPU | Uses CPUExecutionProvider |
| Hardware (GPU) | NVIDIA GPU with CUDA | Uses CUDAExecutionProvider; requires CUDA toolkit |
## Dependencies
### Python Packages (CPU)

```
optimum[onnxruntime]==1.25.2
```

### Python Packages (GPU)

```
optimum[onnxruntime-gpu]==1.25.2
```
## Credentials
No additional credentials beyond the base environment.
## Quick Install

```shell
# For CPU inference
pip install "llm-guard[onnxruntime]"

# For GPU inference (requires CUDA)
pip install "llm-guard[onnxruntime-gpu]"
```
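After installing, a quick sanity check can confirm the extras landed. This sketch uses only the standard library; `optimum` and `onnxruntime` are the top-level packages the extras pull in, and `check_onnx_deps` is an illustrative helper name.

```python
import importlib.util


def check_onnx_deps() -> dict[str, bool]:
    """Report whether the packages the onnxruntime extras provide are importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("optimum", "onnxruntime")
    }


print(check_onnx_deps())  # e.g. {'optimum': True, 'onnxruntime': True} after install
```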
## Code Evidence
ONNX support detection from `llm_guard/transformers_helpers.py:29-40`:

```python
@lru_cache(maxsize=None)  # Unbounded cache
def is_onnx_supported() -> bool:
    is_supported = importlib.util.find_spec("optimum.onnxruntime") is not None
    if not is_supported:
        LOGGER.warning(
            "ONNX Runtime is not available. "
            "Please install optimum: "
            "`pip install llm-guard[onnxruntime]` for CPU or "
            "`pip install llm-guard[onnxruntime-gpu]` for GPU to enable ONNX Runtime optimizations."
        )

    return is_supported
```
CUDA vs CPU provider selection from `llm_guard/transformers_helpers.py:43-65`:

```python
def _ort_model_for_sequence_classification(model: Model):
    provider = "CPUExecutionProvider"
    package_name = "optimum[onnxruntime]"
    if device().type == "cuda":
        package_name = "optimum[onnxruntime-gpu]"
        provider = "CUDAExecutionProvider"

    onnxruntime = lazy_load_dep("optimum.onnxruntime", package_name)

    tf_model = onnxruntime.ORTModelForSequenceClassification.from_pretrained(
        model.onnx_path or model.path,
        export=model.onnx_path is None,
        ...
        provider=provider,
    )
```
Graceful fallback from `llm_guard/transformers_helpers.py:83-84`:

```python
if use_onnx and is_onnx_supported() is False:
    LOGGER.warning("ONNX is not supported on this machine. Using PyTorch instead of ONNX.")
    use_onnx = False
```
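The same fallback pattern, restated as a self-contained sketch. Hedged: `resolve_use_onnx` is an illustrative name, and it probes only the top-level `optimum` package (the library checks `optimum.onnxruntime`).

```python
import importlib.util
import logging

logging.basicConfig()
LOGGER = logging.getLogger("onnx_fallback_sketch")


def resolve_use_onnx(use_onnx: bool) -> bool:
    """Disable ONNX when optimum is missing, mirroring the library's warning."""
    if use_onnx and importlib.util.find_spec("optimum") is None:
        LOGGER.warning("ONNX is not supported on this machine. Using PyTorch instead of ONNX.")
        return False
    return use_onnx


# A scanner would then pick its inference backend from the resolved flag:
backend = "onnx" if resolve_use_onnx(True) else "pytorch"
```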
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ONNX Runtime is not available` | `optimum` package not installed | `pip install "llm-guard[onnxruntime]"` |
| `ONNX is not supported on this machine. Using PyTorch instead` | ONNX not installed; auto-fallback to PyTorch | Install `optimum` or accept the PyTorch fallback |
| CUDA provider errors | GPU driver mismatch or missing CUDA toolkit | Install a matching CUDA toolkit or use the CPU provider |
## Compatibility Notes

- CPU-only systems: Use `optimum[onnxruntime]`. Full functionality with the CPU execution provider.
- NVIDIA GPUs: Use `optimum[onnxruntime-gpu]`. Requires compatible CUDA drivers.
- Graceful degradation: If ONNX is not available, all scanners automatically fall back to PyTorch inference. No code changes needed.
- API Server: The API server (`llm_guard_api`) forces `use_onnx=True` for all supported scanners.
## Related Pages
- Implementation:Protectai_Llm_guard_PromptInjection
- Implementation:Protectai_Llm_guard_Toxicity
- Implementation:Protectai_Llm_guard_Anonymize
- Implementation:Protectai_Llm_guard_NoRefusal
- Implementation:Protectai_Llm_guard_Relevance
- Implementation:Protectai_Llm_guard_Sensitive
- Implementation:Protectai_Llm_guard_Benchmark_run