# Environment:Protectai Llm guard ONNX Runtime Acceleration
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Infrastructure |
| Last Updated | 2026-02-14 12:00 GMT |
## Overview
Optional ONNX Runtime environment with HuggingFace Optimum for accelerated inference of LLM Guard scanner models on CPU or CUDA GPU.
## Description
This environment extends the base Python runtime with ONNX Runtime support via the HuggingFace Optimum library. It provides hardware-accelerated inference for transformer-based scanner models. The system auto-detects whether to use CPUExecutionProvider or CUDAExecutionProvider based on the available hardware. Most LLM Guard scanners ship with pre-exported ONNX model variants alongside their PyTorch counterparts.
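The auto-detection described above can be sketched as a small pure function. This is a hedged illustration, not the library's code: `select_onnx_provider` and its `device_type` argument are invented names standing in for the internal `device().type` check.

```python
def select_onnx_provider(device_type: str) -> tuple[str, str]:
    """Map a detected device type ("cpu" or "cuda") to an ONNX
    execution provider and the pip extra that supplies it."""
    if device_type == "cuda":
        return "CUDAExecutionProvider", "optimum[onnxruntime-gpu]"
    return "CPUExecutionProvider", "optimum[onnxruntime]"


print(select_onnx_provider("cuda"))  # ('CUDAExecutionProvider', 'optimum[onnxruntime-gpu]')
print(select_onnx_provider("cpu"))   # ('CPUExecutionProvider', 'optimum[onnxruntime]')
```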
## Usage
Use this environment when performance optimization is required. Enable ONNX by passing use_onnx=True when initializing any scanner that supports it. The API server (llm_guard_api) automatically enables ONNX for all supported scanners. This is especially beneficial for CPU-only deployments where ONNX Runtime provides significant speedups.
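A minimal usage sketch, assuming llm-guard is installed; `PromptInjection` is one of the scanners that accepts `use_onnx`. The import guard lets the snippet degrade gracefully when the package is absent.

```python
import importlib.util

HAS_LLM_GUARD = importlib.util.find_spec("llm_guard") is not None

if HAS_LLM_GUARD:
    from llm_guard.input_scanners import PromptInjection

    # use_onnx=True switches this scanner to ONNX Runtime inference
    scanner = PromptInjection(use_onnx=True)
    sanitized, is_valid, risk_score = scanner.scan("Summarize this document.")
    print(is_valid, risk_score)
else:
    print("llm-guard is not installed; see Quick Install")
```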
## System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux, macOS, Windows | Same as base Python environment |
| Python | 3.10-3.12 | Same as base |
| Hardware (CPU) | Any modern x86_64 CPU | Uses CPUExecutionProvider |
| Hardware (GPU) | NVIDIA GPU with CUDA | Uses CUDAExecutionProvider; requires CUDA toolkit |
## Dependencies
### Python Packages (CPU)

```
optimum[onnxruntime]==1.25.2
```

### Python Packages (GPU)

```
optimum[onnxruntime-gpu]==1.25.2
```
## Credentials
No additional credentials beyond the base environment.
## Quick Install

```shell
# For CPU inference
pip install "llm-guard[onnxruntime]"

# For GPU inference (requires CUDA)
pip install "llm-guard[onnxruntime-gpu]"
```
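After installing, a quick sanity check can confirm the extras landed. This sketch uses only the standard library; `optimum` and `onnxruntime` are the top-level packages the extras pull in, and `check_onnx_deps` is an illustrative helper name.

```python
import importlib.util


def check_onnx_deps() -> dict[str, bool]:
    """Report whether the packages the onnxruntime extras provide are importable."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in ("optimum", "onnxruntime")
    }


print(check_onnx_deps())  # e.g. {'optimum': True, 'onnxruntime': True} after install
```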
## Code Evidence
ONNX support detection from `llm_guard/transformers_helpers.py:29-40`:

```python
@lru_cache(maxsize=None)  # Unbounded cache
def is_onnx_supported() -> bool:
    is_supported = importlib.util.find_spec("optimum.onnxruntime") is not None
    if not is_supported:
        LOGGER.warning(
            "ONNX Runtime is not available. "
            "Please install optimum: "
            "`pip install llm-guard[onnxruntime]` for CPU or "
            "`pip install llm-guard[onnxruntime-gpu]` for GPU to enable ONNX Runtime optimizations."
        )

    return is_supported
```
CUDA vs CPU provider selection from `llm_guard/transformers_helpers.py:43-65`:

```python
def _ort_model_for_sequence_classification(model: Model):
    provider = "CPUExecutionProvider"
    package_name = "optimum[onnxruntime]"
    if device().type == "cuda":
        package_name = "optimum[onnxruntime-gpu]"
        provider = "CUDAExecutionProvider"

    onnxruntime = lazy_load_dep("optimum.onnxruntime", package_name)

    tf_model = onnxruntime.ORTModelForSequenceClassification.from_pretrained(
        model.onnx_path or model.path,
        export=model.onnx_path is None,
        ...
        provider=provider,
    )
```
Graceful fallback from `llm_guard/transformers_helpers.py:83-84`:

```python
if use_onnx and is_onnx_supported() is False:
    LOGGER.warning("ONNX is not supported on this machine. Using PyTorch instead of ONNX.")
    use_onnx = False
```
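The same fallback pattern, restated as a self-contained sketch. Hedged: `resolve_use_onnx` is an illustrative name, and it probes only the top-level `optimum` package (the library checks `optimum.onnxruntime`).

```python
import importlib.util
import logging

logging.basicConfig()
LOGGER = logging.getLogger("onnx_fallback_sketch")


def resolve_use_onnx(use_onnx: bool) -> bool:
    """Disable ONNX when optimum is missing, mirroring the library's warning."""
    if use_onnx and importlib.util.find_spec("optimum") is None:
        LOGGER.warning("ONNX is not supported on this machine. Using PyTorch instead of ONNX.")
        return False
    return use_onnx


# A scanner would then pick its inference backend from the resolved flag:
backend = "onnx" if resolve_use_onnx(True) else "pytorch"
```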
## Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ONNX Runtime is not available` | `optimum` package not installed | `pip install "llm-guard[onnxruntime]"` |
| `ONNX is not supported on this machine. Using PyTorch instead` | ONNX not installed; auto-fallback to PyTorch | Install `optimum` or accept the PyTorch fallback |
| CUDA provider errors | GPU driver mismatch or missing CUDA toolkit | Install a matching CUDA toolkit or use the CPU provider |
## Compatibility Notes

- CPU-only systems: Use `optimum[onnxruntime]`. Full functionality with the CPU execution provider.
- NVIDIA GPUs: Use `optimum[onnxruntime-gpu]`. Requires compatible CUDA drivers.
- Graceful degradation: If ONNX is not available, all scanners automatically fall back to PyTorch inference. No code changes needed.
- API Server: The API server (`llm_guard_api`) forces `use_onnx=True` for all supported scanners.
## Related Pages
- Implementation:Protectai_Llm_guard_PromptInjection
- Implementation:Protectai_Llm_guard_Toxicity
- Implementation:Protectai_Llm_guard_Anonymize
- Implementation:Protectai_Llm_guard_NoRefusal
- Implementation:Protectai_Llm_guard_Relevance
- Implementation:Protectai_Llm_guard_Sensitive
- Implementation:Protectai_Llm_guard_Benchmark_run