Workflow: Protect AI LLM Guard API Server Deployment

Knowledge Sources	LLM Guard, LLM Guard Docs
Domains	LLM_Security, API_Deployment, DevOps
Last Updated	2026-02-14 12:00 GMT

Overview

End-to-end process for deploying LLM Guard as a production REST API service using FastAPI, Docker, and configurable YAML-based scanner pipelines.

Description

This workflow covers deploying the LLM Guard API server, a FastAPI application that exposes HTTP endpoints for scanning LLM prompts and outputs. The server is configured through a YAML file that defines which scanners to load, their parameters, authentication settings, rate limiting, and observability (OpenTelemetry tracing and Prometheus metrics). The API supports both sequential scanning (analyze endpoints, which return sanitized text) and parallel scanning (scan endpoints, which run scanners concurrently for lower latency). The server runs under the Uvicorn ASGI server and is deployed with Docker Compose.
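The practical difference between the two modes can be seen in a simplified model. This is a sketch, not the server's implementation: scanners here follow the `(sanitized_text, is_valid, risk_score)` return shape of LLM Guard's input scanners, while `analyze`, `scan`, `mask_digits`, and `ban_secret` are hypothetical names. Because parallel scanners all read the original text, there is no single chain of edits to return, which is why only the sequential path yields sanitized output.

```python
import asyncio
from typing import Callable, Dict, List, Tuple

# Simplified model, not the server's code. A scanner takes text and returns
# (sanitized_text, is_valid, risk_score), the same shape LLM Guard input scanners use.
ScanResult = Tuple[str, bool, float]
Scanner = Tuple[str, Callable[[str], ScanResult]]


def analyze(text: str, scanners: List[Scanner], fail_fast: bool = False):
    """Sequential: each scanner receives the previous scanner's sanitized text."""
    is_valid = True
    risk: Dict[str, float] = {}
    for name, run in scanners:
        text, valid, score = run(text)
        risk[name] = score
        if not valid:
            is_valid = False
            if fail_fast:
                break  # remaining scanners are skipped
    return text, is_valid, risk


async def scan(text: str, scanners: List[Scanner], timeout_s: float = 10.0):
    """Parallel: every scanner reads the ORIGINAL text, so sanitized output is discarded."""

    async def one(name: str, run: Callable[[str], ScanResult]):
        # Run each (blocking) scanner in a worker thread.
        _, valid, score = await asyncio.to_thread(run, text)
        return name, valid, score

    results = await asyncio.wait_for(
        asyncio.gather(*(one(name, run) for name, run in scanners)), timeout=timeout_s
    )
    return all(valid for _, valid, _ in results), {name: score for name, _, score in results}


# Two toy scanners (hypothetical) to exercise both paths.
def mask_digits(text: str) -> ScanResult:
    masked = "".join("#" if ch.isdigit() else ch for ch in text)
    return masked, True, 0.5 if masked != text else 0.0


def ban_secret(text: str) -> ScanResult:
    found = "secret" in text.lower()
    return text, not found, 1.0 if found else 0.0


if __name__ == "__main__":
    pipeline = [("MaskDigits", mask_digits), ("BanSecret", ban_secret)]
    print(analyze("call 555 now", pipeline))
    # ('call ### now', True, {'MaskDigits': 0.5, 'BanSecret': 0.0})
    print(asyncio.run(scan("call 555 now", pipeline)))
    # (True, {'MaskDigits': 0.5, 'BanSecret': 0.0})
```

In this model, fail-fast only saves work on the sequential path, where later scanners have not started yet; on the parallel path the timeout bounds how long the caller waits, but threads that are already running are not interrupted.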

Usage

Execute this workflow when you need to provide LLM Guard scanning as a shared service that multiple applications can call over HTTP. This is the recommended approach for production deployments where scanning should be centralized, independently scalable, and accessible to applications written in any language.

Execution Steps

Step 1: Write the scanner configuration YAML

Create a YAML configuration file that defines the complete scanner pipeline. The file specifies application settings (logging, timeouts, fail-fast mode), authentication (HTTP Bearer or Basic), rate limiting, observability exporters (OpenTelemetry, Prometheus), and ordered lists of input and output scanners with their parameters.

Key considerations:

Scanners are applied in the order they appear in the YAML file
Environment variables can be used in the YAML with ${VAR_NAME:default} syntax
Each scanner entry has a type field (scanner class name) and optional params dictionary
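Set lazy_load to true for faster startup: scanners are then loaded on the first request instead of at boot, so that first request absorbs the model-loading delay
Enable scan_fail_fast to stop the pipeline as soon as a scanner marks the text as invalid, skipping the remaining scanners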
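Placeholder expansion can be pictured as a text substitution applied before the YAML is parsed. The sketch below is a hypothetical stand-in for whatever the server does internally, and the template uses only the settings named in this step; the exact key layout and the `PI_THRESHOLD` variable are assumptions to check against the configuration file shipped with your LLM Guard version. `BanSubstrings` and `PromptInjection` are LLM Guard input scanners.

```python
import os
import re
from typing import Mapping, Optional

# ${VAR_NAME} or ${VAR_NAME:default}; the default may be empty.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::([^}]*))?\}")


def expand_env(text: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Substitute placeholders before the YAML is parsed (hypothetical helper)."""
    env = os.environ if environ is None else environ

    def replace(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        # Unset variable with no default expands to an empty string.
        return env.get(name, default if default is not None else "")

    return _PLACEHOLDER.sub(replace, text)


# Illustrative layout (assumed key names) using the settings named in this step.
CONFIG_TEMPLATE = """\
app:
  log_level: ${LOG_LEVEL:INFO}
  scan_fail_fast: ${SCAN_FAIL_FAST:false}
  lazy_load: ${LAZY_LOAD:false}
input_scanners:
  - type: BanSubstrings          # cheap string match runs first
    params:
      substrings: ["internal-only"]
  - type: PromptInjection        # model-based scanner runs second
    params:
      threshold: ${PI_THRESHOLD:0.9}
"""

if __name__ == "__main__":
    print(expand_env(CONFIG_TEMPLATE, {"LOG_LEVEL": "DEBUG"}))
```

Since the list runs top to bottom, placing cheap checks such as substring bans ahead of model-based scanners lets `scan_fail_fast` reject obvious violations before any model is invoked.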

Step 2: Configure authentication and rate limiting

Set up API authentication to protect the scanning endpoints. The server supports HTTP Bearer token authentication and HTTP Basic authentication. Configure rate limiting to prevent abuse of the scanning endpoints.

Key considerations:

Bearer token auth checks requests against a configured token value
Basic auth validates the supplied username and password against the configured credentials
Rate limiting uses slowapi and is configured as requests per time period (e.g., "100/minute")
Auth credentials should be injected via environment variables, not hardcoded in YAML
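Both schemes come down to the `Authorization` header: `Bearer <token>` (RFC 6750) or `Basic` followed by the Base64 encoding of `username:password` (RFC 7617). A minimal stdlib sketch of the client and checking sides follows; all function names are hypothetical and this is not the server's own code. `parse_rate` handles only the simple `N/period` form; slowapi relies on the `limits` package, which accepts more notations.

```python
import base64
import hmac
from typing import Dict, Tuple


def bearer_header(token: str) -> Dict[str, str]:
    return {"Authorization": f"Bearer {token}"}


def basic_header(username: str, password: str) -> Dict[str, str]:
    raw = f"{username}:{password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def check_bearer(header: str, expected_token: str) -> bool:
    scheme, _, token = header.partition(" ")
    # compare_digest is designed to resist timing attacks on the token value
    return scheme.lower() == "bearer" and hmac.compare_digest(
        token.encode("utf-8"), expected_token.encode("utf-8")
    )


def check_basic(header: str, expected_user: str, expected_password: str) -> bool:
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    user, sep, password = decoded.partition(":")
    user_ok = hmac.compare_digest(user.encode("utf-8"), expected_user.encode("utf-8"))
    password_ok = hmac.compare_digest(password.encode("utf-8"), expected_password.encode("utf-8"))
    return bool(sep) and user_ok and password_ok


_PERIOD_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_rate(limit: str) -> Tuple[int, int]:
    """Parse the simple "N/period" form, e.g. "100/minute" -> (100, 60)."""
    count, _, period = limit.strip().partition("/")
    return int(count), _PERIOD_SECONDS[period.strip().rstrip("s")]
```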

Step 3: Configure observability

Set up tracing and metrics exporters for production monitoring. The server integrates with OpenTelemetry for distributed tracing and supports Prometheus for metrics collection. Metrics include per-scanner validity counts and request latencies.

Key considerations:

Tracing supports otel_http (OpenTelemetry HTTP exporter) and console modes
Metrics supports otel_http, prometheus, and console exporters
The /metrics endpoint is only available when the Prometheus exporter is configured
OpenTelemetry instruments FastAPI routes, HTTP clients, and system resources automatically
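The exporter rules above can be captured in a small validation helper. It assumes the parsed YAML exposes `tracing.exporter` and `metrics.exporter` keys and defaults missing sections to `console`; both are assumptions of this sketch, and `plan_observability` is a hypothetical name.

```python
from typing import Any, Dict

# Exporter names as listed in this step.
TRACING_EXPORTERS = {"otel_http", "console"}
METRICS_EXPORTERS = {"otel_http", "prometheus", "console"}


def plan_observability(cfg: Dict[str, Any]) -> Dict[str, bool]:
    """Validate exporter choices and report what they imply (hypothetical helper).

    Missing sections default to "console"; that default is an assumption of this sketch.
    """
    tracing = cfg.get("tracing", {}).get("exporter", "console")
    metrics = cfg.get("metrics", {}).get("exporter", "console")
    if tracing not in TRACING_EXPORTERS:
        raise ValueError(f"unsupported tracing exporter: {tracing!r}")
    if metrics not in METRICS_EXPORTERS:
        raise ValueError(f"unsupported metrics exporter: {metrics!r}")
    return {
        # Prometheus is pull-based: it scrapes the server, so /metrics must exist.
        "serves_metrics_endpoint": metrics == "prometheus",
        # otel_http is push-based: the server sends data to a collector.
        "pushes_traces": tracing == "otel_http",
        "pushes_metrics": metrics == "otel_http",
    }
```

The distinction matters operationally: Prometheus pulls by scraping `/metrics`, so that endpoint must exist, whereas `otel_http` pushes to a collector, so the collector must be reachable from the container.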

Step 4: Build and deploy with Docker

Package the API server using Docker and deploy it with Docker Compose. The entrypoint script starts a Uvicorn ASGI server with configurable worker count and binding address. The configuration YAML is mounted into the container.

Key considerations:

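The Docker Compose configuration mounts the config directory into the container and exposes the API port
Set the number of Uvicorn workers based on available CPU cores and expected concurrency; each worker is a separate process that loads its own copy of the scanner models, so memory use grows with the worker count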
The CONFIG_FILE environment variable points to the scanner YAML configuration path
Health check endpoints (/healthz and /readyz) are available for container orchestration
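The worker and health-check points can be sketched with the standard library. `suggest_workers` is a hypothetical heuristic (one worker per core, capped), not a documented recommendation, and `wait_until_ready` is a hypothetical helper that polls `/readyz` until it answers 200, which suits a deploy script or CI job. Note that `os.cpu_count()` reports the host's cores rather than a container CPU quota, which is why the count can be passed explicitly.

```python
import os
import time
import urllib.request
from typing import Optional


def suggest_workers(cpu_count: Optional[int] = None, max_workers: int = 4) -> int:
    """Heuristic only: one worker per core, capped at max_workers (an assumed cap).

    Each worker process holds its own scanner models, so the cap guards memory.
    """
    cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, min(cores, max_workers))


def wait_until_ready(
    base_url: str, path: str = "/readyz", timeout_s: float = 120.0, interval_s: float = 2.0
) -> bool:
    """Poll a health endpoint until it returns 200 or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=interval_s) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # connection refused, timeout, or 4xx/5xx (HTTPError is an OSError)
        time.sleep(interval_s)
    return False
```

For example, `wait_until_ready("http://localhost:8000")` blocks until the container reports ready or two minutes pass; the host and port are assumptions, so use whatever the Compose file exposes. With `lazy_load` off, startup includes loading every scanner model, so a generous timeout is prudent.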

Step 5: Use the scanning API endpoints

Send HTTP requests to the API to scan prompts and outputs. The server provides four main endpoints: /analyze/prompt and /analyze/output for sequential scanning with sanitization, and /scan/prompt and /scan/output for parallel scanning that returns only validity and risk scores without sanitization.

Key considerations:

Analyze endpoints run scanners sequentially and return sanitized text plus risk scores
Scan endpoints run scanners in parallel for lower latency but do not sanitize the text
Requests can suppress specific scanners using the scanners_suppress field
Timeouts are configurable per endpoint type (prompt vs. output) in the YAML config
All endpoints require authentication if configured in the YAML
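A client needs only an HTTP library. The sketch uses `urllib.request` so it runs without dependencies; the `prompt` and `output` body field names are assumptions (only `scanners_suppress` is named above), and `build_request` and `send` are hypothetical helpers. The response shape differs between analyze and scan endpoints, so check it against the server's schema (for example `/openapi.json` with debug mode on) rather than hardcoding field names.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Optional

ENDPOINTS = {"/analyze/prompt", "/analyze/output", "/scan/prompt", "/scan/output"}


def build_request(
    base_url: str,
    endpoint: str,
    prompt: str,
    output: Optional[str] = None,
    token: Optional[str] = None,
    suppress: Iterable[str] = (),
) -> urllib.request.Request:
    """Build a POST for one of the four scanning endpoints.

    The "prompt" and "output" field names are assumptions.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if endpoint.endswith("/output") and output is None:
        raise ValueError("output endpoints also need the model output")
    body: Dict[str, Any] = {"prompt": prompt}
    if output is not None:
        body["output"] = output
    suppressed = list(suppress)
    if suppressed:
        body["scanners_suppress"] = suppressed  # skip these scanners for this request
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def send(request: urllib.request.Request, timeout_s: float = 30.0) -> Dict[str, Any]:
    """Send the request and return the decoded JSON body (raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request, timeout=timeout_s) as resp:
        return json.load(resp)
```

For example, `send(build_request("http://localhost:8000", "/analyze/prompt", "Summarize this ticket", token=token))` posts a prompt to the sequential endpoint; the host and port are assumptions. Setting the client timeout somewhat above the server's configured scan timeout lets the server's own timeout response arrive instead of a client-side socket error.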

Step 6: Monitor and maintain

Monitor scanner performance through the configured observability stack. Track scanner validity rates, request latencies, and error rates to identify issues and tune scanner thresholds.

Key considerations:

The /metrics Prometheus endpoint exposes scanner validity counters by scanner name and source (input/output)
Tracing data includes per-scanner execution times for identifying performance bottlenecks
Log output format is configurable: JSON for production log aggregation, plain text for development
OpenAPI docs are available at /openapi.json only when debug mode is enabled
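Validity counters become most useful as ratios, for example the share of results each scanner flags as invalid, when tuning thresholds. The sketch parses Prometheus text-format output with the standard library; the metric name `scanner_valid_total` and the `scanner`, `source`, and `valid` labels are placeholders, so substitute the names your `/metrics` endpoint actually reports. `sum_by_labels` and `invalid_share` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# One sample line of the Prometheus text format: name{labels} value [timestamp]
_SAMPLE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{([^}]*)\})?\s+(\S+)")
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_by_labels(text: str, metric: str, labels: List[str]) -> Dict[Tuple[str, ...], float]:
    """Sum the samples of one metric, grouped by the chosen label values."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group(1) != metric:
            continue
        found = dict(_LABEL.findall(match.group(2) or ""))
        totals[tuple(found.get(name, "") for name in labels)] += float(match.group(3))
    return dict(totals)


def invalid_share(totals: Dict[Tuple[str, ...], float]) -> Dict[Tuple[str, str], float]:
    """Turn (scanner, source, valid) counts into the invalid fraction per (scanner, source)."""
    counts: Dict[Tuple[str, str], List[float]] = defaultdict(lambda: [0.0, 0.0])  # [invalid, total]
    for (scanner, source, valid), value in totals.items():
        counts[(scanner, source)][1] += value
        if valid == "false":
            counts[(scanner, source)][0] += value
    return {key: invalid / total for key, (invalid, total) in counts.items() if total}


# Hypothetical scrape: the metric and label names are placeholders.
SAMPLE = """\
# TYPE scanner_valid_total counter
scanner_valid_total{scanner="BanCode",source="input",valid="true"} 90
scanner_valid_total{scanner="BanCode",source="input",valid="false"} 10
scanner_valid_total{scanner="Toxicity",source="output",valid="true"} 48
"""
```

For dashboards and alerts, the same ratio is normally written as a PromQL query over `rate()`; the script form suits a one-off check against a single scrape.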
