Environment:Protectai Llm guard API Server Deployment
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, API, Observability |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
FastAPI server environment with Uvicorn, OpenTelemetry instrumentation, rate limiting, and Docker support for deploying LLM Guard as a REST API service.
Description
This environment provides the full deployment stack for the LLM Guard API server. It builds on top of the core Python runtime and adds FastAPI for HTTP handling, Uvicorn as the ASGI server, OpenTelemetry for distributed tracing and metrics (supporting AWS X-Ray, OTLP HTTP, Prometheus, and console exporters), SlowAPI for rate limiting, and Pydantic for request/response validation. The server supports Docker containerization with configurable workers and scanner pipelines via YAML configuration.
Usage
Use this environment when deploying LLM Guard as a standalone API service that exposes REST endpoints for prompt and output scanning. This is the production deployment path, providing /analyze/prompt, /analyze/output, /scan/prompt, and /scan/output endpoints with authentication, rate limiting, and observability.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | Linux (recommended), macOS | Docker deployment recommended |
| Python | 3.10-3.12 | requires-python = ">=3.10,<3.13"
|
| Port | 8000 (default) | Configurable via Uvicorn |
| RAM | 8GB+ recommended | Multiple scanner models loaded simultaneously |
| Disk | 10GB+ | For storing downloaded HuggingFace models |
Dependencies
Python Packages
llm-guard== 0.3.16 (core library)fastapi== 0.115.12uvicorn[standard]== 0.34.2pydantic== 2.11.4pyyaml== 6.0.2slowapi== 0.1.9asyncio== 3.4.3structlog>= 24psutil>= 5.9opentelemetry-api== 1.33.1opentelemetry-sdk== 1.33.1opentelemetry-instrumentation-fastapi== 0.54b1opentelemetry-exporter-otlp-proto-http== 1.33.1opentelemetry-exporter-prometheus== 0.54b1opentelemetry-sdk-extension-aws== 2.1.0opentelemetry-propagator-aws-xray== 1.0.2
Credentials
The following environment variables may be required depending on configuration:
CONFIG_FILE: Path to the YAML scanner configuration file (default:./config/scanners.yml).APP_WORKERS: Number of Uvicorn worker processes (default:1).- Auth credentials (configured in YAML):
http_bearer: API token for bearer authentication.http_basic: Username and password for basic authentication.
- YAML config supports
${ENV_VAR}syntax for injecting environment variables.
Quick Install
# Install API server package
pip install llm-guard-api
# For CPU optimized deployment
pip install "llm-guard-api[cpu]"
# For GPU optimized deployment
pip install "llm-guard-api[gpu]"
# Docker deployment
cd llm_guard_api
docker-compose up
Code Evidence
Config file loading from environment from llm_guard_api/app/app.py:57-62:
def create_app() -> FastAPI:
config_file = os.getenv("CONFIG_FILE", "./config/scanners.yml")
if not config_file:
raise ValueError("Config file is required")
config = get_config(config_file)
Thread limiting for scanner execution from llm_guard_api/app/scanner.py:31:
torch.set_num_threads(1)
Docker entrypoint from llm_guard_api/entrypoint.sh:3-7:
APP_WORKERS=${APP_WORKERS:-1}
CONFIG_FILE=${CONFIG_FILE:-./config/scanners.yml}
uvicorn app.app:create_app --host=0.0.0.0 --port=8000 --workers="$APP_WORKERS" --forwarded-allow-ips="*" --proxy-headers --timeout-keep-alive="2"
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
ValueError: Config file is required |
CONFIG_FILE env var empty or file missing | Set CONFIG_FILE to valid YAML path
|
Error loading YAML file |
Invalid YAML syntax or missing file | Validate YAML config syntax |
408 Request Timeout |
Scanner pipeline exceeds timeout | Increase scan_prompt_timeout / scan_output_timeout in config
|
429 Rate Limit Exceeded |
Too many requests | Adjust rate_limit.limit in config
|
Compatibility Notes
- Docker: Default Docker Compose exposes port 8000 and mounts config volume.
- Workers:
APP_WORKERSdefaults to 1; increase for multi-process serving. Note:torch.set_num_threads(1)is set to prevent thread contention across workers. - Observability: Supports AWS X-Ray, OTLP HTTP, Prometheus, and console exporters. Configured via YAML.
- Health checks:
/healthzand/readyzendpoints available for load balancer configuration.