Environment:Protectai Llm guard API Server Deployment

Knowledge Sources	LLM Guard API API Deployment
Domains	Infrastructure, API, Observability
Last Updated	2026-02-14 12:00 GMT

Overview

FastAPI server environment with Uvicorn, OpenTelemetry instrumentation, rate limiting, and Docker support for deploying LLM Guard as a REST API service.

Description

This environment provides the full deployment stack for the LLM Guard API server. It builds on top of the core Python runtime and adds FastAPI for HTTP handling, Uvicorn as the ASGI server, OpenTelemetry for distributed tracing and metrics (supporting AWS X-Ray, OTLP HTTP, Prometheus, and console exporters), SlowAPI for rate limiting, and Pydantic for request/response validation. The server supports Docker containerization with configurable workers and scanner pipelines via YAML configuration.

Usage

Use this environment when deploying LLM Guard as a standalone API service that exposes REST endpoints for prompt and output scanning. This is the production deployment path, providing /analyze/prompt, /analyze/output, /scan/prompt, and /scan/output endpoints with authentication, rate limiting, and observability.

System Requirements

Category	Requirement	Notes
OS	Linux (recommended), macOS	Docker deployment recommended
Python	3.10-3.12	`requires-python = ">=3.10,<3.13"`
Port	8000 (default)	Configurable via Uvicorn
RAM	8GB+ recommended	Multiple scanner models loaded simultaneously
Disk	10GB+	For storing downloaded HuggingFace models

Dependencies

Python Packages

llm-guard == 0.3.16 (core library)
fastapi == 0.115.12
uvicorn[standard] == 0.34.2
pydantic == 2.11.4
pyyaml == 6.0.2
slowapi == 0.1.9
asyncio == 3.4.3
structlog >= 24
psutil >= 5.9
opentelemetry-api == 1.33.1
opentelemetry-sdk == 1.33.1
opentelemetry-instrumentation-fastapi == 0.54b1
opentelemetry-exporter-otlp-proto-http == 1.33.1
opentelemetry-exporter-prometheus == 0.54b1
opentelemetry-sdk-extension-aws == 2.1.0
opentelemetry-propagator-aws-xray == 1.0.2

Credentials

The following environment variables may be required depending on configuration:

CONFIG_FILE: Path to the YAML scanner configuration file (default: ./config/scanners.yml).
APP_WORKERS: Number of Uvicorn worker processes (default: 1).
Auth credentials (configured in YAML):
- http_bearer: API token for bearer authentication.
- http_basic: Username and password for basic authentication.
YAML config supports ${ENV_VAR} syntax for injecting environment variables.

Quick Install

# Install API server package
pip install llm-guard-api

# For CPU optimized deployment
pip install "llm-guard-api[cpu]"

# For GPU optimized deployment
pip install "llm-guard-api[gpu]"

# Docker deployment
cd llm_guard_api
docker-compose up

Code Evidence

Config file loading from environment from llm_guard_api/app/app.py:57-62:

def create_app() -> FastAPI:
    config_file = os.getenv("CONFIG_FILE", "./config/scanners.yml")
    if not config_file:
        raise ValueError("Config file is required")
    config = get_config(config_file)

Thread limiting for scanner execution from llm_guard_api/app/scanner.py:31:

torch.set_num_threads(1)

Docker entrypoint from llm_guard_api/entrypoint.sh:3-7:

APP_WORKERS=${APP_WORKERS:-1}
CONFIG_FILE=${CONFIG_FILE:-./config/scanners.yml}
uvicorn app.app:create_app --host=0.0.0.0 --port=8000 --workers="$APP_WORKERS" --forwarded-allow-ips="*" --proxy-headers --timeout-keep-alive="2"

Common Errors

Error Message	Cause	Solution
`ValueError: Config file is required`	CONFIG_FILE env var empty or file missing	Set `CONFIG_FILE` to valid YAML path
`Error loading YAML file`	Invalid YAML syntax or missing file	Validate YAML config syntax
`408 Request Timeout`	Scanner pipeline exceeds timeout	Increase `scan_prompt_timeout` / `scan_output_timeout` in config
`429 Rate Limit Exceeded`	Too many requests	Adjust `rate_limit.limit` in config

Compatibility Notes

Docker: Default Docker Compose exposes port 8000 and mounts config volume.
Workers: APP_WORKERS defaults to 1; increase for multi-process serving. Note: torch.set_num_threads(1) is set to prevent thread contention across workers.
Observability: Supports AWS X-Ray, OTLP HTTP, Prometheus, and console exporters. Configured via YAML.
Health checks: /healthz and /readyz endpoints available for load balancer configuration.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment