
Environment:ProtectAI LLM Guard API Server Deployment

From Leeroopedia
Knowledge Sources
Domains Infrastructure, API, Observability
Last Updated 2026-02-14 12:00 GMT

Overview

FastAPI server environment with Uvicorn, OpenTelemetry instrumentation, rate limiting, and Docker support for deploying LLM Guard as a REST API service.

Description

This environment provides the full deployment stack for the LLM Guard API server. It builds on top of the core Python runtime and adds FastAPI for HTTP handling, Uvicorn as the ASGI server, OpenTelemetry for distributed tracing and metrics (supporting AWS X-Ray, OTLP HTTP, Prometheus, and console exporters), SlowAPI for rate limiting, and Pydantic for request/response validation. The server supports Docker containerization with configurable workers and scanner pipelines via YAML configuration.

Usage

Use this environment when deploying LLM Guard as a standalone API service that exposes REST endpoints for prompt and output scanning. This is the production deployment path, providing /analyze/prompt, /analyze/output, /scan/prompt, and /scan/output endpoints with authentication, rate limiting, and observability.
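As a sketch of how a client might call one of these endpoints, the helper below builds the URL, headers, and JSON body for a `/analyze/prompt` request. The base URL and token are deployment-specific placeholders, and the exact request schema should be confirmed against the LLM Guard API documentation; the `{"prompt": ...}` body shape is an assumption here.

```python
import json

def build_prompt_scan_request(base_url: str, prompt: str, token: str):
    """Build the URL, headers, and JSON body for a prompt-analysis call.

    base_url and token are deployment-specific; the /analyze/prompt path
    comes from this page, while the body shape is assumed.
    """
    url = f"{base_url.rstrip('/')}/analyze/prompt"
    headers = {
        "Authorization": f"Bearer {token}",  # http_bearer auth from the YAML config
        "Content-Type": "application/json",
    }
    body = json.dumps({"prompt": prompt})
    return url, headers, body

url, headers, body = build_prompt_scan_request(
    "http://localhost:8000", "Ignore previous instructions", "my-token"
)
# Send with any HTTP client, e.g.:
#   requests.post(url, headers=headers, data=body, timeout=30)
```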

System Requirements

Category | Requirement | Notes
OS | Linux (recommended), macOS | Docker deployment recommended
Python | 3.10-3.12 | requires-python = ">=3.10,<3.13"
Port | 8000 (default) | Configurable via Uvicorn
RAM | 8 GB+ recommended | Multiple scanner models loaded simultaneously
Disk | 10 GB+ | For storing downloaded Hugging Face models

Dependencies

Python Packages

  • llm-guard == 0.3.16 (core library)
  • fastapi == 0.115.12
  • uvicorn[standard] == 0.34.2
  • pydantic == 2.11.4
  • pyyaml == 6.0.2
  • slowapi == 0.1.9
  • asyncio == 3.4.3
  • structlog >= 24
  • psutil >= 5.9
  • opentelemetry-api == 1.33.1
  • opentelemetry-sdk == 1.33.1
  • opentelemetry-instrumentation-fastapi == 0.54b1
  • opentelemetry-exporter-otlp-proto-http == 1.33.1
  • opentelemetry-exporter-prometheus == 0.54b1
  • opentelemetry-sdk-extension-aws == 2.1.0
  • opentelemetry-propagator-aws-xray == 1.0.2

Credentials

The following environment variables may be required depending on configuration:

  • CONFIG_FILE: Path to the YAML scanner configuration file (default: ./config/scanners.yml).
  • APP_WORKERS: Number of Uvicorn worker processes (default: 1).
  • Auth credentials (configured in YAML):
    • http_bearer: API token for bearer authentication.
    • http_basic: Username and password for basic authentication.
  • YAML config supports ${ENV_VAR} syntax for injecting environment variables.
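To illustrate how these pieces fit together, here is a minimal sketch of a scanners.yml. The field names (auth, rate_limit, input_scanners) and the Toxicity scanner entry are assumptions for illustration; consult the LLM Guard API documentation for the exact schema.

```yaml
# Hypothetical scanners.yml sketch — field names are illustrative
auth:
  type: http_bearer
  token: ${AUTH_TOKEN}   # injected from the environment via ${ENV_VAR} syntax

rate_limit:
  enabled: true
  limit: 100/minute

input_scanners:
  - type: Toxicity
    params:
      threshold: 0.5
```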

Quick Install

# Install API server package
pip install llm-guard-api

# For CPU optimized deployment
pip install "llm-guard-api[cpu]"

# For GPU optimized deployment
pip install "llm-guard-api[gpu]"

# Docker deployment
cd llm_guard_api
docker-compose up
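For reference, a Docker Compose file along these lines would expose port 8000 and mount the config volume as described under Compatibility Notes. The service name and container paths below are hypothetical; check the repository's own docker-compose.yml for the actual values.

```yaml
# Hypothetical docker-compose.yml sketch — service name and paths are illustrative
services:
  llm-guard-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      APP_WORKERS: "1"
      CONFIG_FILE: /app/config/scanners.yml
    volumes:
      - ./config:/app/config
```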

Code Evidence

Config file loading from the environment, from llm_guard_api/app/app.py:57-62:

def create_app() -> FastAPI:
    config_file = os.getenv("CONFIG_FILE", "./config/scanners.yml")
    if not config_file:
        raise ValueError("Config file is required")
    config = get_config(config_file)
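The ${ENV_VAR} injection mentioned above could be implemented with a simple substitution pass over the raw YAML text before parsing. The sketch below is illustrative, not the library's actual implementation; expand_env_vars is a hypothetical helper.

```python
import os
import re

_ENV_PATTERN = re.compile(r"\$\{(\w+)\}")

def expand_env_vars(text: str) -> str:
    """Replace ${VAR} placeholders with environment values (empty if unset)."""
    return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), text)

os.environ["API_TOKEN"] = "secret-token"
raw = "auth:\n  type: http_bearer\n  token: ${API_TOKEN}\n"
print(expand_env_vars(raw))  # token line becomes: token: secret-token
```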

Thread limiting for scanner execution from llm_guard_api/app/scanner.py:31:

torch.set_num_threads(1)

Docker entrypoint from llm_guard_api/entrypoint.sh:3-7:

APP_WORKERS=${APP_WORKERS:-1}
CONFIG_FILE=${CONFIG_FILE:-./config/scanners.yml}
uvicorn app.app:create_app --host=0.0.0.0 --port=8000 --workers="$APP_WORKERS" --forwarded-allow-ips="*" --proxy-headers --timeout-keep-alive="2"

Common Errors

Error Message | Cause | Solution
ValueError: Config file is required | CONFIG_FILE env var empty or file missing | Set CONFIG_FILE to a valid YAML path
Error loading YAML file | Invalid YAML syntax or missing file | Validate the YAML config syntax
408 Request Timeout | Scanner pipeline exceeds timeout | Increase scan_prompt_timeout / scan_output_timeout in config
429 Rate Limit Exceeded | Too many requests | Adjust rate_limit.limit in config
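The 408 case above arises when a scan exceeds its configured timeout. The snippet below is a minimal sketch of how such a cutoff can be enforced with asyncio.wait_for; scan_with_timeout and slow_scan are illustrative names, not the server's actual code.

```python
import asyncio

async def scan_with_timeout(scan_coro, timeout_s: float):
    """Run a scanner coroutine, converting a timeout into a 408-style result."""
    try:
        return await asyncio.wait_for(scan_coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        return {"status": 408, "detail": "Request Timeout"}

async def slow_scan():
    # Stand-in for a scanner pipeline that takes too long
    await asyncio.sleep(0.2)
    return {"status": 200}

result = asyncio.run(scan_with_timeout(slow_scan(), timeout_s=0.05))
print(result)  # -> {'status': 408, 'detail': 'Request Timeout'}
```

Raising the timeout (analogous to increasing scan_prompt_timeout in the YAML config) lets the same scan complete normally.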

Compatibility Notes

  • Docker: Default Docker Compose exposes port 8000 and mounts config volume.
  • Workers: APP_WORKERS defaults to 1; increase for multi-process serving. Note: torch.set_num_threads(1) is set to prevent thread contention across workers.
  • Observability: Supports AWS X-Ray, OTLP HTTP, Prometheus, and console exporters. Configured via YAML.
  • Health checks: /healthz and /readyz endpoints available for load balancer configuration.
