
Workflow: Promptfoo CI/CD Integration

From Leeroopedia
Knowledge Sources
Domains LLM_Ops, CI_CD, DevOps, Quality_Assurance
Last Updated 2026-02-14 08:00 GMT

Overview

End-to-end process for embedding LLM evaluation and red team security scans into continuous integration pipelines to enforce quality gates and catch regressions automatically.

Description

This workflow integrates Promptfoo evaluations into CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, Azure Pipelines, etc.) to provide automated quality and security gates for LLM applications. It covers two primary patterns: quality evaluation (running assertions against prompt/model changes on pull requests) and security scanning (scheduled or triggered red team scans for vulnerability detection). The workflow uses exit codes and JSON output for programmatic quality gate enforcement.

Usage

Execute this workflow when you need to:

  • Automatically test LLM prompt changes on every pull request
  • Enforce pass rate thresholds before merging code
  • Run scheduled security scans against production LLM endpoints
  • Generate compliance reports for audit pipelines
  • Track evaluation metrics over time in a CI/CD dashboard

Input state: A repository with a promptfooconfig.yaml and a CI/CD pipeline definition (GitHub Actions, GitLab CI, Jenkins, etc.).

Output state: Automated pipeline runs producing evaluation results, pass/fail status, and optionally blocking merges when quality gates are not met.

Execution Steps

Step 1: Pipeline Configuration

Define the CI/CD pipeline trigger conditions and environment setup. For quality evaluations, trigger on pull requests that change prompt files or the Promptfoo configuration. For security scans, configure scheduled triggers (e.g., daily or weekly) or manual dispatch. Set up a Node.js 20+ runtime, run Promptfoo via npx, and store API keys as pipeline secrets.

Key considerations:

  • Use npx promptfoo@latest to avoid global installation requirements
  • API keys must be stored as CI/CD secrets (never committed to the repository)
  • Cache the ~/.promptfoo directory to speed up subsequent runs
  • Docker-based runners need Node.js 20+ available in the container image
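The trigger and setup conventions above can be sketched as a GitHub Actions workflow. This is an illustrative fragment, not a canonical configuration: the workflow name, watched paths, cron schedule, cache key, and the OPENAI_API_KEY secret name are all assumptions for the example.

```yaml
# Illustrative sketch only; paths, names, schedule, and secret names
# are assumptions and should be adapted to your repository.
name: llm-quality-gate
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'promptfooconfig.yaml'
  schedule:
    - cron: '0 6 * * 1'   # weekly trigger, e.g. for security scans
  workflow_dispatch: {}

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Cache ~/.promptfoo to speed up subsequent runs
      - uses: actions/cache@v4
        with:
          path: ~/.promptfoo
          key: promptfoo-${{ runner.os }}-${{ hashFiles('promptfooconfig.yaml') }}
      - name: Run evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # stored as a CI secret, never committed
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
```

Using npx promptfoo@latest keeps the runner image free of a global install, as noted above.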

Step 2: Evaluation Execution

Run the Promptfoo evaluation or red team scan within the pipeline. For quality evaluations, execute promptfoo eval with the configuration file and output format. For security scans, execute promptfoo redteam run. Both commands produce structured JSON output that can be parsed for quality gate enforcement.

Key considerations:

  • Use --no-cache to ensure fresh API calls in CI environments
  • Use -o results.json to capture structured output for downstream processing
  • The --fail-on-error flag causes non-zero exit on evaluation errors
  • Set --max-concurrency to control API usage and costs in CI
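For a non-GitHub pipeline, the same execution step can be sketched as a GitLab CI job; the job name and artifact settings here are illustrative assumptions, while the flags are the ones listed above.

```yaml
# Illustrative GitLab CI job; name and artifact settings are assumptions.
promptfoo-eval:
  image: node:20
  script:
    # --no-cache forces fresh API calls; -o captures structured output;
    # --fail-on-error makes evaluation errors fail the job;
    # --max-concurrency bounds parallel API usage and cost.
    - npx promptfoo@latest eval --no-cache --fail-on-error --max-concurrency 4 -o results.json
  artifacts:
    paths:
      - results.json
    when: always
```

For a security scan, the script line would invoke npx promptfoo@latest redteam run instead, producing JSON that downstream gates can parse the same way.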

Step 3: Quality Gate Enforcement

Parse evaluation results and enforce pass/fail thresholds. Extract the overall pass rate from the JSON output and compare it against the configured threshold. If the pass rate falls below the threshold, fail the pipeline step with a non-zero exit code to block the merge or deployment.

Key considerations:

  • Pass rate thresholds are configurable per pipeline (e.g., 90% for staging, 95% for production)
  • JSON output includes stats.successes and stats.failures fields for parsing
  • Red team scans can use vulnerability severity thresholds instead of pass rates
  • Custom quality gates can check specific assertion categories or named metrics
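A minimal gate script for the logic above might look like the following sketch. The sample results.json written here mimics the stats.successes / stats.failures fields mentioned above with made-up numbers; real Promptfoo output may nest these fields differently depending on version, so check the actual file before wiring this into a pipeline.

```shell
# Sketch of a quality-gate step. The sample file and its numbers are
# synthetic; real Promptfoo output may nest stats under another key.
cat > results.json <<'EOF'
{"stats": {"successes": 18, "failures": 2}}
EOF

THRESHOLD=90   # e.g. 90% for staging, 95% for production

# python3 is present on standard CI runners; jq would work equally well.
PASS_RATE=$(python3 - <<'PY'
import json
stats = json.load(open("results.json"))["stats"]
total = stats["successes"] + stats["failures"]
print(100 * stats["successes"] // total)
PY
)

echo "Pass rate: ${PASS_RATE}%"
if [ "$PASS_RATE" -lt "$THRESHOLD" ]; then
    echo "Quality gate failed (below ${THRESHOLD}%)" >&2
    exit 1   # non-zero exit blocks the merge or deployment
fi
echo "Quality gate passed"
```

With 18 successes out of 20 tests, the computed pass rate is 90%, which meets the 90% threshold, so the step exits zero.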

Step 4: Artifact Publishing

Upload evaluation results as pipeline artifacts for review and historical tracking. HTML reports provide human-readable summaries, while JSON output enables programmatic analysis. Results can also be pushed to the Promptfoo web UI for team-wide visibility.

Key considerations:

  • HTML reports are self-contained and can be viewed without a running server
  • JSON artifacts integrate with monitoring dashboards and alerting systems
  • The Promptfoo share feature generates URLs for team review
  • Multiple output formats can be generated simultaneously (JSON + HTML)
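Continuing the GitHub Actions sketch from Step 1, artifact publishing might be added as follows; the step and artifact names are assumptions, while the dual JSON + HTML output is the pattern described above.

```yaml
      # Illustrative continuation of the earlier workflow sketch.
      - name: Generate reports
        run: npx promptfoo@latest eval -o results.json -o results.html
      - name: Upload evaluation artifacts
        if: always()   # publish results even when the evaluation fails
        uses: actions/upload-artifact@v4
        with:
          name: promptfoo-results
          path: |
            results.json
            results.html
```

The HTML report is self-contained, so reviewers can download the artifact and open it directly without a running server.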

Step 5: Notification and Reporting

Configure notifications for pipeline results and generate reports for stakeholders. CI/CD integrations can post evaluation summaries as pull request comments, send Slack notifications on failures, or trigger webhooks for external systems. Security scan reports can be exported in formats compatible with vulnerability management tools.

Key considerations:

  • GitHub Actions can use gh pr comment to post results on pull requests
  • Scheduled security scans should notify the security team on new vulnerabilities
  • Compliance reports can be generated in formats required by audit processes
  • Cost tracking data in results helps monitor API spending over time
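As a sketch of the PR-comment pattern mentioned above, a GitHub Actions step could summarize results with the gh CLI. The jq paths into results.json are assumptions about the output shape (see Step 3) and may need adjusting per Promptfoo version.

```yaml
      # Illustrative sketch; jq paths and wording are assumptions.
      - name: Comment on pull request
        if: github.event_name == 'pull_request'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          PASSES=$(jq '.stats.successes' results.json)
          FAILS=$(jq '.stats.failures' results.json)
          gh pr comment "${{ github.event.pull_request.number }}" \
            --body "Promptfoo eval: ${PASSES} passed, ${FAILS} failed."
```

A failure-only Slack notification or webhook call would follow the same shape, gated on the step's outcome.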

Execution Diagram

GitHub URL

Workflow Repository