
Workflow: Promptfoo CI/CD Integration

From Leeroopedia
Knowledge Sources
Domains LLM_Ops, CI_CD, DevOps, Quality_Assurance
Last Updated 2026-02-14 08:00 GMT

Overview

End-to-end process for embedding LLM evaluation and red team security scans into continuous integration pipelines to enforce quality gates and catch regressions automatically.

Description

This workflow integrates Promptfoo evaluations into CI/CD pipelines (GitHub Actions, GitLab CI, Jenkins, Azure Pipelines, etc.) to provide automated quality and security gates for LLM applications. It covers two primary patterns: quality evaluation (running assertions against prompt/model changes on pull requests) and security scanning (scheduled or triggered red team scans for vulnerability detection). The workflow uses exit codes and JSON output for programmatic quality gate enforcement.

Usage

Execute this workflow when you need to:

  • Automatically test LLM prompt changes on every pull request
  • Enforce pass rate thresholds before merging code
  • Run scheduled security scans against production LLM endpoints
  • Generate compliance reports for audit pipelines
  • Track evaluation metrics over time in a CI/CD dashboard

Input state: A repository with a promptfooconfig.yaml and a CI/CD pipeline definition (GitHub Actions, GitLab CI, Jenkins, etc.).

Output state: Automated pipeline runs producing evaluation results, pass/fail status, and optionally blocking merges when quality gates are not met.

Execution Steps

Step 1: Pipeline Configuration

Define the CI/CD pipeline trigger conditions and environment setup. For quality evaluations, trigger on pull requests that change prompt files or the Promptfoo configuration. For security scans, configure scheduled triggers (e.g., daily or weekly) or manual dispatch. Set up a Node.js 20+ runtime, run Promptfoo via npx, and store API keys as pipeline secrets.

Key considerations:

  • Use npx promptfoo@latest to avoid global installation requirements
  • API keys must be stored as CI/CD secrets (never committed to the repository)
  • Cache the ~/.promptfoo directory to speed up subsequent runs
  • Docker-based runners need Node.js 20+ available in the container image
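The trigger and setup conventions above can be sketched as a GitHub Actions workflow. This is an illustrative fragment, not a canonical configuration: the workflow name, watched paths, cron schedule, cache key, and the OPENAI_API_KEY secret name are all assumptions for the example.

```yaml
# Illustrative sketch only; paths, names, schedule, and secret names
# are assumptions and should be adapted to your repository.
name: llm-quality-gate
on:
  pull_request:
    paths:
      - 'prompts/**'
      - 'promptfooconfig.yaml'
  schedule:
    - cron: '0 6 * * 1'   # weekly trigger, e.g. for security scans
  workflow_dispatch: {}

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      # Cache ~/.promptfoo to speed up subsequent runs
      - uses: actions/cache@v4
        with:
          path: ~/.promptfoo
          key: promptfoo-${{ runner.os }}-${{ hashFiles('promptfooconfig.yaml') }}
      - name: Run evaluation
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}  # stored as a CI secret, never committed
        run: npx promptfoo@latest eval -c promptfooconfig.yaml
```

Using npx promptfoo@latest keeps the runner image free of a global install, as noted above.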

Step 2: Evaluation Execution

Run the Promptfoo evaluation or red team scan within the pipeline. For quality evaluations, execute promptfoo eval with the configuration file and output format. For security scans, execute promptfoo redteam run. Both commands produce structured JSON output that can be parsed for quality gate enforcement.

Key considerations:

  • Use --no-cache to ensure fresh API calls in CI environments
  • Use -o results.json to capture structured output for downstream processing
  • The --fail-on-error flag causes non-zero exit on evaluation errors
  • Set --max-concurrency to control API usage and costs in CI
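For a non-GitHub pipeline, the same execution step can be sketched as a GitLab CI job; the job name and artifact settings here are illustrative assumptions, while the flags are the ones listed above.

```yaml
# Illustrative GitLab CI job; name and artifact settings are assumptions.
promptfoo-eval:
  image: node:20
  script:
    # --no-cache forces fresh API calls; -o captures structured output;
    # --fail-on-error makes evaluation errors fail the job;
    # --max-concurrency bounds parallel API usage and cost.
    - npx promptfoo@latest eval --no-cache --fail-on-error --max-concurrency 4 -o results.json
  artifacts:
    paths:
      - results.json
    when: always
```

For a security scan, the script line would invoke npx promptfoo@latest redteam run instead, producing JSON that downstream gates can parse the same way.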

Step 3: Quality Gate Enforcement

Parse evaluation results and enforce pass/fail thresholds. Extract the overall pass rate from the JSON output and compare it against the configured threshold. If the pass rate falls below the threshold, fail the pipeline step with a non-zero exit code to block the merge or deployment.

Key considerations:

  • Pass rate thresholds are configurable per pipeline (e.g., 90% for staging, 95% for production)
  • JSON output includes stats.successes and stats.failures fields for parsing
  • Red team scans can use vulnerability severity thresholds instead of pass rates
  • Custom quality gates can check specific assertion categories or named metrics
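A minimal gate script for the logic above might look like the following sketch. The sample results.json written here mimics the stats.successes / stats.failures fields mentioned above with made-up numbers; real Promptfoo output may nest these fields differently depending on version, so check the actual file before wiring this into a pipeline.

```shell
# Sketch of a quality-gate step. The sample file and its numbers are
# synthetic; real Promptfoo output may nest stats under another key.
cat > results.json <<'EOF'
{"stats": {"successes": 18, "failures": 2}}
EOF

THRESHOLD=90   # e.g. 90% for staging, 95% for production

# python3 is present on standard CI runners; jq would work equally well.
PASS_RATE=$(python3 - <<'PY'
import json
stats = json.load(open("results.json"))["stats"]
total = stats["successes"] + stats["failures"]
print(100 * stats["successes"] // total)
PY
)

echo "Pass rate: ${PASS_RATE}%"
if [ "$PASS_RATE" -lt "$THRESHOLD" ]; then
    echo "Quality gate failed (below ${THRESHOLD}%)" >&2
    exit 1   # non-zero exit blocks the merge or deployment
fi
echo "Quality gate passed"
```

With 18 successes out of 20 tests, the computed pass rate is 90%, which meets the 90% threshold, so the step exits zero.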

Step 4: Artifact Publishing

Upload evaluation results as pipeline artifacts for review and historical tracking. HTML reports provide human-readable summaries, while JSON output enables programmatic analysis. Results can also be pushed to the Promptfoo web UI for team-wide visibility.

Key considerations:

  • HTML reports are self-contained and can be viewed without a running server
  • JSON artifacts integrate with monitoring dashboards and alerting systems
  • The Promptfoo share feature generates URLs for team review
  • Multiple output formats can be generated simultaneously (JSON + HTML)
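Continuing the GitHub Actions sketch from Step 1, artifact publishing might be added as follows; the step and artifact names are assumptions, while the dual JSON + HTML output is the pattern described above.

```yaml
      # Illustrative continuation of the earlier workflow sketch.
      - name: Generate reports
        run: npx promptfoo@latest eval -o results.json -o results.html
      - name: Upload evaluation artifacts
        if: always()   # publish results even when the evaluation fails
        uses: actions/upload-artifact@v4
        with:
          name: promptfoo-results
          path: |
            results.json
            results.html
```

The HTML report is self-contained, so reviewers can download the artifact and open it directly without a running server.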

Step 5: Notification and Reporting

Configure notifications for pipeline results and generate reports for stakeholders. CI/CD integrations can post evaluation summaries as pull request comments, send Slack notifications on failures, or trigger webhooks for external systems. Security scan reports can be exported in formats compatible with vulnerability management tools.

Key considerations:

  • GitHub Actions can use gh pr comment to post results on pull requests
  • Scheduled security scans should notify the security team on new vulnerabilities
  • Compliance reports can be generated in formats required by audit processes
  • Cost tracking data in results helps monitor API spending over time
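As a sketch of the PR-comment pattern mentioned above, a GitHub Actions step could summarize results with the gh CLI. The jq paths into results.json are assumptions about the output shape (see Step 3) and may need adjusting per Promptfoo version.

```yaml
      # Illustrative sketch; jq paths and wording are assumptions.
      - name: Comment on pull request
        if: github.event_name == 'pull_request'
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          PASSES=$(jq '.stats.successes' results.json)
          FAILS=$(jq '.stats.failures' results.json)
          gh pr comment "${{ github.event.pull_request.number }}" \
            --body "Promptfoo eval: ${PASSES} passed, ${FAILS} failed."
```

A failure-only Slack notification or webhook call would follow the same shape, gated on the step's outcome.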

Execution Diagram

GitHub URL

Workflow Repository