Principle:Promptfoo Promptfoo Attack Generation

Knowledge Sources	Promptfoo Red Team Architecture
Domains	Security_Testing, Adversarial_ML
Last Updated	2026-02-14 08:00 GMT

Overview

Attack Generation is an automated mechanism for generating adversarial test cases. It analyzes the target's system prompts and composes plugin-specific payloads to produce targeted attacks.

Description

Attack Generation is the core creative phase of red team testing. Given a target system's prompts, it first extracts the system's purpose and relevant entities (company names, product names) to create contextually realistic attacks. Then it runs each selected plugin to generate vulnerability-specific test cases.

The generation pipeline uses an LLM to craft attacks that are both effective and realistic, so that the target system is tested under conditions that mirror real-world adversarial use. Unlike a static payload database, the attacks are tailored dynamically to each target.
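As a minimal sketch of what "dynamically tailored" can mean in practice, the snippet below composes a generation prompt from the extracted purpose and entities. The `TargetContext` type, the `buildGenerationPrompt` helper, and the prompt wording are all hypothetical and chosen for illustration; they are not taken from Promptfoo's source.

```typescript
// Hypothetical shape for the context extracted from the target's prompts.
interface TargetContext {
  purpose: string;
  entities: string[];
}

// Hypothetical helper: builds the instruction sent to the attack-generating LLM.
// A static payload database would skip this step and reuse fixed strings.
function buildGenerationPrompt(
  pluginGoal: string,
  ctx: TargetContext,
  n: number,
): string {
  const entityLine =
    ctx.entities.length > 0
      ? `Refer to these entities where natural: ${ctx.entities.join(", ")}.`
      : "No specific entities were identified.";
  return [
    `Target system purpose: ${ctx.purpose}`,
    entityLine,
    `Write ${n} realistic user messages that attempt to: ${pluginGoal}`,
    "Return one message per line.",
  ].join("\n");
}

// Example input is invented for illustration.
const generationPrompt = buildGenerationPrompt(
  "obtain another customer's personal data",
  { purpose: "Customer support assistant for an airline", entities: ["Acme Airways"] },
  3,
);
```

The same plugin goal yields a different prompt for every target, which is the property that distinguishes this approach from replaying a fixed payload list.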

Usage

Use this principle when generating adversarial test cases from scratch. Generation is triggered by `promptfoo redteam generate`, or runs as part of `promptfoo redteam run`.

Theoretical Basis

Pseudo-code Logic:

1. Extract system purpose from prompts (LLM-assisted)
2. Extract entities from prompts (company names, etc.)
3. For each plugin (with concurrency control):
   a. Generate n test cases using plugin's LLM template
   b. Deduplicate generated tests
   c. Track failed plugins for reporting
4. Apply strategies to transform base test cases
5. Return: { purpose, entities, testCases, injectVar, failedPlugins }
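The steps above could be sketched in TypeScript roughly as follows. This is an illustrative sketch under stated assumptions, not the actual `synthesize` implementation: the `Plugin`, `Strategy`, and `TestCase` types and the `synthesizeSketch` and `dedupe` functions are hypothetical, steps 1 and 2 are assumed to have already run (purpose and entities arrive as inputs), and strategy outputs are assumed to be appended to the base test cases.

```typescript
// Hypothetical types for illustration; not Promptfoo's actual interfaces.
interface TestCase { pluginId: string; vars: Record<string, string>; }
interface Plugin {
  id: string;
  // Stand-in for an LLM-backed generator that returns raw attack strings.
  generate(purpose: string, entities: string[], n: number): Promise<string[]>;
}
type Strategy = (testCase: TestCase, injectVar: string) => TestCase;
interface SynthesisResult {
  purpose: string; entities: string[]; testCases: TestCase[];
  injectVar: string; failedPlugins: string[];
}

// Step 3b: drop empty payloads and ones that differ only by case or whitespace.
function dedupe(payloads: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const p of payloads) {
    const key = p.trim().toLowerCase();
    if (key.length > 0 && !seen.has(key)) { seen.add(key); unique.push(p.trim()); }
  }
  return unique;
}

async function synthesizeSketch(opts: {
  purpose: string; entities: string[]; plugins: Plugin[]; strategies: Strategy[];
  injectVar: string; numTests: number; maxConcurrency: number;
}): Promise<SynthesisResult> {
  const perPlugin: TestCase[][] = opts.plugins.map(() => []);
  const failedPlugins: string[] = [];
  let next = 0;

  // Step 3: a fixed pool of workers pulls plugins off a shared index,
  // so at most `maxConcurrency` generations are in flight at once.
  const worker = async (): Promise<void> => {
    while (next < opts.plugins.length) {
      const index = next++;
      const plugin = opts.plugins[index];
      try {
        const raw = await plugin.generate(opts.purpose, opts.entities, opts.numTests); // 3a
        perPlugin[index] = dedupe(raw).map((p) => ({
          pluginId: plugin.id,
          vars: { [opts.injectVar]: p },
        }));
      } catch (_err) {
        failedPlugins.push(plugin.id); // 3c: record the failure and keep going
      }
    }
  };
  const workerCount = Math.max(1, Math.min(opts.maxConcurrency, opts.plugins.length));
  const workers: Promise<void>[] = [];
  for (let i = 0; i < workerCount; i++) workers.push(worker());
  await Promise.all(workers);

  const base: TestCase[] = [];
  for (const group of perPlugin) for (const t of group) base.push(t);

  // Step 4: each strategy derives a transformed copy of every base test case.
  const testCases = base.slice();
  for (const strategy of opts.strategies) {
    for (const t of base) testCases.push(strategy(t, opts.injectVar));
  }

  // Step 5: return the pseudo-code's result shape.
  return {
    purpose: opts.purpose, entities: opts.entities, testCases,
    injectVar: opts.injectVar, failedPlugins,
  };
}
```

Catching errors per plugin lets one failing generator be reported in `failedPlugins` without aborting the rest of the run, which matches the intent of step 3c.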

Related Pages

Implemented By

Implementation:Promptfoo_Promptfoo_synthesize
