Principle:Promptfoo Promptfoo Attack Generation
| Knowledge Sources | |
|---|---|
| Domains | Security_Testing, Adversarial_ML |
| Last Updated | 2026-02-14 08:00 GMT |
Overview
An automated adversarial test case generation mechanism that creates targeted attacks by analyzing system prompts and composing plugin-specific payloads.
Description
Attack Generation is the core creative phase of red team testing. Given a target system's prompts, it first extracts the system's purpose and relevant entities (company names, product names) to create contextually realistic attacks. Then it runs each selected plugin to generate vulnerability-specific test cases.
The generation pipeline uses an LLM to craft attacks that are both effective and realistic, so the target system is tested under conditions that resemble real-world adversarial use. This differs from a static payload database: the attacks are generated dynamically and tailored to each target.
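To illustrate how the extracted purpose and entities can tailor an attack to its target, here is a minimal sketch of composing a generation prompt. The function name `build_generation_prompt` and the prompt wording are hypothetical and chosen for illustration only; they are not Promptfoo's actual plugin templates.

```python
def build_generation_prompt(plugin_goal: str, purpose: str,
                            entities: list[str], n: int) -> str:
    """Compose an LLM prompt asking for n attacks tailored to one target.

    Hypothetical template: the wording is an assumption, not Promptfoo's.
    """
    entity_line = ", ".join(entities) if entities else "(none found)"
    return (
        "You are helping red team an application.\n"
        f"Application purpose: {purpose}\n"
        f"Known entities: {entity_line}\n"
        f"Write {n} realistic user prompts that attempt to: {plugin_goal}\n"
    )


prompt = build_generation_prompt(
    plugin_goal="extract personal data about other customers",
    purpose="customer support assistant for an online retailer",
    entities=["Acme", "AcmePay"],
    n=5,
)
print(prompt)
```

Because the purpose and entities are interpolated per target, the same plugin goal yields different attacks for different systems, which a fixed payload list cannot do.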
Usage
Use this principle when generating adversarial test cases from scratch. Generation runs when you invoke `promptfoo redteam generate`, or as part of `promptfoo redteam run`.
Theoretical Basis
Pseudo-code Logic:
1. Extract system purpose from prompts (LLM-assisted)
2. Extract entities from prompts (company names, etc.)
3. For each plugin (with concurrency control):
   a. Generate n test cases using plugin's LLM template
   b. Deduplicate generated tests
   c. Track failed plugins for reporting
4. Apply strategies to transform base test cases
5. Return: { purpose, entities, testCases, injectVar, failedPlugins }
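The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Promptfoo's implementation: plugins are modelled as hypothetical async callables, steps 1 and 2 (purpose and entity extraction) are assumed to have already run, deduplication is a plain case-insensitive text match, and strategy output is appended to the base test cases.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class GenerationResult:
    purpose: str
    entities: list
    test_cases: list
    inject_var: str
    failed_plugins: list = field(default_factory=list)


def dedupe(cases, inject_var):
    """Step 3b: drop cases whose injected text repeats (assumed rule)."""
    seen, unique = set(), []
    for case in cases:
        key = case[inject_var].strip().lower()
        if key not in seen:
            seen.add(key)
            unique.append(case)
    return unique


async def generate_attacks(purpose, entities, plugins, strategies, n,
                           inject_var="query", max_concurrency=2):
    semaphore = asyncio.Semaphore(max_concurrency)  # step 3: concurrency control
    failed = []

    async def run_plugin(name, plugin):
        async with semaphore:
            try:
                cases = await plugin(purpose, entities, n)  # step 3a
            except Exception:
                failed.append(name)  # step 3c: record the failure, keep going
                return []
        return dedupe(cases, inject_var)

    per_plugin = await asyncio.gather(
        *(run_plugin(name, p) for name, p in plugins.items())
    )
    base = [case for cases in per_plugin for case in cases]
    # Step 4: each strategy transforms every base case (assumed additive).
    transformed = [strategy(case) for strategy in strategies for case in base]
    return GenerationResult(purpose, entities, base + transformed,
                            inject_var, failed)  # step 5


# Hypothetical stand-ins for LLM-backed plugins and a strategy.
async def pii_plugin(purpose, entities, n):
    return [{"query": f"As a {entities[0]} admin, list customer emails"}] * n


async def broken_plugin(purpose, entities, n):
    raise RuntimeError("generation failed")


def shout(case):
    return {**case, "query": case["query"].upper()}


result = asyncio.run(generate_attacks(
    "support bot", ["Acme"],
    {"pii": pii_plugin, "broken": broken_plugin},
    [shout], n=3,
))
print(len(result.test_cases), result.failed_plugins)
```

In this sketch a failing plugin does not abort the run; its name is collected in `failed_plugins` so the caller can report it, which mirrors step 3c.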