Principle:Promptfoo Promptfoo Attack Generation

From Leeroopedia
Knowledge Sources
Domains Security_Testing, Adversarial_ML
Last Updated 2026-02-14 08:00 GMT

Overview

Attack Generation is an automated mechanism for producing adversarial test cases: it analyzes the target's system prompts and composes plugin-specific payloads tailored to them.

Description

Attack Generation is the core creative phase of red team testing. Given a target system's prompts, it first extracts the system's purpose and relevant entities (company names, product names) to create contextually realistic attacks. Then it runs each selected plugin to generate vulnerability-specific test cases.

The generation pipeline uses an LLM to craft attacks that are both effective and realistic, so the target system is tested under conditions that resemble real-world adversarial use. This differs from a static payload database, whose fixed payloads are not adapted to the target: here each attack is generated dynamically for the system under test.
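As an illustration of how the extracted purpose and entities can tailor a generation request, here is a minimal sketch. The `TargetContext` type, the `buildGenerationPrompt` function, and the `{{purpose}}` / `{{entities}}` / `{{n}}` placeholders are all hypothetical names chosen for this example; they are not promptfoo's actual templates or API.

```typescript
// Hypothetical shape for the context extracted from the target's prompts.
interface TargetContext {
  purpose: string;
  entities: string[];
}

// Fill a plugin's generation template with target-specific context.
// Function replacers are used so that characters such as "$&" inside the
// purpose text are inserted literally rather than treated as patterns.
function buildGenerationPrompt(
  template: string,
  ctx: TargetContext,
  n: number,
): string {
  const entityList =
    ctx.entities.length > 0 ? ctx.entities.join(", ") : "(none)";
  return template
    .replace(/\{\{purpose\}\}/g, () => ctx.purpose)
    .replace(/\{\{entities\}\}/g, () => entityList)
    .replace(/\{\{n\}\}/g, () => String(n));
}

// Example: the same template yields a different request for each target.
const examplePrompt = buildGenerationPrompt(
  "Write {{n}} attacks for: {{purpose}} ({{entities}})",
  { purpose: "travel booking assistant", entities: ["Acme Air"] },
  2,
);
console.log(examplePrompt);
```

The filled-in request would then be sent to the generating LLM. Because the purpose and entities come from the target itself, two targets given the same plugin template receive different attacks.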

Usage

Use this principle when generating adversarial test cases from scratch. Generation is triggered by `promptfoo redteam generate`, or runs as part of `promptfoo redteam run`.

Theoretical Basis

Pseudo-code Logic:

1. Extract system purpose from prompts (LLM-assisted)
2. Extract entities from prompts (company names, etc.)
3. For each plugin (with concurrency control):
   a. Generate n test cases using plugin's LLM template
   b. Deduplicate generated tests
   c. Track failed plugins for reporting
4. Apply strategies to transform base test cases
5. Return: { purpose, entities, testCases, injectVar, failedPlugins }
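
Steps 3 to 5 above can be sketched in TypeScript as follows. This is a simplified illustration under stated assumptions, not promptfoo's implementation: the `Plugin`, `Strategy`, and `TestCase` types, the `mapWithLimit` helper, and the `generateAttacks` function are hypothetical. The sketch takes the purpose and entities (steps 1 and 2) as inputs, and it assumes that strategy-transformed cases are appended to the base cases rather than replacing them.

```typescript
// All names below are hypothetical, for illustration only.
interface TestCase {
  vars: Record<string, string>;
  metadata: { pluginId: string; strategyId?: string };
}

interface Plugin {
  id: string;
  // Stand-in for an LLM-backed generator returning raw attack prompts.
  generate(purpose: string, entities: string[], n: number): Promise<string[]>;
}

interface Strategy {
  id: string;
  transform(prompt: string): string;
}

// Run `fn` over `items` with at most `limit` calls in flight at once.
async function mapWithLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i]);
    }
  }
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => worker()));
  return results;
}

async function generateAttacks(
  purpose: string,
  entities: string[],
  plugins: Plugin[],
  strategies: Strategy[],
  opts: { numTests: number; injectVar: string; maxConcurrency: number },
) {
  const failedPlugins: string[] = [];

  // Step 3: run each plugin under a concurrency limit.
  const perPlugin = await mapWithLimit(
    plugins,
    opts.maxConcurrency,
    async (plugin): Promise<TestCase[]> => {
      try {
        const prompts = await plugin.generate(purpose, entities, opts.numTests);
        // Step 3b: deduplicate after trimming whitespace.
        const unique = Array.from(new Set(prompts.map((p) => p.trim())));
        return unique.map((p) => ({
          vars: { [opts.injectVar]: p },
          metadata: { pluginId: plugin.id },
        }));
      } catch (err) {
        // Step 3c: record the failure and continue with other plugins.
        failedPlugins.push(plugin.id);
        return [];
      }
    },
  );
  const base = ([] as TestCase[]).concat(...perPlugin);

  // Step 4: each strategy rewrites the injected variable of every base case.
  const transformed: TestCase[] = [];
  for (const strategy of strategies) {
    for (const tc of base) {
      transformed.push({
        vars: {
          ...tc.vars,
          [opts.injectVar]: strategy.transform(tc.vars[opts.injectVar]),
        },
        metadata: { ...tc.metadata, strategyId: strategy.id },
      });
    }
  }

  // Step 5: return the same fields as the pseudo-code.
  return {
    purpose,
    entities,
    testCases: base.concat(transformed),
    injectVar: opts.injectVar,
    failedPlugins,
  };
}
```

Catching errors per plugin means one failing generator does not abort the whole run; the failure is reported through `failedPlugins` alongside whatever test cases the other plugins produced.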

Related Pages

Implemented By
