Principle:Norrrrrrr lyn WAInjectBench Image Dataset Format

Knowledge Sources	PIL Image Formats
Domains	Data_Engineering, Computer_Vision
Last Updated	2026-02-14 16:00 GMT

Overview

A folder-based data organization scheme that structures image files by scenario and label for image-based prompt injection detection benchmarks.

Description

Unlike text data, which is stored in JSONL files, image data is organized as a folder hierarchy. Each scenario (e.g., a specific attack type or benign context) is a subfolder containing numbered image files (e.g., 1.png, 2.jpg). The top-level split into benign/ and malicious/ directories encodes the ground-truth label. The number of images in each folder is counted via folder_path.glob("*"), and the IDs of detected images are extracted from their filenames.
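The counting and ID-extraction steps described above could look roughly like the following. This is a minimal sketch, not the benchmark's actual code: the helper names count_images and image_id are hypothetical, and the sketch assumes each scenario folder contains only image files, since glob("*") counts every entry in the folder.

```python
from pathlib import Path


def count_images(folder_path: Path) -> int:
    """Count the entries in one scenario folder via folder_path.glob("*")."""
    # Assumption: the folder holds nothing but image files, so every
    # entry returned by glob("*") is one sample.
    return sum(1 for _ in folder_path.glob("*"))


def image_id(filename: str) -> int:
    """Turn a numeric filename such as "3.jpg" into the sample ID 3."""
    # Path.stem drops the extension; int() raises ValueError if the
    # remaining name is not numeric.
    return int(Path(filename).stem)
```

Under these assumptions, a stray non-image file in a scenario folder would inflate the total, so keeping the folders clean matters for the reported rates.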

Usage

Use this format when preparing image datasets for the image prompt injection detection pipeline. The --data_dir argument (default "data/image") points to the root directory.
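For illustration, an argument with this name and default could be declared with argparse as shown below. Only the flag name --data_dir and its default "data/image" come from this page; the function name build_parser, the description, and the help text are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser exposing the --data_dir option (hypothetical wrapper)."""
    parser = argparse.ArgumentParser(
        description="Image prompt injection detection (illustrative sketch)"
    )
    parser.add_argument(
        "--data_dir",
        type=str,
        default="data/image",
        help="Root directory containing the benign/ and malicious/ folders",
    )
    return parser


# Passing an empty argument list falls back to the default value.
args = build_parser().parse_args([])
print(args.data_dir)  # prints data/image
```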

Theoretical Basis

Directory layout:

data/image/
├── benign/
│   ├── scenario_a/        # Contains: 1.png, 2.png, 3.jpg, ...
│   └── scenario_b/
└── malicious/
    ├── attack_x/          # Contains: 1.png, 2.png, ...
    └── attack_y/

Key conventions:

Image filenames are numeric (the number becomes the sample ID)
Any image format supported by PIL is accepted
The parent folder name (benign or malicious) determines the metric type: FPR for benign folders, TPR for malicious folders
Subfolders are discovered via parent_path.iterdir()
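Taken together, the conventions above suggest a traversal along the following lines. This is a sketch under stated assumptions rather than the pipeline's implementation: the function detection_rates and the shape of its detected argument (a mapping from (label, scenario) to the set of flagged sample IDs) are hypothetical. The fraction of flagged images is read as FPR for benign scenarios and as TPR for malicious ones.

```python
from pathlib import Path
from typing import Dict, Set, Tuple

Key = Tuple[str, str]  # (label, scenario folder name)


def detection_rates(data_dir: Path, detected: Dict[Key, Set[int]]) -> Dict[Key, float]:
    """Return the fraction of flagged images per scenario folder.

    The value is an FPR for folders under benign/ and a TPR for
    folders under malicious/.
    """
    rates: Dict[Key, float] = {}
    for label in ("benign", "malicious"):
        parent_path = data_dir / label
        if not parent_path.is_dir():
            continue
        # Subfolders are discovered via parent_path.iterdir().
        for folder_path in sorted(parent_path.iterdir()):
            if not folder_path.is_dir():
                continue
            # Total images in this scenario, counted via glob("*").
            total = sum(1 for _ in folder_path.glob("*"))
            if total == 0:
                continue  # skip empty scenarios to avoid dividing by zero
            flagged = detected.get((label, folder_path.name), set())
            rates[(label, folder_path.name)] = len(flagged) / total
    return rates
```

For example, if one of four images in benign/scenario_a is flagged, the entry for ("benign", "scenario_a") would be 0.25, read as a false positive rate.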

Related Pages

Implemented By

Implementation:Norrrrrrr_lyn_WAInjectBench_Image_Folder_Data_Schema
