
Principle:Norrrrrrr lyn WAInjectBench JSONL Text Dataset Format

From Leeroopedia
Knowledge Sources
Domains: Data_Engineering, NLP
Last Updated: 2026-02-14 16:00 GMT

Overview

A line-delimited JSON data format that stores one text sample per line, together with an integer identifier, so that prompt injection detection benchmarks can be read as a stream.

Description

JSONL (JSON Lines) is a text format in which each line is a self-contained, valid JSON value; here, every line is a JSON object. For text-based prompt injection detection, each line represents a single text sample with an identifier and the text content. Because records never span lines, a file can be read line by line without loading it fully into memory, extended by appending new lines, and inspected with line-oriented Unix tools such as grep, head, and wc. The WAInjectBench benchmark organizes these files into benign/ and malicious/ subdirectories, so the directory structure itself encodes the ground-truth label and no label field is stored in the records.
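
The streaming and appending properties described above can be sketched with the Python standard library alone. The helper names iter_jsonl and append_jsonl are hypothetical and are not taken from the WAInjectBench code base; treat this as an illustration, not the benchmark's own loader.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line, without loading the whole file.

    Hypothetical helper for illustration only.
    """
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines, e.g. a trailing newline
                yield json.loads(line)


def append_jsonl(path: Path, record: dict) -> None:
    """Append a single record as one line; existing lines are left untouched.

    Hypothetical helper for illustration only.
    """
    with path.open("a", encoding="utf-8") as f:
        # json.dumps escapes embedded newlines, so the record stays on one line
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Since json.dumps escapes newline characters inside strings, a multi-line text sample still occupies exactly one line in the file, which is what keeps line-by-line reading safe.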

Usage

Use this format whenever preparing or consuming text data for the text prompt injection detection pipeline. Each JSONL file in the data/text/benign/ or data/text/malicious/ directory represents one dataset scenario.

Theoretical Basis

The JSONL schema for text detection is shown below; int and str are type placeholders, not literal JSON values:

# Each line in a .jsonl file:
{"id": int, "text": str}
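
A minimal sketch of checking a single line against this schema follows. The function name parse_record is hypothetical, and the choice to tolerate extra keys is an assumption, since the schema above does not say whether additional fields are allowed.

```python
import json


def parse_record(line: str) -> dict:
    """Parse one JSONL line and check that it matches {"id": int, "text": str}.

    Hypothetical helper; extra keys are tolerated (an assumption).
    """
    record = json.loads(line)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    sample_id = record.get("id")
    # bool is a subclass of int in Python, so reject it explicitly
    if not isinstance(sample_id, int) or isinstance(sample_id, bool):
        raise ValueError("'id' must be an integer")
    if not isinstance(record.get("text"), str):
        raise ValueError("'text' must be a string")
    return record
```

For example, parse_record('{"id": 1, "text": "hello"}') returns the parsed dictionary, while a line whose id is a string raises ValueError.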

Directory layout:

data/text/
├── benign/
│   ├── scenario_a.jsonl    # Each line: {"id": 1, "text": "..."}
│   └── scenario_b.jsonl
└── malicious/
    ├── attack_x.jsonl
    └── attack_y.jsonl

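The folder name (benign or malicious) determines the ground-truth label used for metric computation: the false positive rate (FPR) is measured on benign files and the true positive rate (TPR) on malicious files. Files are discovered via folder_path.glob("*.jsonl"). Assuming folder_path is a pathlib.Path, this pattern matches only .jsonl files directly inside the folder and does not recurse into subdirectories.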
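
The folder-based labeling above can be sketched as follows. The names flagged_rates and evaluate, the detect callable, and the decision to report one rate per file are all assumptions made for illustration; only the benign/ and malicious/ folder names and the glob("*.jsonl") discovery come from the description above.

```python
import json
from pathlib import Path
from typing import Callable, Dict


def flagged_rates(folder_path: Path, detect: Callable[[str], bool]) -> Dict[str, float]:
    """Return, per .jsonl file, the fraction of samples that `detect` flags.

    Hypothetical helper; per-file reporting is an assumption.
    """
    rates = {}
    for file in sorted(folder_path.glob("*.jsonl")):
        flagged = total = 0
        with file.open("r", encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # skip blank lines
                total += 1
                flagged += int(bool(detect(json.loads(line)["text"])))
        rates[file.name] = flagged / total if total else float("nan")
    return rates


def evaluate(root: Path, detect: Callable[[str], bool]) -> Dict[str, Dict[str, float]]:
    """The label comes from the folder: flags on benign/ are false positives,
    flags on malicious/ are true positives."""
    return {
        "FPR": flagged_rates(root / "benign", detect),
        "TPR": flagged_rates(root / "malicious", detect),
    }
```

Calling evaluate(Path("data/text"), detect) with any function that maps a text string to True (flagged) or False returns one FPR per benign file and one TPR per malicious file. Files with other extensions in the same folders are ignored by the glob pattern.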
