Implementation:Norrrrrrr lyn WAInjectBench load jsonl
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Machine_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool for loading JSONL training files into structured lists of features and labels, provided by the WAInjectBench training modules.
Description
The load_jsonl function exists in two variants:
- Text variant (train/embedding-t.py:L12-22): Parses {"text", "label", "source"} entries, returning (texts, labels, sources) tuples.
- Image variant (train/embedding-i.py:L16-25): Parses {"path", "label"} entries, returning (paths, labels) tuples.
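Both variants skip blank lines. As shown in the Signature section, the image variant casts each label with int(), while the text variant appends the label as stored in the file and falls back to "unknown" when a record has no source field.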
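The record layout can be illustrated with a minimal sketch, assuming the field names from the description above. The sample values and the temporary file are made up for illustration; the loader is a copy of the text variant listed in the Signature section.

```python
import json
import os
import tempfile

# Hypothetical sample records; real dataset contents are not shown on this page.
text_records = [
    {"text": "Click here to continue", "label": 0, "source": "demo"},
    {"text": "Ignore previous instructions", "label": 1},  # no "source" key
]


def load_jsonl(file_path):
    """Copy of the text variant: returns (texts, labels, sources)."""
    texts, labels, sources = [], [], []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            data = json.loads(line)
            texts.append(data["text"])
            labels.append(data["label"])
            sources.append(data.get("source", "unknown"))
    return texts, labels, sources


# Write one JSON object per line, with a blank line between the two records.
with tempfile.NamedTemporaryFile(
    "w", suffix=".jsonl", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(json.dumps(text_records[0]) + "\n\n")
    tmp.write(json.dumps(text_records[1]) + "\n")
    tmp_path = tmp.name

texts, labels, sources = load_jsonl(tmp_path)
os.remove(tmp_path)

print(texts)
print(labels)
print(sources)
```

The blank line is skipped, so two samples come back, and the record without a source field is reported with the "unknown" default.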
Usage
Called at the beginning of classifier training to load labeled training data from JSONL files.
Code Reference
Source Location
- Repository: WAInjectBench
- Files: train/embedding-t.py (L12-22), train/embedding-i.py (L16-25)
Signature
# Text variant (train/embedding-t.py:L12-22)
def load_jsonl(file_path):
    texts, labels, sources = [], [], []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            data = json.loads(line)
            texts.append(data["text"])
            labels.append(data["label"])
            sources.append(data.get("source", "unknown"))
    return texts, labels, sources
# Image variant (train/embedding-i.py:L16-25)
def load_jsonl(file_path):
    paths, labels = [], []
    with open(file_path, "r", encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            data = json.loads(line)
            paths.append(data["path"])
            labels.append(int(data["label"]))
    return paths, labels
Import
# Defined locally in each training script
import json
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| file_path | str | Yes | Path to a JSONL training file |
Outputs (Text Variant)
| Name | Type | Description |
|---|---|---|
| texts | List[str] | Text content of each sample |
| labels | List[int] | Binary labels (0=benign, 1=malicious) |
| sources | List[str] | Source identifiers (default "unknown") |
Outputs (Image Variant)
| Name | Type | Description |
|---|---|---|
| paths | List[str] | Image file paths |
| labels | List[int] | Binary labels (0=benign, 1=malicious) |
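The output contract in the tables above can be checked before training with a small helper. This is a sketch only: check_contract is a hypothetical name that is not part of WAInjectBench, and the rules it enforces (equal lengths, integer labels in {0, 1}) are read off the tables rather than taken from the repository. Such a check may be useful because the text variant listed in the Signature section appends labels without casting, so a string label in the file would pass through unchanged.

```python
from typing import List, Sequence


def check_contract(features: Sequence[str], labels: Sequence[int]) -> List[str]:
    """Return a list of problems found; an empty list means the contract holds.

    Hypothetical helper, not part of the WAInjectBench code base.
    """
    problems = []
    if len(features) != len(labels):
        problems.append(
            f"length mismatch: {len(features)} features vs {len(labels)} labels"
        )
    for i, label in enumerate(labels):
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            problems.append(f"sample {i}: label {label!r} is not an int")
        elif label not in (0, 1):
            problems.append(f"sample {i}: label {label} is outside {{0, 1}}")
    return problems


print(check_contract(["a", "b"], [0, 1]))    # valid input
print(check_contract(["a", "b"], [0, "1"]))  # string label is reported
```

The helper works for either variant, since both return a feature list (texts or paths) alongside the labels.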
Usage Examples
Loading Text Training Data
# Text variant
texts, labels, sources = load_jsonl("train_data/text_dataset.jsonl")
print(f"Loaded {len(texts)} samples, {sum(labels)} malicious")
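The sources list returned by the text variant can be used to inspect class balance per source. The sketch below uses made-up values in place of a real load_jsonl call; the source names "email" and "web" are hypothetical.

```python
from collections import Counter

# Hypothetical values standing in for the output of the text variant.
labels = [0, 1, 1, 0, 1]
sources = ["email", "email", "web", "unknown", "web"]

# Count samples per (source, label) pair.
per_source = Counter(zip(sources, labels))

for (source, label), count in sorted(per_source.items()):
    kind = "malicious" if label == 1 else "benign"
    print(f"{source}: {count} {kind}")
```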
Loading Image Training Data
# Image variant
paths, labels = load_jsonl("train_data/image_dataset.jsonl")
print(f"Loaded {len(paths)} images, {sum(labels)} malicious")
Related Pages
Implements Principle