
Principle:Norrrrrrr lyn WAInjectBench Training Data Loading

From Leeroopedia
Knowledge Sources
Domains Data_Engineering, Machine_Learning
Last Updated 2026-02-14 16:00 GMT

Overview

A data loading pattern that parses JSONL training files into parallel lists of features and labels for embedding-based classifier training.

Description

Training Data Loading reads JSONL files that contain labeled samples for training binary classifiers. The format differs slightly between text and image modalities:

  • Text variant: Each line is {"text": str, "label": int, "source": str}. Returns parallel lists of texts, labels, and sources.
  • Image variant: Each line is {"path": str, "label": int}. Returns parallel lists of image paths and labels.

Labels are binary: 1 for malicious (prompt injection), 0 for benign. The load_jsonl function parses both formats one line at a time and skips empty lines.
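The two variants described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the function names load_text_jsonl and load_image_jsonl, and their signatures, are assumptions chosen only to mirror the record layouts listed above.

```python
import json
from typing import List, Tuple


def load_text_jsonl(path: str) -> Tuple[List[str], List[int], List[str]]:
    """Hypothetical text-variant loader: returns (texts, labels, sources)."""
    texts, labels, sources = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip empty lines
                continue
            record = json.loads(line)
            texts.append(record["text"])
            labels.append(int(record["label"]))  # 1 = malicious, 0 = benign
            sources.append(record["source"])
    return texts, labels, sources


def load_image_jsonl(path: str) -> Tuple[List[str], List[int]]:
    """Hypothetical image-variant loader: returns (paths, labels)."""
    paths, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip empty lines
                continue
            record = json.loads(line)
            paths.append(record["path"])
            labels.append(int(record["label"]))  # 1 = malicious, 0 = benign
    return paths, labels
```

Keeping the outputs as parallel lists means index i in every list refers to the same sample, so the features can be embedded in one pass while the labels are handed to the classifier unchanged.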

Usage

Use this pattern when preparing data for embedding-based classifier training. It is the first step in both the text embedding and image embedding training pipelines.
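Because the training step assumes binary labels, a quick check on the loaded lists can catch malformed data before any embeddings are computed. The helper below is a hypothetical sketch (summarize_labels is an illustrative name, not part of the described pipeline) and works on the parallel lists from either modality.

```python
from collections import Counter
from typing import Dict, Sequence


def summarize_labels(features: Sequence[str], labels: Sequence[int]) -> Dict[int, int]:
    """Hypothetical helper: validate parallel lists and count each class."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    unexpected = set(labels) - {0, 1}
    if unexpected:
        raise ValueError(f"non-binary labels found: {sorted(unexpected)}")
    counts = Counter(labels)
    # Always report both classes, even when one is absent.
    return {0: counts.get(0, 0), 1: counts.get(1, 0)}
```

For example, summarize_labels(["a", "b", "c"], [0, 1, 1]) returns {0: 1, 1: 2}, which also gives a first look at class balance.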

Theoretical Basis

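# Data loading pattern for labeled training data
import json

features, labels = [], []
with open(jsonl_path, encoding="utf-8") as file:  # jsonl_path: training file
    for line in file:
        if not line.strip():  # skip empty lines
            continue
        data = json.loads(line)
        features.append(data[feature_key])  # "text" or "path"
        labels.append(data["label"])        # 0 or 1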

Related Pages

Implemented By
