

Workflow:Norrrrrrr lyn WAInjectBench Embedding Classifier Training

From Leeroopedia
Knowledge Sources
Domains: Prompt_Injection, Security, Machine_Learning, Training
Last Updated: 2026-02-14 16:00 GMT

Overview

End-to-end process for training lightweight embedding-based binary classifiers to detect prompt injections in both text and image modalities.

Description

This workflow trains LogisticRegression classifiers on top of pretrained embedding models for prompt injection detection. Two modalities are supported using the same pattern: text embeddings are generated using Sentence Transformers (all-MiniLM-L6-v2), while image embeddings are generated using CLIP (ViT-B-32 via OpenCLIP). Both modalities encode training samples into dense vector representations, then fit a LogisticRegression classifier to discriminate between benign (label 0) and malicious (label 1) samples. The trained classifiers are serialized with joblib for use by the corresponding detection modules.

Usage

Execute this workflow when you need to train (or retrain) the embedding-based detectors used in the WAInjectBench evaluation pipelines. It assumes you have labeled JSONL training data with binary labels (0 = benign, 1 = malicious) and want to produce a lightweight scikit-learn classifier that the embedding-t or embedding-i detector module can load at inference time.

Execution Steps

Step 1: Training Data Preparation

Prepare labeled training data in JSONL format. For text, each record must contain a text field and a label field (1 for malicious, 0 for benign), with an optional source field. For images, each record must contain a path field pointing to an image file and a label field. Place all JSONL files in a single input directory.

Key considerations:

  • Text JSONL format: {"text": "...", "label": 1}
  • Image JSONL format: {"path": "path/to/image.png", "label": 1}
  • Each JSONL file in the input directory is processed independently, producing one classifier per file
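
As an illustration of the record formats above, a loader could look like the following minimal sketch. The helper names load_jsonl and list_training_files are hypothetical and not taken from the repository, and the validation rules simply mirror the required fields described in this step.

```python
import json
from pathlib import Path


def load_jsonl(path, content_key):
    """Read one JSONL file and check the fields this workflow requires.

    content_key is "text" for text data or "path" for image data.
    (Hypothetical helper; the repository may validate differently.)
    """
    records = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            if content_key not in record or "label" not in record:
                raise ValueError(f"{path}:{line_no}: missing '{content_key}' or 'label'")
            if record["label"] not in (0, 1):
                raise ValueError(f"{path}:{line_no}: label must be 0 (benign) or 1 (malicious)")
            records.append(record)
    return records


def list_training_files(input_dir):
    """Return every JSONL file in the input directory, one classifier per file."""
    return sorted(Path(input_dir).glob("*.jsonl"))
```

Failing fast on a missing field or an unexpected label keeps malformed records from silently reaching the feature extraction step.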

Step 2: Embedding Model Initialization

Load the pretrained embedding model appropriate for the modality. For text, load the all-MiniLM-L6-v2 model via the SentenceTransformer library. For images, load the ViT-B-32 model with laion2b_s34b_b79k pretrained weights via OpenCLIP, and move it to the target device (GPU or CPU).

What happens:

  • Text: SentenceTransformer downloads and caches the model from HuggingFace Hub
  • Image: OpenCLIP loads the CLIP vision encoder and its associated image preprocessing pipeline
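
The two loading paths can be sketched as follows, assuming the sentence-transformers, open_clip_torch, and torch packages are installed. The function names are hypothetical, the automatic device choice is an assumption, and the imports are deferred into the functions so the module can be imported without pulling in the heavy dependencies.

```python
TEXT_MODEL_NAME = "all-MiniLM-L6-v2"
CLIP_ARCH = "ViT-B-32"
CLIP_PRETRAINED = "laion2b_s34b_b79k"


def load_text_model():
    """Load the Sentence Transformers encoder (downloaded and cached on first use)."""
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(TEXT_MODEL_NAME)


def load_image_model(device=None):
    """Load the OpenCLIP model and its inference-time image transform."""
    import open_clip
    import torch

    if device is None:
        # Assumption: prefer the GPU when one is visible, otherwise use the CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    # create_model_and_transforms returns (model, train_transform, val_transform);
    # the validation transform is the one suited to feature extraction.
    model, _, preprocess = open_clip.create_model_and_transforms(
        CLIP_ARCH, pretrained=CLIP_PRETRAINED
    )
    model = model.to(device).eval()
    return model, preprocess, device
```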

Step 3: Feature Extraction

Encode all training samples into dense embedding vectors using the loaded model. For text, the SentenceTransformer encode() method processes texts in batches of 32. For images, each image is opened with PIL, preprocessed with the CLIP transform, and encoded through the CLIP vision tower. Image embeddings are L2-normalized after extraction.

Key considerations:

  • Text embeddings are 384-dimensional vectors (all-MiniLM-L6-v2)
  • Image embeddings are 512-dimensional vectors (ViT-B-32)
  • Image encoding handles failures gracefully by substituting zero vectors for corrupted images
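
The extraction logic described in this step might look roughly like the sketch below. It assumes numpy is available; encode_texts, make_clip_embedder, and encode_images are hypothetical names, and returning the list of failed paths is an addition for illustration rather than something the source describes.

```python
import numpy as np


def encode_texts(model, texts, batch_size=32):
    """Encode texts with a SentenceTransformer-style model in batches of 32."""
    return np.asarray(model.encode(list(texts), batch_size=batch_size, convert_to_numpy=True))


def make_clip_embedder(model, preprocess, device="cpu"):
    """Wrap an OpenCLIP model as a function mapping an image path to a 1-D vector."""
    import torch
    from PIL import Image

    def embed_one(path):
        with Image.open(path) as img:
            pixels = preprocess(img.convert("RGB")).unsqueeze(0).to(device)
        with torch.no_grad():
            features = model.encode_image(pixels)
        return features.squeeze(0).cpu().numpy()

    return embed_one


def encode_images(paths, embed_one, dim=512):
    """Embed each image, L2-normalize it, and fall back to zeros on failure."""
    rows, failed = [], []
    for path in paths:
        try:
            vec = np.asarray(embed_one(path), dtype=np.float32).reshape(-1)
            if vec.shape[0] != dim:
                raise ValueError(f"expected {dim} dimensions, got {vec.shape[0]}")
            norm = float(np.linalg.norm(vec))
            if norm > 0.0:
                vec = vec / norm  # L2 normalization
        except Exception:
            # Corrupted or unreadable image: substitute a zero vector.
            vec = np.zeros(dim, dtype=np.float32)
            failed.append(path)
        rows.append(vec)
    if not rows:
        return np.zeros((0, dim), dtype=np.float32), failed
    return np.stack(rows), failed
```

A zero vector still carries its label into training, so keeping track of the failed paths makes it possible to inspect or drop those samples before fitting.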

Step 4: Classifier Training

Fit a scikit-learn LogisticRegression classifier on the extracted embeddings. The text classifier uses default parameters apart from max_iter=1000. The image classifier uses balanced class weights to handle potential class imbalance, with max_iter=2000. A classification report is printed to stdout for immediate quality assessment.

What happens:

  • The classifier learns a linear decision boundary in embedding space
  • Training is fast (seconds) since embeddings are precomputed
  • Class-weighted training for images accounts for uneven benign/malicious ratios
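
The two classifier configurations can be sketched as below, assuming scikit-learn and numpy are installed. The function names are hypothetical, and computing the report on the training data itself is an assumption, since the source does not say whether a held-out split is used.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report


def build_classifier(modality):
    """Return an unfitted LogisticRegression with the settings for each modality."""
    if modality == "text":
        return LogisticRegression(max_iter=1000)
    if modality == "image":
        return LogisticRegression(max_iter=2000, class_weight="balanced")
    raise ValueError(f"unknown modality: {modality!r}")


def train_and_report(X, y, modality):
    """Fit the classifier on precomputed embeddings and build a text report."""
    X = np.asarray(X, dtype=np.float32)
    y = np.asarray(y, dtype=int)
    clf = build_classifier(modality)
    clf.fit(X, y)
    # Assumption: the report is computed on the same data used for fitting.
    report = classification_report(
        y,
        clf.predict(X),
        labels=[0, 1],
        target_names=["benign", "malicious"],
        zero_division=0,
    )
    return clf, report
```

Calling print(report) on the returned string reproduces the stdout summary mentioned above. With class_weight="balanced", scikit-learn weights each class inversely to its frequency, so a small malicious class is not dominated by the benign majority.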

Step 5: Model Serialization

Save the trained LogisticRegression model to disk using joblib.dump(). The output filename is derived from the input JSONL filename with a _logreg.pkl suffix. For text training, embeddings can optionally be saved alongside the model as a JSONL file containing the embedding vectors, labels, and source metadata.

Output artifacts:

  • {input_stem}_logreg.pkl — serialized sklearn classifier
  • {input_stem}_embeddings.jsonl — (text only, optional) embeddings with metadata
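
A minimal sketch of the naming and serialization scheme follows, assuming joblib is installed. The helper names and the JSONL keys "embedding", "label", and "source" are hypothetical; only the two filename suffixes come from the description above.

```python
import json
from pathlib import Path

import joblib


def model_output_path(jsonl_path, output_dir):
    """Derive {input_stem}_logreg.pkl from the input JSONL filename."""
    return Path(output_dir) / f"{Path(jsonl_path).stem}_logreg.pkl"


def embeddings_output_path(jsonl_path, output_dir):
    """Derive {input_stem}_embeddings.jsonl from the input JSONL filename."""
    return Path(output_dir) / f"{Path(jsonl_path).stem}_embeddings.jsonl"


def save_classifier(clf, jsonl_path, output_dir):
    """Serialize the fitted classifier with joblib and return the output path."""
    out = model_output_path(jsonl_path, output_dir)
    out.parent.mkdir(parents=True, exist_ok=True)
    joblib.dump(clf, out)
    return out


def load_classifier(model_path):
    """Load a previously saved classifier, as a detector module would at inference time."""
    return joblib.load(model_path)


def save_embeddings(embeddings, labels, sources, jsonl_path, output_dir):
    """Write one JSON object per sample (key names are assumptions)."""
    out = embeddings_output_path(jsonl_path, output_dir)
    out.parent.mkdir(parents=True, exist_ok=True)
    with open(out, "w", encoding="utf-8") as f:
        for vec, label, source in zip(embeddings, labels, sources):
            row = {
                "embedding": [float(v) for v in vec],
                "label": int(label),
                "source": source,
            }
            f.write(json.dumps(row) + "\n")
    return out
```

Because joblib files are pickle-based, a saved classifier should only be loaded from a trusted location, ideally with a scikit-learn version compatible with the one used for training.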
