Principle:Norrrrrrr lyn WAInjectBench Binary Classifier Training

Knowledge Sources	Scikit-learn LogisticRegression
Domains	Machine_Learning, Classification
Last Updated	2026-02-14 16:00 GMT

Overview

A supervised learning step that trains a logistic regression classifier on embedding features to distinguish between benign and malicious samples.

Description

Binary Classifier Training fits a logistic regression model to the embedding feature matrix using binary labels (0 = benign, 1 = malicious). Logistic regression is chosen for its simplicity and interpretability, and because a linear decision boundary is often sufficient when the embedding space already separates the two classes well. The WAInjectBench project uses two configurations:

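Text: LogisticRegression(max_iter=1000), with all other parameters at their scikit-learn defaults
Image: LogisticRegression(max_iter=2000, class_weight="balanced", n_jobs=-1), which doubles the iteration limit, uses balanced class weights to handle potential class imbalance, and sets n_jobs=-1 to request all CPU cores (scikit-learn only parallelizes over classes in one-vs-rest fitting, so for this binary task with the default lbfgs solver the setting has little effect)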

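After fitting, both variants print a classification_report on the training data for immediate quality inspection. These metrics are computed on the same samples the model was fitted on, so they are an optimistic sanity check rather than an estimate of detection performance on unseen data.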
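A minimal sketch of the two configurations, assuming the embeddings are already available as a NumPy feature matrix X with 0/1 labels y. The synthetic data from make_classification and the helper name train_detector are hypothetical stand-ins, not part of WAInjectBench.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report


def train_detector(X: np.ndarray, y: np.ndarray, modality: str) -> LogisticRegression:
    """Fit a benign (0) vs. malicious (1) classifier on embedding features.

    Hypothetical helper: the hyperparameters mirror the two configurations
    listed above; everything else uses scikit-learn defaults.
    """
    if modality == "text":
        clf = LogisticRegression(max_iter=1000)
    elif modality == "image":
        clf = LogisticRegression(max_iter=2000, class_weight="balanced", n_jobs=-1)
    else:
        raise ValueError(f"unknown modality: {modality!r}")
    clf.fit(X, y)
    # Report on the training data itself: an optimistic sanity check only.
    print(classification_report(y, clf.predict(X), target_names=["benign", "malicious"]))
    return clf


# Synthetic stand-in for embedding features (assumption: 256 dims, ~80/20 class split).
X, y = make_classification(n_samples=500, n_features=256, n_informative=32,
                           weights=[0.8, 0.2], random_state=0)
text_clf = train_detector(X, y, "text")
image_clf = train_detector(X, y, "image")
```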

Usage

Use this after feature extraction to train a binary classifier. The fitted model is then serialized for use in the detection pipeline.
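This page does not state which serialization format the detection pipeline uses. A minimal sketch, assuming plain pickle (fitted scikit-learn estimators are picklable); the file name detector.pkl is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the trained detector.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))
y = (X[:, 0] > 0).astype(int)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model (hypothetical path inside a temporary directory).
path = os.path.join(tempfile.mkdtemp(), "detector.pkl")
with open(path, "wb") as f:
    pickle.dump(clf, f)

# Later, in the detection pipeline: load the model and score new embeddings.
with open(path, "rb") as f:
    detector = pickle.load(f)
scores = detector.predict_proba(X[:5])[:, 1]  # estimated P(malicious) per sample
```

Pickled models should only be loaded from trusted sources, and with the same scikit-learn version that saved them.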

Theoretical Basis

Logistic regression models the probability of the positive class:

$P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}}$
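The formula can be checked numerically against a fitted scikit-learn model: for a binary problem, coef_ holds $w$, intercept_ holds $b$, and column 1 of predict_proba is $P(y = 1 \mid x)$. The toy data below is an illustrative assumption, not WAInjectBench embeddings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def sigmoid(z: np.ndarray) -> np.ndarray:
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


# Toy embeddings: 300 samples, 8 dimensions, labels from a noisy linear rule.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 8))
y = (X @ rng.normal(size=8) + 0.5 * rng.normal(size=300) > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)
w = clf.coef_.ravel()   # weight vector w, shape (8,)
b = clf.intercept_[0]   # bias b (scalar)

manual = sigmoid(X @ w + b)             # P(y = 1 | x) computed from the formula
library = clf.predict_proba(X)[:, 1]    # column 1 corresponds to class 1
```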

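where $w$ is the weight vector, $b$ is the bias, and $\sigma$ is the logistic (sigmoid) function. The model is trained by minimizing the regularized cross-entropy (log) loss; neither configuration changes scikit-learn's default L2 penalty or its regularization strength C=1.0.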
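Written out in the notation above, the objective scikit-learn documents for the L2 penalty is equivalent to the following (sketched here with 0/1 labels $y_i$; scikit-learn's own documentation states it in a slightly different but equivalent form). $C$ is the inverse regularization strength and $s_i$ is the weight of sample $i$, equal to 1 unless class or sample weights are set. With the default lbfgs solver the bias $b$ is not penalized.

$\min_{w,\,b} \; \frac{1}{2}\lVert w \rVert_2^2 + C \sum_{i=1}^{n} s_i \Bigl[-y_i \log \sigma(w^{\top} x_i + b) - (1 - y_i) \log\bigl(1 - \sigma(w^{\top} x_i + b)\bigr)\Bigr]$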

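With class_weight="balanced", each sample's term in the loss is multiplied by n_samples / (n_classes × n_c), where n_c is the number of training samples in that sample's class. Errors on the minority class therefore cost proportionally more, which reduces the classifier's bias toward the majority class.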
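A small worked example of the balanced weights, using scikit-learn's compute_class_weight helper on a hypothetical 90/10 label split (the labels are illustrative, not WAInjectBench data).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Toy label vector: 90 benign (0) and 10 malicious (1) samples.
y = np.array([0] * 90 + [1] * 10)
classes = np.array([0, 1])

# Library computation of the "balanced" heuristic.
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)

# The same values from n_samples / (n_classes * n_c).
manual = len(y) / (len(classes) * np.bincount(y))
```

Here each malicious sample receives weight 5.0 and each benign sample about 0.556, so a malicious sample counts 9 times as much, and both classes contribute the same total weight (90 × 0.556 ≈ 10 × 5.0 = 50).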

Related Pages

Implemented By

Implementation:Norrrrrrr_lyn_WAInjectBench_LogisticRegression_Fit

Uses Heuristic

Heuristic:Norrrrrrr_lyn_WAInjectBench_Balanced_Class_Weights_Imbalanced_Data
