Principle:Norrrrrrr lyn WAInjectBench Model Serialization
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Model_Management |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A model persistence step that serializes trained scikit-learn classifiers to disk for later use in the detection pipeline.
Description
After training, the fitted LogisticRegression classifier must be saved to disk so the detection modules can load it at inference time. The WAInjectBench project uses joblib.dump for serialization; joblib handles objects that carry large numpy arrays more efficiently than Python's standard pickle module. The output filename is derived from the training JSONL filename: the stem of that file followed by a _logreg.pkl suffix.
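The naming rule can be sketched as follows. This is a minimal illustration, not the project's code: the helper name model_path_for and the example paths are hypothetical, and the actual training script may derive the stem differently.

```python
import os
from pathlib import Path


def model_path_for(train_jsonl: str, output_dir: str) -> str:
    """Build '<output_dir>/<stem>_logreg.pkl' from a training JSONL path.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    # Path.stem drops the directory and the final extension,
    # e.g. "data/train_set.jsonl" -> "train_set".
    stem = Path(train_jsonl).stem
    return os.path.join(output_dir, f"{stem}_logreg.pkl")


# Example paths are made up for this sketch.
print(model_path_for("data/train_set.jsonl", "models"))
```

Keeping the dataset stem in the model filename lets a detector find the classifier that matches a given training set without a separate lookup table.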
Usage
Use this as the final step of the embedding classifier training pipeline. The saved model file is consumed by the corresponding detector modules (detector_text/embedding-t.py and detector_image/embedding-i.py) at inference time.
Theoretical Basis
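# Model serialization pattern
import os
import joblib

# output_dir, dataset_stem, and fitted_classifier come from the training step
save_path = os.path.join(output_dir, f"{dataset_stem}_logreg.pkl")
joblib.dump(fitted_classifier, save_path)
# Later, at inference time: clf = joblib.load(save_path)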
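Joblib is optimized for objects that contain large numpy arrays (such as sklearn model weights), so saving and loading them is typically faster than with the standard pickle module. Compression is optional: joblib.dump writes uncompressed files by default, and its compress argument produces smaller files at the cost of slower reads and writes.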
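The save and load pattern above can be checked with a small round trip. The sketch below assumes scikit-learn, numpy, and joblib are installed; the random toy data stands in for real embeddings and is not taken from the project, and the filename toy_logreg.pkl is hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-in for embedding vectors and binary labels (not project data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))
y = (X[:, 0] > 0).astype(int)

clf = LogisticRegression(max_iter=1000).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    save_path = os.path.join(tmp_dir, "toy_logreg.pkl")
    # Default call: no compression. Passing compress=3 instead would
    # shrink the file at the cost of slower writes and reads.
    joblib.dump(clf, save_path)
    # joblib.load unpickles the file, so only load trusted model files.
    restored = joblib.load(save_path)

# The restored classifier should behave exactly like the original.
same_predictions = bool(np.array_equal(clf.predict(X), restored.predict(X)))
same_weights = bool(np.allclose(clf.coef_, restored.coef_))
print(same_predictions, same_weights)
```

A check like this is cheap to run after training and catches a corrupted or mismatched model file before a detector depends on it. Files saved this way are generally expected to be loaded with a compatible scikit-learn version.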