Principle: InjectBench LLaVA Model Initialization
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, NLP, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A model initialization step that loads LLaVA (Large Language and Vision Assistant) and constrains its output to binary Yes/No token logits for prompt injection detection.
Description
LLaVA is a multimodal model that combines a vision encoder (CLIP ViT) with a language model (LLaMA) for vision-language understanding. For the binary detection task, the model's output is constrained to two tokens: "Yes" (injection detected) and "No" (benign). This is achieved by the LlavaYesnoToken wrapper, which:
- Loads the base LLaVA model with the specified dtype and safetensors format
- Enables gradient checkpointing for memory efficiency
- Initializes a processor and tokenizer
- Resolves the token IDs for "Yes" and "No" verbalizers
The wrapper's forward() method processes image batches with a security detection system prompt, extracts the logits at the last token position, and returns only the Yes/No logit pair.
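A minimal sketch of such a wrapper is shown below. It assumes the Hugging Face `transformers` LLaVA API and a PyTorch `nn.Module` structure; the class name `LlavaYesnoToken` comes from the source, but the constructor signature, the checkpoint name in the comment, and the assumption that "Yes"/"No" are single tokens are illustrative, not the actual InjectBench implementation.

```python
import torch
from torch import nn


class LlavaYesnoToken(nn.Module):
    """Wraps a LLaVA model so forward() returns only the No/Yes logit pair."""

    def __init__(self, model, tokenizer, system_prompt):
        super().__init__()
        # The base model is assumed to be loaded beforehand, e.g. (hypothetical):
        #   model = LlavaForConditionalGeneration.from_pretrained(
        #       "llava-hf/llava-1.5-7b-hf",
        #       torch_dtype=torch.float16, use_safetensors=True)
        #   model.gradient_checkpointing_enable()
        self.model = model
        self.tokenizer = tokenizer
        # The security detection system prompt is prepended when the processor
        # builds input_ids (not shown here).
        self.system_prompt = system_prompt
        # Resolve verbalizer token ids once at init.
        # Assumes "No" and "Yes" each map to a single vocabulary token.
        self.id_no = tokenizer.convert_tokens_to_ids("No")
        self.id_yes = tokenizer.convert_tokens_to_ids("Yes")

    def forward(self, pixel_values, input_ids, attention_mask=None):
        out = self.model(pixel_values=pixel_values, input_ids=input_ids,
                         attention_mask=attention_mask)
        last = out.logits[:, -1, :]                 # [batch, vocab] at last position
        return last[:, [self.id_no, self.id_yes]]   # [batch, 2] No/Yes pair
```

Because the wrapper is an `nn.Module`, it can be dropped directly into a standard fine-tuning loop, with the two returned logits treated as an ordinary binary classification head.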
Usage
Use this when initializing a LLaVA model for binary prompt injection detection, either for fine-tuning or inference.
Theoretical Basis
Binary classification via constrained generation:
```python
# Instead of generating arbitrary text, extract logits for specific tokens
logits = model(images, prompt)              # full-vocabulary logits at the last position
binary_logits = logits[:, [ID_NO, ID_YES]]  # keep only the "No"/"Yes" columns
prediction = binary_logits.argmax(-1)       # 0 = No (benign), 1 = Yes (injection)
```
This approach converts an open-ended generative model into a binary classifier by examining only the probability mass assigned to two specific tokens, avoiding the cost and output variability of full text generation.
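Since the wrapper reduces the output to a two-logit pair, fine-tuning can use ordinary two-class cross-entropy. A hedged sketch with stand-in tensors (the batch values are placeholders, not real wrapper output):

```python
import torch
import torch.nn.functional as F

# Stand-in for the wrapper's [batch, 2] No/Yes logit output.
binary_logits = torch.randn(4, 2)
# Ground-truth labels: 0 = No (benign), 1 = Yes (injection detected).
labels = torch.tensor([0, 1, 1, 0])

# The binary logit pair behaves like a normal 2-class classifier head,
# so standard cross-entropy applies directly.
loss = F.cross_entropy(binary_logits, labels)
preds = binary_logits.argmax(-1)
```

The same `argmax` over the pair serves as the inference-time decision rule, so training and inference share one code path.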