Principle: InjectBench LLaVA Model Initialization
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, NLP, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A model initialization step that loads LLaVA (Large Language and Vision Assistant) and constrains its output to binary Yes/No token logits for prompt injection detection.
Description
LLaVA is a multimodal model that combines a vision encoder (CLIP ViT) with a language model (LLaMA) for vision-language understanding. For the binary detection task, the model's output is constrained to two tokens: "Yes" (injection detected) and "No" (benign). This is achieved by the LlavaYesnoToken wrapper, which:
- Loads the base LLaVA model with the specified dtype and safetensors format
- Enables gradient checkpointing for memory efficiency
- Initializes a processor and tokenizer
- Resolves the token IDs for "Yes" and "No" verbalizers
The wrapper's forward() method processes image batches with a security detection system prompt, extracts the logits at the last token position, and returns only the Yes/No logit pair.
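A minimal sketch of such a wrapper is shown below. It assumes the Hugging Face `transformers` LLaVA API and a PyTorch `nn.Module` structure; the class name `LlavaYesnoToken` comes from the source, but the constructor signature, the checkpoint name in the comment, and the assumption that "Yes"/"No" are single tokens are illustrative, not the actual InjectBench implementation.

```python
import torch
from torch import nn


class LlavaYesnoToken(nn.Module):
    """Wraps a LLaVA model so forward() returns only the No/Yes logit pair."""

    def __init__(self, model, tokenizer, system_prompt):
        super().__init__()
        # The base model is assumed to be loaded beforehand, e.g. (hypothetical):
        #   model = LlavaForConditionalGeneration.from_pretrained(
        #       "llava-hf/llava-1.5-7b-hf",
        #       torch_dtype=torch.float16, use_safetensors=True)
        #   model.gradient_checkpointing_enable()
        self.model = model
        self.tokenizer = tokenizer
        # The security detection system prompt is prepended when the processor
        # builds input_ids (not shown here).
        self.system_prompt = system_prompt
        # Resolve verbalizer token ids once at init.
        # Assumes "No" and "Yes" each map to a single vocabulary token.
        self.id_no = tokenizer.convert_tokens_to_ids("No")
        self.id_yes = tokenizer.convert_tokens_to_ids("Yes")

    def forward(self, pixel_values, input_ids, attention_mask=None):
        out = self.model(pixel_values=pixel_values, input_ids=input_ids,
                         attention_mask=attention_mask)
        last = out.logits[:, -1, :]                 # [batch, vocab] at last position
        return last[:, [self.id_no, self.id_yes]]   # [batch, 2] No/Yes pair
```

Because the wrapper is an `nn.Module`, it can be dropped directly into a standard fine-tuning loop, with the two returned logits treated as an ordinary binary classification head.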
Usage
Use this when initializing a LLaVA model for binary prompt injection detection, either for fine-tuning or inference.
Theoretical Basis
Binary classification via constrained generation:
```python
# Instead of generating arbitrary text, extract logits for specific tokens
logits = model(images, prompt)              # full-vocabulary logits at the last position
binary_logits = logits[:, [ID_NO, ID_YES]]  # keep only the "No"/"Yes" columns
prediction = binary_logits.argmax(-1)       # 0 = No (benign), 1 = Yes (injection)
```
This approach converts an open-ended generative model into a binary classifier by examining only the probability mass assigned to two specific tokens, avoiding the cost and output variability of full text generation.
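Since the wrapper reduces the output to a two-logit pair, fine-tuning can use ordinary two-class cross-entropy. A hedged sketch with stand-in tensors (the batch values are placeholders, not real wrapper output):

```python
import torch
import torch.nn.functional as F

# Stand-in for the wrapper's [batch, 2] No/Yes logit output.
binary_logits = torch.randn(4, 2)
# Ground-truth labels: 0 = No (benign), 1 = Yes (injection detected).
labels = torch.tensor([0, 1, 1, 0])

# The binary logit pair behaves like a normal 2-class classifier head,
# so standard cross-entropy applies directly.
loss = F.cross_entropy(binary_logits, labels)
preds = binary_logits.argmax(-1)
```

The same `argmax` over the pair serves as the inference-time decision rule, so training and inference share one code path.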