Heuristic:Liu00222 Open Prompt Injection BPE Retokenization Parameters
| Knowledge Sources | |
|---|---|
| Domains | Security, NLP |
| Last Updated | 2026-02-14 15:30 GMT |
Overview
The BPE retokenization defense uses a 10% BPE-dropout rate and makes up to 10 attempts per input, disrupting injected instructions through stochastic subword segmentation.
Description
The retokenization defense applies Byte Pair Encoding (BPE) with dropout to the user's data prompt before passing it to the LLM. Randomly skipping BPE merges (at a 10% rate) segments the text in a non-standard way, which is intended to break the structure of injected instructions while preserving the meaning of natural text. The defense requires a BPE merge table loaded from `./data/subword_nmt.voc`. Because a retokenization call can occasionally raise an exception, the defense makes up to 10 attempts before falling back to the original, unmodified text.
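To make the dropout mechanism concrete, here is a minimal sketch of BPE segmentation with merge dropout. It is not the repository's `BpeOnlineTokenizer`; the function name `bpe_dropout_segment`, the toy merge list, and the way the `@@` marker is joined are assumptions made for illustration only.

```python
import random


def bpe_dropout_segment(word, merges, dropout=0.1, rng=None):
    """Split `word` into subwords, skipping each candidate merge with probability `dropout`.

    `merges` is an ordered list of symbol pairs (highest priority first).
    This is a toy illustration, not the repository's tokenizer.
    """
    if rng is None:
        rng = random.Random()
    rank = {pair: i for i, pair in enumerate(merges)}
    symbols = list(word)  # start from single characters
    while len(symbols) > 1:
        # Adjacent pairs that are in the merge table and survive dropout this round.
        candidates = [
            (rank[(a, b)], i)
            for i, (a, b) in enumerate(zip(symbols, symbols[1:]))
            if (a, b) in rank and rng.random() >= dropout
        ]
        if not candidates:
            break
        _, i = min(candidates)  # highest-priority merge, leftmost on ties
        symbols[i:i + 2] = [symbols[i] + symbols[i + 1]]
    return symbols


# Hypothetical merge table for demonstration.
merges = [("l", "o"), ("lo", "w"), ("e", "r")]

print(bpe_dropout_segment("lower", merges, dropout=0.0))  # ['low', 'er']
print("@@ ".join(bpe_dropout_segment("lower", merges, dropout=0.0)))  # low@@ er
print(bpe_dropout_segment("lower", merges, dropout=0.1, rng=random.Random(0)))
```

With `dropout=0.0` the segmentation is deterministic, and with `dropout=1.0` every merge is skipped and the word falls apart into characters. At 0.1, most words keep their usual segmentation and only some are split differently, which is the behavior the defense relies on. Concatenating the pieces always reproduces the original word.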
Usage
Use this heuristic when deploying the retokenization defense or when diagnosing its failure modes. The defense is activated by setting `defense='retokenization'` in the application configuration. Both key parameters are hardcoded: the dropout rate (0.1) and the attempt count (10).
The Insight (Rule of Thumb)
- Action: Set `defense='retokenization'` and ensure the BPE merge table exists at `./data/subword_nmt.voc`.
- Value: `bpe_dropout_rate=0.1`, sentinel configuration: `sentinels=['', '</w>']`, regime: `regime='end'`, BPE symbol: `bpe_symbol='@@'`.
- Trade-off: Low dropout (0.1) preserves text readability while slightly disrupting injection structure. Higher dropout would disrupt more but also corrupt legitimate text.
- Retry logic: The defense attempts retokenization up to 10 times. If every attempt raises an exception, it prints a warning to standard output and returns the original (unprotected) text.
- No GPU required: This defense operates purely on CPU string manipulation.
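Because the merge table path is relative, it resolves against the process's current working directory. A small preflight check, sketched below, can surface a missing file before the defense is constructed. The helper name `require_merge_table` is hypothetical and not part of the repository.

```python
from pathlib import Path


def require_merge_table(path="./data/subword_nmt.voc"):
    """Return the merge table path, or raise a descriptive error if the file is missing.

    Hypothetical helper for illustration; the repository loads the file directly.
    """
    table = Path(path)
    if not table.is_file():
        raise FileNotFoundError(
            f"BPE merge table not found at {table.resolve()}; "
            "run from the repository root or adjust the path"
        )
    return table
```

Calling this before the application is created turns a late, possibly obscure loader failure into an early error that names the resolved absolute path.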
Reasoning
From `apps/Application.py:96-98` (initialization):
    elif self.defense == 'retokenization':
        merge_table = load_subword_nmt_table('./data/subword_nmt.voc')
        self.retokenizer = BpeOnlineTokenizer(bpe_dropout_rate=0.1, merge_table=merge_table)
From `apps/Application.py:172-179` (application with retry):
    elif self.defense == 'retokenization':
        for _ in range(10):
            try:
                return self.retokenizer(data_prompt, sentinels=['', '</w>'], regime='end', bpe_symbol='@@')
            except:
                continue
        print(f'WARNING: unable to retokenize this sentence')
        return data_prompt
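The 10-attempt loop with a bare `except` catches every exception raised by the BPE tokenizer. This is a defensive pattern to handle edge cases where certain character sequences cause the tokenizer to fail. A bare `except` also swallows `KeyboardInterrupt` and `SystemExit`, and the fallback returns the data prompt unchanged, so a failed retokenization leaves the input unprotected with only a printed warning.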
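The retry-then-fall-back pattern can be sketched in isolation as follows. This is a variation for illustration, not the repository's code: it catches `Exception` rather than using a bare `except`, logs through the standard `logging` module, and returns a flag so callers can tell whether the text was actually retokenized. The name `retokenize_with_fallback` and the `retokenize` callable are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def retokenize_with_fallback(retokenize, text, attempts=10):
    """Call `retokenize(text)` up to `attempts` times.

    Returns (result, True) on the first success, or (text, False) if every
    attempt raises. `retokenize` is any callable taking and returning a string.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return retokenize(text), True
        except Exception as exc:  # lets KeyboardInterrupt/SystemExit propagate
            last_error = exc
    logger.warning(
        "unable to retokenize after %d attempts (%r); returning original text",
        attempts,
        last_error,
    )
    return text, False
```

Retrying helps only when the failure depends on the random dropout draw. If a particular character sequence makes the tokenizer fail on every attempt, all 10 attempts fail the same way and the input passes through unprotected, which is why exposing the flag (or at least the last exception) can be useful when auditing the defense.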