Principle:Protectai Llm guard Code Snippet Detection
| Knowledge Sources | |
|---|---|
| Domains | Security, Code_Detection |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Binary classification of text as code versus natural language using transformer models trained on code corpora.
Description
Code Snippet Detection is a security principle that determines whether a given piece of text constitutes source code or natural language. This distinction is critical in scenarios where code execution, code injection, or code leakage must be prevented. For example, a language model might be tricked into generating executable code, or a user might attempt to inject code into a prompt to manipulate system behavior.
The principle relies on transformer-based text classification using models fine-tuned specifically on the boundary between code and natural language. The CodeNLBERT model is well suited to this task because it was pre-trained on both programming language corpora and natural language text, which lets it capture the syntactic and structural differences between the two domains.
A critical preprocessing step involves removing markdown artifacts such as triple-backtick fences, inline code markers, and language annotations. These artifacts could confuse the classifier by providing surface-level code indicators without actual code content, or conversely, by wrapping genuine code in a way that makes it appear as formatted natural language.
Usage
Use this principle when you need to enforce policies that prohibit code content in either user inputs or model outputs. Common scenarios include customer-facing chatbots that should only provide natural language explanations, content moderation systems that must prevent code injection attacks, and compliance systems where code generation is restricted. Apply the check to both user inputs and model outputs for comprehensive protection.
Theoretical Basis
The classification algorithm proceeds as follows:
Preprocessing:
- Strip markdown code block delimiters (triple backticks, language tags)
- Remove inline code markers (single backticks)
- Normalize whitespace while preserving structural indentation
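The preprocessing steps above can be sketched as follows. This is a minimal illustration using only Python's standard `re` module; the function name `strip_markdown` and the exact patterns are assumptions made for this example, not the library's actual preprocessing rules.

```python
import re

# Hypothetical patterns for illustration only.
# A fence line: optional indentation, three or more backticks, optional language tag.
FENCE_RE = re.compile(r"^[ \t]*`{3,}[^\n]*\n?", re.MULTILINE)
# An inline code span: text between two single backticks on one line.
INLINE_RE = re.compile(r"`([^`\n]*)`")


def strip_markdown(text: str) -> str:
    """Remove markdown code markers while keeping the wrapped content."""
    # Drop fence lines (and any language tag), keeping the fenced body.
    text = FENCE_RE.sub("", text)
    # Unwrap inline code spans, keeping their contents.
    text = INLINE_RE.sub(r"\1", text)
    # Trim trailing whitespace on each line; leading indentation is untouched.
    text = "\n".join(line.rstrip() for line in text.splitlines())
    # Collapse runs of three or more newlines into a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip("\n")


fence = "`" * 3  # built programmatically to keep this example readable
sample = f"{fence}python\ndef f(x):\n    return x\n{fence}\nCall `f(1)` to test."
cleaned = strip_markdown(sample)
```

Here `cleaned` keeps the function body with its four-space indentation and the sentence that follows it, but none of the backtick markers or the `python` language tag.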
Classification:
- Tokenize the preprocessed text using the model's tokenizer
- Pass tokens through the transformer encoder to obtain contextual representations
- Apply a classification head (linear layer + softmax) over the [CLS] token representation
- Output probability distribution over two classes: code and natural language
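To make the classification head concrete, the sketch below applies a linear layer and a softmax to a toy [CLS] vector using plain Python. The three-dimensional vector, the weights, and the label order are made-up values for illustration; in practice the encoder output and the head parameters would come from the fine-tuned model.

```python
import math
from typing import Dict, List, Sequence

# Assumed label order for this example: index 0 is natural language, index 1 is code.
LABELS = ("natural_language", "code")


def linear(vec: Sequence[float], weights: Sequence[Sequence[float]],
           bias: Sequence[float]) -> List[float]:
    """Compute one logit per class: W @ vec + b."""
    return [sum(w * x for w, x in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vec: Sequence[float], weights: Sequence[Sequence[float]],
             bias: Sequence[float]) -> Dict[str, float]:
    """Map a [CLS] representation to a probability per class."""
    probs = softmax(linear(cls_vec, weights, bias))
    return dict(zip(LABELS, probs))


# Toy values, not real model parameters.
cls_vec = [0.5, -1.0, 2.0]
weights = [[0.1, 0.2, -0.3],   # row for natural_language
           [-0.1, 0.4, 0.6]]   # row for code
bias = [0.0, 0.0]

probs = classify(cls_vec, weights, bias)
```

With these toy numbers the logits are -0.75 and 0.75, so `probs["code"]` is roughly 0.82 and the two probabilities sum to 1.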
Decision:
- Compare the code class probability against a configurable threshold
- If P(code) >= threshold, the text is classified as code and flagged
- The threshold allows tuning the trade-off between precision (avoiding false positives) and recall (catching all code)
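The decision rule and its precision/recall trade-off can be illustrated with a small sketch. The function names and the five labeled scores below are hypothetical and chosen only to show how moving the threshold shifts the two metrics.

```python
from typing import List, Tuple


def is_code(p_code: float, threshold: float = 0.5) -> bool:
    """Flag the text as code when P(code) meets or exceeds the threshold."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be in [0, 1]")
    return p_code >= threshold


def precision_recall(scored: List[Tuple[float, bool]],
                     threshold: float) -> Tuple[float, float]:
    """Compute precision and recall for the 'code' class at a given threshold."""
    tp = sum(1 for p, label in scored if is_code(p, threshold) and label)
    fp = sum(1 for p, label in scored if is_code(p, threshold) and not label)
    fn = sum(1 for p, label in scored if not is_code(p, threshold) and label)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall


# Made-up (P(code), actually_code) pairs for illustration.
scored = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.10, False)]

strict = precision_recall(scored, threshold=0.7)   # fewer false positives
lenient = precision_recall(scored, threshold=0.3)  # fewer missed snippets
```

On this toy data, the stricter threshold of 0.7 flags only true code (precision 1.0) but misses one snippet (recall 2/3), while the lenient threshold of 0.3 catches every snippet (recall 1.0) at the cost of one false positive (precision 0.75).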