Principle:Protectai Llm guard Code Snippet Detection

Knowledge Sources	Protectai_Llm_guard
Domains	Security, Code_Detection
Last Updated	2026-02-14 12:00 GMT

Overview

Binary classification of text as code versus natural language using transformer models trained on code corpora.

Description

Code Snippet Detection is a security principle that determines whether a given piece of text constitutes source code or natural language. This distinction is critical in scenarios where code execution, code injection, or code leakage must be prevented. For example, a language model might be tricked into generating executable code, or a user might inject code into a prompt to manipulate system behavior.

The principle relies on transformer-based text classification using models fine-tuned specifically to separate code from natural language. The CodeNLBERT architecture is particularly well-suited for this task because it was pre-trained on both programming language corpora and natural language text, so its representations capture the syntactic and structural differences between the two domains.

A critical preprocessing step involves removing markdown artifacts such as triple-backtick fences, inline code markers, and language annotations. These artifacts could confuse the classifier by providing surface-level code indicators without actual code content, or conversely, by wrapping genuine code in a way that makes it appear as formatted natural language.

Usage

Use this principle when you need to enforce policies that prohibit code content in user inputs, model outputs, or both. Common scenarios include customer-facing chatbots that should only provide natural language explanations, content moderation systems that must prevent code injection attacks, and compliance systems where code generation is restricted. Apply it to both input and output scanning for comprehensive protection.

Theoretical Basis

The classification algorithm proceeds as follows:

Preprocessing:

Strip markdown code block delimiters (triple backticks, language tags)
Remove inline code markers (single backticks)
Normalize whitespace while preserving structural indentation
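
The preprocessing steps above can be sketched with a few regular expressions. This is a minimal illustration, not the library's actual implementation: the function name `strip_markdown_code_markers` and the specific patterns are assumptions chosen for clarity.

```python
import re

# Hypothetical helper: names and patterns are illustrative assumptions,
# not taken from the library's source.
FENCE = "`" * 3  # the triple-backtick delimiter

# A fence line: optional indentation, the delimiter, an optional language tag.
FENCE_RE = re.compile(r"^[ \t]*" + re.escape(FENCE) + r"[^\n]*\n?", re.MULTILINE)
# Inline code: a single pair of backticks on one line.
INLINE_RE = re.compile(r"`([^`\n]*)`")


def strip_markdown_code_markers(text: str) -> str:
    """Remove markdown code markers while keeping the wrapped content."""
    # 1. Drop fence lines (delimiter plus language tag); the body is kept.
    text = FENCE_RE.sub("", text)
    # 2. Unwrap inline code: `x` becomes x.
    text = INLINE_RE.sub(r"\1", text)
    # 3. Normalize whitespace: trim trailing spaces and collapse runs of
    #    blank lines. Leading indentation is left untouched.
    lines = [line.rstrip() for line in text.splitlines()]
    text = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return text.strip("\n")
```

Leading indentation is deliberately preserved because it is a structural signal in languages such as Python, and the classifier may rely on it.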

Classification:

Tokenize the preprocessed text using the model's tokenizer
Pass tokens through the transformer encoder to obtain contextual representations
Apply a classification head (linear layer + softmax) over the [CLS] token representation
Output probability distribution over two classes: code and natural language
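
The last two classification steps (linear layer plus softmax over the [CLS] representation) can be illustrated in plain Python. The sketch below assumes the encoder has already produced a [CLS] vector; the toy vector, weights, and bias are made-up values for demonstration, and the class order (code first, natural language second) is an assumption rather than a property of any particular model.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(
    cls_vector: Sequence[float],
    weights: Sequence[Sequence[float]],
    bias: Sequence[float],
) -> List[float]:
    """Linear layer followed by softmax: one output probability per class."""
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy values (assumptions for illustration only).
cls_vector = [0.5, -1.0, 2.0]      # stand-in for the encoder's [CLS] output
weights = [[1.0, 0.0, 0.5],        # row 0: "code" class
           [-1.0, 0.5, 0.0]]       # row 1: "natural language" class
bias = [0.0, 0.1]

p_code, p_natural_language = classification_head(cls_vector, weights, bias)
```

In a real deployment the tokenizer, encoder, and head weights come from the fine-tuned model; only the arithmetic of the final step is shown here.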

Decision:

Compare the code class probability against a configurable threshold
If P(code) >= threshold, the text is classified as code and flagged
The threshold allows tuning the trade-off between precision (avoiding false positives) and recall (catching all code)
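
The decision rule is a single comparison, sketched below. The `Decision` type, the `decide` function, and the default threshold of 0.5 are hypothetical names and values used for illustration; a suitable threshold depends on the deployment.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    is_code: bool   # True when the text is flagged as code
    p_code: float   # probability assigned to the code class


def decide(p_code: float, threshold: float = 0.5) -> Decision:
    """Flag the text as code when P(code) >= threshold.

    The default threshold is an illustrative assumption.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    return Decision(is_code=p_code >= threshold, p_code=p_code)
```

Raising the threshold flags fewer texts (fewer false positives, more missed code); lowering it does the opposite. For example, a text with P(code) = 0.45 is flagged at a threshold of 0.4 but passes at 0.5.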
