Principle:Protectai Llm guard Programming Language Detection
| Knowledge Sources | |
|---|---|
| Domains | Code_Detection, Content_Filtering |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
Identifying programming languages in text to enforce allow or block policies on code content.
Description
Programming Language Detection is a content filtering principle that classifies code content by programming language and enforces policies based on which languages are permitted or prohibited. Unlike binary code detection (which simply distinguishes code from natural language), this principle performs fine-grained multi-class classification across 26 programming languages, enabling nuanced policies about what types of code are acceptable.
The detection pipeline first extracts code blocks from markdown-formatted text, isolating the code from the surrounding natural language. Each extracted segment is then passed to a multi-class text classification model trained for programming language identification, which assigns a probability score to each of the 26 supported languages.
The principle supports two complementary policy modes. In allowlist mode, only code in explicitly permitted languages passes validation. In blocklist mode, code in the specified languages is blocked while all other languages are allowed. This flexibility accommodates use cases ranging from security-focused deployments (blocking shell scripts) to educational contexts (allowing only specific languages).
Usage
Use this principle when you need granular control over which programming languages may appear in text processed by a language model. Common scenarios include restricting code generation to approved languages in enterprise environments, blocking potentially dangerous languages (e.g., shell scripts, SQL) in security-sensitive contexts, enforcing language standards in educational platforms, and filtering model outputs to ensure generated code matches the user's requested language.
Theoretical Basis
The detection algorithm operates as follows:
Code Extraction:
- Parse the input text for markdown code blocks (triple-backtick fenced blocks)
- Extract the content of each code block, stripping fence delimiters
- If no markdown code blocks are found, treat the entire text as a potential code segment
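The extraction steps above can be sketched in a few lines of Python. This is a minimal illustration and not the library's own implementation: the function name `extract_code_segments` and the regular expression are assumptions made for this example, and only simple, non-nested triple-backtick fences are handled.

```python
import re

# Build the fence marker programmatically so this example contains no literal fence.
FENCE = "`" * 3

# Opening fence with an optional info string (e.g. "python"), the body, then the closing fence.
FENCE_RE = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_segments(text: str) -> list[str]:
    """Return the bodies of fenced code blocks, or the whole text if none are found.

    Hypothetical helper for illustration only.
    """
    # Strip surrounding whitespace and drop blocks that are empty.
    blocks = [body.strip() for body in FENCE_RE.findall(text)]
    blocks = [body for body in blocks if body]
    # Fallback: with no fenced blocks, treat the entire text as one candidate segment.
    return blocks if blocks else [text]
```

The fallback branch mirrors the last step of the list: text without fences is still classified, as a single segment.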
Language Classification:
- Tokenize each extracted code segment using the model's tokenizer
- Pass tokens through the transformer encoder
- Apply a multi-class classification head with softmax over 26 language classes
- Each code segment receives a probability distribution over all supported languages
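To make the final classification step concrete, the sketch below turns raw classifier scores (logits) into a probability distribution with softmax. It assumes the logits have already been produced by the encoder and classification head. The three-label list and the logit values are illustrative placeholders, not the model's actual 26 labels or outputs.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def language_distribution(logits: list[float], labels: list[str]) -> dict[str, float]:
    """Pair each label with its softmax probability (hypothetical helper)."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per language label")
    return dict(zip(labels, softmax(logits)))


# Illustrative subset of labels and made-up logits for one code segment.
labels = ["Python", "JavaScript", "Go"]
dist = language_distribution([2.0, 0.5, -1.0], labels)
```

The probabilities in `dist` sum to 1, and the label with the largest logit receives the largest probability, which is what the argmax in the policy step relies on.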
Policy Enforcement:
- Determine the predicted language as argmax of the probability distribution
- In allowlist mode: flag if the predicted language is NOT in the allowed set
- In blocklist mode: flag if the predicted language IS in the blocked set
- Apply a confidence threshold: flag only when the predicted language's probability meets the threshold, so that low-confidence predictions are not flagged
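The policy rules above can be expressed as a small decision function. This is a sketch under stated assumptions: `LanguagePolicy`, `is_flagged`, the default threshold of 0.5, and the label names are hypothetical, and the threshold is interpreted as "never flag a prediction whose confidence is below it".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguagePolicy:
    """Hypothetical policy container for this illustration."""
    languages: frozenset[str]   # the allowed set or the blocked set
    is_blocked: bool = True     # True: blocklist mode, False: allowlist mode
    threshold: float = 0.5      # assumed default, chosen for the example


def is_flagged(dist: dict[str, float], policy: LanguagePolicy) -> bool:
    """Return True if the code segment violates the policy."""
    # Predicted language is the argmax of the probability distribution.
    predicted = max(dist, key=dist.get)
    # Low-confidence predictions are not flagged in either mode.
    if dist[predicted] < policy.threshold:
        return False
    if policy.is_blocked:
        return predicted in policy.languages       # blocklist: flag if listed
    return predicted not in policy.languages       # allowlist: flag if not listed


blocklist = LanguagePolicy(frozenset({"Bash"}))
allowlist = LanguagePolicy(frozenset({"Python"}), is_blocked=False)
```

Note one consequence of this reading: in allowlist mode, a segment whose top prediction falls below the threshold passes validation, so the threshold trades missed detections against false positives.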