Overview
The Code scanner detects programming languages in prompts, allowing you to block or allow specific languages.
Description
Code is an input scanner that identifies the programming language of code snippets within prompts using the philomath-1209/programming-language-identification classification model. Unlike BanCode, which only detects whether any code is present, this scanner identifies the specific language. It can be configured either to block the listed languages (is_blocked=True) or to allow only the listed languages (is_blocked=False). It supports 26 programming languages, including Python, JavaScript, C, C++, C#, Java, Go, Rust, Ruby, PHP, Swift, Kotlin, and more. The threshold parameter (default 0.5) sets the minimum confidence a language classification must reach to count. ONNX Runtime support is available for faster inference.
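The block/allow behavior described above can be summarized as a small decision function. This is a hypothetical sketch of the documented semantics, not llm_guard's implementation: Prediction and language_filter are illustrative names, and the sketch assumes that a language counts as detected when its score is at least threshold, and that a prompt with no confidently detected language passes in either mode. The library's exact edge-case handling may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One (language label, confidence) pair from a language classifier."""
    label: str
    score: float


def language_filter(
    predictions: list[Prediction],
    languages: list[str],
    *,
    is_blocked: bool = True,
    threshold: float = 0.5,
) -> bool:
    """Return True if the prompt passes the filter, False if it should be rejected.

    Hypothetical sketch of the block/allow semantics, not llm_guard's code.
    """
    # Assumption: only predictions at or above the threshold count as "detected".
    detected = {p.label for p in predictions if p.score >= threshold}
    if is_blocked:
        # Deny-list mode: fail if any detected language is in the list.
        return not (detected & set(languages))
    # Allow-list mode: fail if any detected language is outside the list.
    return detected <= set(languages)


preds = [Prediction("PowerShell", 0.91), Prediction("Python", 0.05)]
print(language_filter(preds, ["PowerShell"], is_blocked=True))            # False
print(language_filter(preds, ["Python"], is_blocked=False))               # False
print(language_filter(preds, ["PowerShell", "Python"], is_blocked=False))  # True
```

Note that the low-confidence Python prediction (0.05) is ignored entirely, which is why allowing only Python still rejects this prompt.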
Usage
Use the Code scanner when you need fine-grained control over which programming languages are permitted in prompts. This is useful for restricting code submissions to approved languages, blocking potentially dangerous scripting languages (e.g., PowerShell), or ensuring that prompts only contain code in languages relevant to your application.
Code Reference
Source Location
Signature
```python
class Code(Scanner):
    def __init__(
        self,
        languages: list[str],
        *,
        model: Model | None = None,  # default: philomath-1209/programming-language-identification
        is_blocked: bool = True,
        threshold: float = 0.5,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...
```
Import
```python
from llm_guard.input_scanners import Code
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| languages | list[str] | Yes | List of programming language names to block or allow. |
| model | Model or None | No | The language identification model to use. Defaults to philomath-1209/programming-language-identification. |
| is_blocked | bool | No | If True, the listed languages are blocked; if False, only the listed languages are allowed. Defaults to True. |
| threshold | float | No | Minimum confidence score for a language identification to count. Defaults to 0.5. |
| use_onnx | bool | No | Whether to use ONNX Runtime for inference. Defaults to False. |
scan() Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The input text to scan for programming language code. |
Outputs
| Name | Type | Description |
|---|---|---|
| prompt | str | The original prompt (unchanged). |
| is_valid | bool | True if the prompt passes the language filter; False otherwise. |
| risk_score | float | The highest classification confidence for any detected language. |
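Because scan() returns a plain (prompt, is_valid, risk_score) tuple, the caller decides what to do with a rejection. The following is a minimal sketch, assuming you want to raise an exception when a prompt fails; PromptScanner, PromptRejected, guard, and FakeCodeScanner are hypothetical names. FakeCodeScanner only stands in for Code so the example runs without downloading the model; a real Code instance has the same scan() signature and could be passed to guard directly.

```python
from typing import Protocol


class PromptScanner(Protocol):
    """Structural type matching the documented scan() contract."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


class PromptRejected(Exception):
    """Hypothetical exception raised when a scanner rejects a prompt."""

    def __init__(self, risk_score: float) -> None:
        super().__init__(f"prompt rejected (risk_score={risk_score:.2f})")
        self.risk_score = risk_score


def guard(scanner: PromptScanner, prompt: str) -> str:
    """Return the prompt returned by the scanner if valid, otherwise raise PromptRejected."""
    sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise PromptRejected(risk_score)
    return sanitized_prompt


class FakeCodeScanner:
    """Stand-in for Code with a fixed verdict, so the sketch runs offline."""

    def __init__(self, is_valid: bool, risk_score: float) -> None:
        self._is_valid = is_valid
        self._risk_score = risk_score

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        # Like Code, return the prompt unchanged alongside the verdict.
        return prompt, self._is_valid, self._risk_score


print(guard(FakeCodeScanner(True, 0.0), "print('hi')"))  # print('hi')
try:
    guard(FakeCodeScanner(False, 0.87), "Remove-Item C:\\data")
except PromptRejected as exc:
    print(exc)  # prompt rejected (risk_score=0.87)
```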
Supported Languages
The scanner identifies the following 26 programming languages. Names passed in languages should be taken from this list, spelled exactly as shown:

| ARM Assembly | AppleScript | C | C# | C++ | COBOL |
|---|---|---|---|---|---|
| Erlang | Fortran | Go | Java | JavaScript | Kotlin |
| Lua | Mathematica/Wolfram Language | PHP | Pascal | Perl | PowerShell |
| Python | R | Ruby | Rust | Scala | Swift |
| Visual Basic .NET | jq | | | | |
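Language names are passed as plain strings, so a typo such as "Javascript" for "JavaScript" is easy to make. The following is a hypothetical pre-flight check (check_languages is not part of llm_guard) that compares configured names against a list of supported labels using exact, case-sensitive matching, and suggests near matches with the standard library's difflib.get_close_matches. In practice you would pass the full list of supported labels; the three-item subset below only keeps the example short.

```python
import difflib


def check_languages(languages: list[str], supported: list[str]) -> list[str]:
    """Return one error message per unsupported name (an empty list means all are valid)."""
    errors = []
    for name in languages:
        if name in supported:  # exact, case-sensitive match
            continue
        # Suggest up to two close supported labels, e.g. for typos or case differences.
        hints = difflib.get_close_matches(name, supported, n=2, cutoff=0.6)
        hint = f" (did you mean {', '.join(hints)}?)" if hints else ""
        errors.append(f"unsupported language {name!r}{hint}")
    return errors


# Illustrative subset; a real check would use the full list of supported labels.
supported = ["Python", "JavaScript", "Rust"]
print(check_languages(["Python", "Javascript"], supported))
# ["unsupported language 'Javascript' (did you mean JavaScript?)"]
```

Running such a check when configuration is loaded surfaces naming mistakes before any prompt is scanned.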
Usage Examples
Block Specific Languages
```python
from llm_guard.input_scanners import Code

# Block PowerShell and Perl scripts (names must come from the supported list)
scanner = Code(
    languages=["PowerShell", "Perl"],
    is_blocked=True,
    threshold=0.5,
)

prompt = "Remove-Item -Path C:\\important\\data -Recurse -Force"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
print(is_valid)  # Expected False if the snippet is classified as PowerShell above the threshold
print(risk_score)  # Derived from the classification confidence
```
Allow Only Specific Languages
```python
from llm_guard.input_scanners import Code

# Only allow Python and JavaScript
scanner = Code(
    languages=["Python", "JavaScript"],
    is_blocked=False,
    threshold=0.5,
)

prompt = "def fibonacci(n):\n    if n <= 1: return n\n    return fibonacci(n-1) + fibonacci(n-2)"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)
print(is_valid)  # True (Python is in the allowed list)
```
Related Pages