Implementation:Protectai Llm guard Input Code

Knowledge Sources	Protectai_Llm_guard
Domains	Code_Detection, Content_Filtering
Last Updated	2026-02-14 12:00 GMT

Overview

The Code scanner detects programming languages in prompts, allowing you to block or allow specific languages.

Description

Code is an input scanner that identifies the programming language of code snippets within prompts using the philomath-1209/programming-language-identification classification model. Unlike BanCode, which only detects whether any code is present, this scanner identifies the specific language and can be configured either to block the listed languages (is_blocked=True) or to allow only the listed languages (is_blocked=False). The model distinguishes 26 programming languages, including Python, JavaScript, C++, Java, Go, Rust, Ruby, PHP, Swift, and Kotlin. The threshold parameter (default 0.5) sets the minimum confidence a classification must reach to count as a detection. ONNX Runtime can be enabled with use_onnx=True for faster inference.
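
The block/allow rule described above can be modelled in a few lines of plain Python. This is an illustrative sketch, not the library's implementation: the helper passes_language_filter and the shape of predictions (a mapping from language label to confidence) are assumptions made for the example, as is the choice to let a prompt pass in both modes when no language reaches the threshold.

```python
def passes_language_filter(
    predictions: dict[str, float],
    languages: list[str],
    *,
    is_blocked: bool = True,
    threshold: float = 0.5,
) -> bool:
    """Return True if the predictions pass the filter (hypothetical helper)."""
    # Keep only the languages whose confidence meets the threshold.
    detected = {lang for lang, score in predictions.items() if score >= threshold}
    listed = set(languages)
    if is_blocked:
        # Blocklist mode: fail if any detected language is on the list.
        return detected.isdisjoint(listed)
    # Allowlist mode: fail if any detected language is missing from the list.
    return detected <= listed


preds = {"Python": 0.93, "Ruby": 0.04}  # made-up confidences
print(passes_language_filter(preds, ["Python"], is_blocked=True))   # False
print(passes_language_filter(preds, ["Python"], is_blocked=False))  # True
```

The same list of languages therefore produces opposite verdicts depending on is_blocked, which is why the flag should always be set explicitly when the intent is an allowlist.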

Usage

Use the Code scanner when you need fine-grained control over which programming languages are permitted in prompts. This is useful for restricting code submissions to approved languages, blocking languages that are risky in your environment (e.g., PowerShell), or ensuring that prompts contain only code in languages relevant to your application.

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/input_scanners/code.py
Lines: 1-178

Signature

class Code(Scanner):
    def __init__(
        self,
        languages: list[str],
        *,
        model: Model | None = None,  # default: philomath-1209/programming-language-identification
        is_blocked: bool = True,
        threshold: float = 0.5,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

Import

from llm_guard.input_scanners import Code

I/O Contract

Inputs

Name	Type	Required	Description
languages	list[str]	Yes	List of programming language names to block or allow.
model	Model or None	No	The language identification model to use. Defaults to philomath-1209/programming-language-identification.
is_blocked	bool	No	If True, listed languages are blocked; if False, only listed languages are allowed. Defaults to True.
threshold	float	No	Minimum confidence score for language identification. Defaults to 0.5.
use_onnx	bool	No	Whether to use ONNX runtime for inference. Defaults to False.

scan() Inputs

Name	Type	Required	Description
prompt	str	Yes	The input text to scan for programming language code.

Outputs

Name	Type	Description
prompt	str	The original prompt (unchanged).
is_valid	bool	True if the prompt passes the language filter; False otherwise.
risk_score	float	The highest classification confidence for any detected language.
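
A caller typically unpacks the returned tuple and branches on is_valid. The sketch below shows one way to summarize a result for logging; describe_scan_result is a hypothetical helper, and the example tuple is a made-up stand-in for a real scanner.scan(prompt) return value.

```python
def describe_scan_result(result: tuple[str, bool, float]) -> str:
    """Summarize a (prompt, is_valid, risk_score) tuple (hypothetical helper)."""
    prompt, is_valid, risk_score = result
    status = "accepted" if is_valid else "rejected"
    return f"{status} (risk_score={risk_score:.2f}, {len(prompt)} chars)"


# Stand-in for the value returned by scanner.scan(prompt); the numbers are made up.
example = ("print('hi')", False, 0.87)
print(describe_scan_result(example))  # rejected (risk_score=0.87, 11 chars)
```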

Supported Languages

The scanner identifies the 26 language labels of the default model. Names passed in languages are expected to match these labels:

ARM Assembly	AppleScript	C	C#	C++	COBOL
Erlang	Fortran	Go	Java	JavaScript	Kotlin
Lua	Mathematica/Wolfram Language	PHP	Pascal	Perl	PowerShell
Python	R	Ruby	Rust	Scala	Swift
Visual Basic .NET	jq
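
Because language names are plain strings, a typo such as "Javascript" is easy to miss. A small pre-check before constructing the scanner can catch this early. The sketch below is an assumption-laden illustration: validate_languages is a hypothetical helper, it assumes labels are compared as exact, case-sensitive strings, and SUPPORTED_SUBSET is an illustrative subset rather than the scanner's full label set.

```python
def validate_languages(requested: list[str], supported: set[str]) -> list[str]:
    """Return `requested` unchanged, or raise ValueError naming unknown entries."""
    if not requested:
        raise ValueError("at least one language is required")
    unknown = sorted(set(requested) - supported)
    if unknown:
        raise ValueError(f"unsupported languages: {', '.join(unknown)}")
    return requested


# Illustrative subset only; not the scanner's full label set.
SUPPORTED_SUBSET = {"Python", "JavaScript", "Java", "Go", "Rust"}

print(validate_languages(["Python", "Go"], SUPPORTED_SUBSET))  # ['Python', 'Go']
```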

Usage Examples

Block Specific Languages

from llm_guard.input_scanners import Code

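# Block PowerShell and Perl scripts.
# "Shell" and "SQL" are not among the default model's language labels,
# so this example uses labels the model can actually predict.
scanner = Code(
    languages=["PowerShell", "Perl"],
    is_blocked=True,
    threshold=0.5,
)
prompt = "Remove-Item -Path C:\\important\\data -Recurse -Force"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # Expected False if PowerShell is detected at or above the threshold
print(risk_score)  # Score derived from the classification confidence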

Allow Only Specific Languages

from llm_guard.input_scanners import Code

# Only allow Python and JavaScript
scanner = Code(
    languages=["Python", "JavaScript"],
    is_blocked=False,
    threshold=0.5,
)
prompt = "def fibonacci(n):\n    if n <= 1: return n\n    return fibonacci(n-1) + fibonacci(n-2)"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (Python is in the allowed list)

Related Pages

Principle:Protectai_Llm_guard_Programming_Language_Detection
