Implementation:Protectai Llm guard Input Language

From Leeroopedia
Knowledge Sources
Domains: Language_Detection, NLP
Last Updated: 2026-02-14 12:00 GMT

Overview

The Language scanner validates that prompts are written in allowed languages using XLM-RoBERTa language detection.

Description

Language is an input scanner that identifies the natural language of a prompt with papluca/xlm-roberta-base-language-detection, an XLM-RoBERTa model fine-tuned for language identification. It takes a list of valid_languages given as ISO 639-1 codes (e.g., "en" for English, "fr" for French, "de" for German) and flags prompts that are not in one of the allowed languages. The threshold parameter (default 0.6) sets the minimum confidence a language classification must reach to be considered. The match_type parameter selects whether the scanner analyzes the full text (MatchType.FULL, the default) or individual sentences. Setting use_onnx=True runs inference through ONNX Runtime for optimized inference.
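The decision rule described above can be illustrated without loading the model. The following is a simplified sketch, not llm_guard's actual implementation: is_allowed and the hard-coded score dictionaries are hypothetical stand-ins for classifier output, with one dictionary per analyzed chunk (a single chunk for full-text matching, one per sentence for sentence-level matching). It assumes a prompt is rejected when any chunk has a language scoring above the threshold that is not in the allowed list.

```python
from typing import Mapping, Sequence


def is_allowed(
    chunk_scores: Sequence[Mapping[str, float]],
    valid_languages: Sequence[str],
    threshold: float = 0.6,
) -> bool:
    """Return True unless some chunk confidently matches a disallowed language."""
    allowed = set(valid_languages)
    for scores in chunk_scores:
        # Languages the (hypothetical) classifier is confident about for this chunk.
        confident = {lang for lang, score in scores.items() if score > threshold}
        if confident - allowed:
            return False
    return True


# Hypothetical classifier output for full-text matching: one chunk.
full_text = [{"en": 0.97, "de": 0.02, "fr": 0.01}]

# Hypothetical output for sentence-level matching: one chunk per sentence.
per_sentence = [
    {"en": 0.98, "de": 0.01},
    {"en": 0.05, "de": 0.93},
]

print(is_allowed(full_text, ["en"]))           # True
print(is_allowed(per_sentence, ["en"]))        # False
print(is_allowed(per_sentence, ["en", "de"]))  # True
```

Under this assumed rule, sentence-level matching is stricter for mixed-language prompts: a single confidently German sentence is enough to reject a prompt that would otherwise look English.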

Usage

Use the Language scanner to restrict prompts to specific languages. Typical cases are applications that support only certain languages, language policies in multilingual environments, and language-based evasion attempts, where users switch to an unsupported language to bypass other content filters.

Code Reference

Source Location

Signature

class Language(Scanner):
    def __init__(
        self,
        valid_languages: list[str],
        *,
        model: Model | None = None,  # default: papluca/xlm-roberta-base-language-detection
        threshold: float = 0.6,
        match_type: MatchType | str = MatchType.FULL,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

Import

from llm_guard.input_scanners import Language

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| valid_languages | list[str] | Yes | List of allowed language codes in ISO 639-1 format (e.g., ["en", "fr", "de"]). |
| model | Model or None | No | The language detection model. Defaults to papluca/xlm-roberta-base-language-detection. |
| threshold | float | No | Minimum confidence score for language classification. Defaults to 0.6. |
| match_type | MatchType or str | No | Whether to analyze the full text or individual sentences. Defaults to MatchType.FULL. |
| use_onnx | bool | No | Whether to use ONNX runtime for inference. Defaults to False. |
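Since valid_languages only helps if the detector can actually emit those codes, it can be worth validating the list before constructing the scanner. The sketch below is not part of llm_guard: SUPPORTED_CODES and check_language_codes are hypothetical names, and the code set is an assumption based on the model card for papluca/xlm-roberta-base-language-detection, so verify it against the model version you deploy.

```python
# Assumed label set of the default model (check the model card before relying on it).
SUPPORTED_CODES = frozenset({
    "ar", "bg", "de", "el", "en", "es", "fr", "hi", "it", "ja",
    "nl", "pl", "pt", "ru", "sw", "th", "tr", "ur", "vi", "zh",
})


def check_language_codes(codes: list[str]) -> list[str]:
    """Normalize codes to lowercase and reject any the model cannot emit."""
    if not codes:
        raise ValueError("valid_languages must not be empty")
    normalized = [code.strip().lower() for code in codes]
    unknown = sorted(set(normalized) - SUPPORTED_CODES)
    if unknown:
        raise ValueError(f"unsupported language codes: {unknown}")
    return normalized


print(check_language_codes([" EN", "fr"]))

try:
    check_language_codes(["en", "xx"])
except ValueError as exc:
    print(exc)
```

The returned list can then be passed as valid_languages; failing early on a typo such as "eng" avoids a scanner that silently rejects every prompt.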

scan() Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The input text whose language will be detected. |

Outputs

| Name | Type | Description |
|---|---|---|
| prompt | str | The original prompt (unchanged). |
| is_valid | bool | True if the detected language is in the valid_languages list; False otherwise. |
| risk_score | float | Risk score reported by the scanner, derived from the confidence of the language classification. |
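A caller typically unpacks the three-element tuple and branches on is_valid. The sketch below shows one way to do that. PromptScanner, PromptRejected, guard_prompt, and StubScanner are hypothetical names introduced for illustration; the stub stands in for a real Language instance so the example runs without downloading the model, and its scores are made up.

```python
from typing import Protocol


class PromptScanner(Protocol):
    """Anything exposing the scan() contract described above."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


class PromptRejected(Exception):
    """Raised when a scanner marks the prompt as invalid."""

    def __init__(self, risk_score: float) -> None:
        super().__init__(f"prompt rejected (risk_score={risk_score:.2f})")
        self.risk_score = risk_score


def guard_prompt(scanner: PromptScanner, prompt: str) -> str:
    """Return the scanned prompt, or raise if the scanner rejects it."""
    sanitized, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise PromptRejected(risk_score)
    return sanitized


class StubScanner:
    """Hypothetical stand-in: rejects any prompt containing 'Hauptstadt'."""

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        if "Hauptstadt" in prompt:
            return prompt, False, 0.9
        return prompt, True, 0.0


stub = StubScanner()
print(guard_prompt(stub, "What is the capital of France?"))

try:
    guard_prompt(stub, "Was ist die Hauptstadt von Frankreich?")
except PromptRejected as exc:
    print(exc)
```

Because guard_prompt depends only on the scan() signature, a real Language scanner should be usable in place of the stub without changing the calling code.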

Usage Examples

Basic Usage

from llm_guard.input_scanners import Language

scanner = Language(
    valid_languages=["en"],
)
prompt = "What is the capital of France?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (English detected, which is in the valid list)
print(risk_score)  # Risk score reported by the scanner

Multiple Languages

from llm_guard.input_scanners import Language

# Allow English, French, and Spanish
scanner = Language(
    valid_languages=["en", "fr", "es"],
    threshold=0.5,
)
prompt = "Quelle est la capitale de la France?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (French is in the valid list)

Blocking Non-English Input

from llm_guard.input_scanners import Language

scanner = Language(
    valid_languages=["en"],
    threshold=0.6,
)
prompt = "Was ist die Hauptstadt von Frankreich?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # False (German detected, not in the valid list)
print(risk_score)  # Risk score derived from the confidence of the German classification
