Implementation:Protectai Llm guard Input Language

Knowledge Sources	Protectai_Llm_guard
Domains	Language_Detection, NLP
Last Updated	2026-02-14 12:00 GMT

Overview

The Language scanner validates that prompts are written in allowed languages using XLM-RoBERTa language detection.

Description

Language is an input scanner that identifies the natural language of a prompt with papluca/xlm-roberta-base-language-detection, an XLM-RoBERTa model fine-tuned for language identification. It takes valid_languages, a list of ISO 639-1 language codes (e.g., "en" for English, "fr" for French, "de" for German), and flags prompts written in a language outside that list. The threshold parameter (default 0.6) sets the minimum confidence required for a language classification to count. The match_type parameter selects whether the scanner analyzes the full text (MatchType.FULL, the default) or individual sentences. Setting use_onnx=True runs the model through ONNX runtime for optimized inference.
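The interplay of valid_languages and threshold can be illustrated without loading the model. The following is a minimal sketch, not the library's code: is_prompt_allowed and the sample scores are hypothetical, and the exact rule (reject when any language scored above the threshold is outside the allowed list) is an assumption made for illustration.

```python
from typing import Iterable


def is_prompt_allowed(
    scores: Iterable[dict],
    valid_languages: list[str],
    threshold: float = 0.6,
) -> bool:
    """Return True when no language outside valid_languages clears the threshold.

    `scores` mimics a text-classification result: a list of
    {"label": <ISO 639-1 code>, "score": <confidence>} dicts.
    This helper is hypothetical and only illustrates the decision rule.
    """
    # Languages the (assumed) classifier is confident about.
    confident = {s["label"] for s in scores if s["score"] > threshold}
    # Allowed only if every confident language is in the allow-list.
    return confident.issubset(valid_languages)


# Made-up classifier output for a German prompt (illustrative values).
german_scores = [
    {"label": "de", "score": 0.97},
    {"label": "en", "score": 0.02},
    {"label": "fr", "score": 0.01},
]

print(is_prompt_allowed(german_scores, ["en"]))        # False
print(is_prompt_allowed(german_scores, ["en", "de"]))  # True
```

Under this sketch, a prompt for which no language clears the threshold is allowed, so raising the threshold makes the check more permissive.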

Usage

Use the Language scanner when you need to restrict prompts to specific languages. Typical cases are applications that support only certain languages, language policies in multilingual environments, and detection of evasion attempts in which users switch to an unsupported language to bypass other content filters.

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/input_scanners/language.py
Lines: 1-114

Signature

class Language(Scanner):
    def __init__(
        self,
        valid_languages: list[str],
        *,
        model: Model | None = None,  # default: papluca/xlm-roberta-base-language-detection
        threshold: float = 0.6,
        match_type: MatchType | str = MatchType.FULL,
        use_onnx: bool = False,
    ) -> None: ...

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...

Import

from llm_guard.input_scanners import Language

I/O Contract

Inputs

Name	Type	Required	Description
valid_languages	list[str]	Yes	List of allowed language codes in ISO 639-1 format (e.g., ["en", "fr", "de"]).
model	Model or None	No	The language detection model. Defaults to papluca/xlm-roberta-base-language-detection.
threshold	float	No	Minimum confidence score for language classification. Defaults to 0.6.
match_type	MatchType or str	No	Whether to analyze the full text or individual sentences. Defaults to MatchType.FULL.
use_onnx	bool	No	Whether to use ONNX runtime for inference. Defaults to False.
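The difference between full-text and sentence-level analysis selected by match_type comes down to which chunks of text are handed to the classifier. The sketch below illustrates that idea only: split_sentences, get_inputs, and the string values "full" and "sentence" are assumptions for this example, and the regular-expression splitter is a naive stand-in for whatever sentence splitting the library performs.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def get_inputs(prompt: str, match_type: str) -> list[str]:
    """Return the chunks to classify: the whole prompt or one chunk per sentence.

    The values "full" and "sentence" are assumed names for this sketch.
    """
    if match_type == "sentence":
        return split_sentences(prompt)
    return [prompt]


prompt = "What is the capital of France? Was ist die Hauptstadt von Frankreich?"

print(get_inputs(prompt, "full"))      # one chunk: the whole prompt
print(get_inputs(prompt, "sentence"))  # two chunks: one English, one German
```

Classifying each sentence separately gives a mixed-language prompt like this one a chance to be caught even when one language dominates the full text.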

scan() Inputs

Name	Type	Required	Description
prompt	str	Yes	The input text whose language will be detected.

Outputs

Name	Type	Description
prompt	str	The original prompt (unchanged).
is_valid	bool	True if the detected language is in the valid_languages list; False otherwise.
risk_score	float	The confidence score of the detected language classification.
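Because scan() returns a three-element tuple, callers usually unpack it and branch on is_valid. The following sketch shows one way to wrap that pattern; PromptScanner, PromptRejected, enforce, and AlwaysReject are hypothetical names introduced here, and the stand-in scanner exists only so the example runs without downloading a model.

```python
from typing import Protocol


class PromptScanner(Protocol):
    """Anything with the scan() signature shown in the Signature section."""

    def scan(self, prompt: str) -> tuple[str, bool, float]: ...


class PromptRejected(Exception):
    """Raised when a scanner marks a prompt as invalid (hypothetical helper)."""


def enforce(scanner: PromptScanner, prompt: str) -> str:
    """Run the scanner and return the prompt, or raise if it is rejected."""
    checked_prompt, is_valid, risk_score = scanner.scan(prompt)
    if not is_valid:
        raise PromptRejected(f"prompt rejected (risk_score={risk_score:.2f})")
    return checked_prompt


class AlwaysReject:
    """Stand-in scanner used only to exercise enforce() without a model."""

    def scan(self, prompt: str) -> tuple[str, bool, float]:
        return prompt, False, 1.0


try:
    enforce(AlwaysReject(), "Was ist die Hauptstadt von Frankreich?")
except PromptRejected as exc:
    print(exc)  # prompt rejected (risk_score=1.00)
```

In an application, a Language instance would be passed in place of AlwaysReject, since it exposes the same scan() method.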

Usage Examples

Basic Usage

from llm_guard.input_scanners import Language

scanner = Language(
    valid_languages=["en"],
)
prompt = "What is the capital of France?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (English detected, which is in valid list)
print(risk_score)  # Confidence score

Multiple Languages

from llm_guard.input_scanners import Language

# Allow English, French, and Spanish
scanner = Language(
    valid_languages=["en", "fr", "es"],
    threshold=0.5,
)
prompt = "Quelle est la capitale de la France?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # True (French is in the valid list)

Blocking Non-English Input

from llm_guard.input_scanners import Language

scanner = Language(
    valid_languages=["en"],
    threshold=0.6,
)
prompt = "Was ist die Hauptstadt von Frankreich?"
sanitized_prompt, is_valid, risk_score = scanner.scan(prompt)

print(is_valid)    # False (German detected, not in valid list)
print(risk_score)  # Confidence score for the German classification

Related Pages

Principle:Protectai_Llm_guard_Language_Detection
