Implementation:Protectai Llm guard Output URLReachability

Knowledge Sources	Protectai_Llm_guard
Domains	URL_Validation, Output_Quality
Last Updated	2026-02-14 12:00 GMT

Overview

URLReachability is an output scanner that validates whether URLs found in LLM responses are reachable via HTTP requests.

Description

URLReachability is a standalone implementation rather than a thin wrapper around another scanner. It extracts every URL from the LLM output, sends an HTTP request to each one, and checks the response status code against success_status_codes (default: 200, 201, and 202). The timeout parameter sets how many seconds to wait for each URL before treating it as unreachable. If any URL in the output is unreachable, the output is flagged as invalid. The is_reachable method runs the same check on a single URL. Note that the source file name contains a typo (url_reachabitlity.py); the class itself is correctly named URLReachability.
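
The flow described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the library's actual code: URL_PATTERN, fetch_status, and find_unreachable are hypothetical names, the regular expression is a simplification, and the real scanner may extract URLs and issue requests differently.

```python
import http.client
import re
import urllib.error
import urllib.request
from typing import Callable, Collection, Optional

# Simplified pattern for illustration only (assumption, not the library's extractor).
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")

# Mirrors the documented default success codes.
DEFAULT_SUCCESS_CODES = (200, 201, 202)


def fetch_status(url: str, timeout: int = 5) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (OSError, ValueError, http.client.HTTPException):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def find_unreachable(
    text: str,
    success_codes: Collection[int] = DEFAULT_SUCCESS_CODES,
    fetch: Callable[[str], Optional[int]] = fetch_status,
) -> list[str]:
    """Return the URLs in text whose status is not in success_codes."""
    # Strip trailing sentence punctuation that the simple pattern picks up.
    urls = [u.rstrip(".,;:!?") for u in URL_PATTERN.findall(text)]
    return [u for u in urls if fetch(u) not in success_codes]
```

Passing a custom fetch callable keeps the sketch testable without network access, for example find_unreachable(text, fetch=lambda url: 200).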

Usage

Use this scanner when your LLM generates responses containing URLs that users might click on. It helps confirm that recommended links, references, and resources respond at scan time. This is particularly important for documentation bots, research assistants, and customer support tools that link to knowledge base articles or product pages.

Code Reference

Source Location

Repository: Protectai_Llm_guard
File: llm_guard/output_scanners/url_reachabitlity.py
Lines: 1-57

Signature

class URLReachability(Scanner):
    def __init__(
        self,
        *,
        success_status_codes: list[int] | None = None,
        timeout: int = 5,
    ) -> None: ...

    def scan(self, prompt: str, output: str) -> tuple[str, bool, float]: ...

    def is_reachable(self, url: str) -> bool: ...

Import

from llm_guard.output_scanners import URLReachability

I/O Contract

Inputs

Name	Type	Required	Description
prompt	str	Yes	The input prompt
output	str	Yes	The LLM output to scan for unreachable URLs

Constructor Parameters

Name	Type	Required	Default	Description
success_status_codes	list[int] | None	No	None	HTTP status codes considered successful (defaults to [200, 201, 202])
timeout	int	No	5	Timeout in seconds for each HTTP request
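
A None default for success_status_codes is the usual Python idiom for avoiding a shared mutable default list. The sketch below shows how such a default could be resolved; ReachabilityConfig and DEFAULT_SUCCESS_STATUS_CODES are hypothetical stand-ins that mirror the documented parameters, not the library's own class or constant.

```python
from typing import Optional

# Assumed name; the values match the documented defaults.
DEFAULT_SUCCESS_STATUS_CODES = [200, 201, 202]


class ReachabilityConfig:
    """Hypothetical stand-in mirroring the documented constructor parameters."""

    def __init__(
        self,
        *,
        success_status_codes: Optional[list[int]] = None,
        timeout: int = 5,
    ) -> None:
        # Copy the defaults so each instance owns its own list.
        if success_status_codes is None:
            success_status_codes = list(DEFAULT_SUCCESS_STATUS_CODES)
        self.success_status_codes = success_status_codes
        self.timeout = timeout
```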

Outputs

Name	Type	Description
sanitized_output	str	The output (unmodified)
is_valid	bool	Whether all URLs in the output are reachable
risk_score	float	Risk score (-1.0 to 1.0)
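
Because the scanner returns the output unmodified, the caller decides what to do when is_valid is False. One possible pattern, shown as a sketch, is to substitute a fallback message; choose_response is a hypothetical helper that relies only on the tuple shape documented above.

```python
def choose_response(scan_result: tuple[str, bool, float], fallback: str) -> str:
    """Return the scanned output if it passed, otherwise the fallback text."""
    sanitized_output, is_valid, _risk_score = scan_result
    return sanitized_output if is_valid else fallback
```

A caller could pass the result of scanner.scan(prompt, output) directly as scan_result.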

Usage Examples

Basic Usage

from llm_guard.output_scanners import URLReachability

scanner = URLReachability(timeout=5)

prompt = "Give me some useful links"
output = "Check out https://www.example.com and https://www.python.org for more information."

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)

if is_valid:
    print("All URLs are reachable")
else:
    print(f"Some URLs are unreachable (risk: {risk_score})")

Custom Status Codes

from llm_guard.output_scanners import URLReachability

# Also treat 301 and 302 responses as successful
scanner = URLReachability(
    success_status_codes=[200, 201, 202, 301, 302],
    timeout=10,
)

prompt = "Where can I find the documentation?"
output = "The documentation is at https://docs.example.com/latest"

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
print(f"URLs reachable: {is_valid}")

Individual URL Check

from llm_guard.output_scanners import URLReachability

scanner = URLReachability()

# Check a single URL
url = "https://www.example.com"
reachable = scanner.is_reachable(url)
print(f"{url} is {'reachable' if reachable else 'unreachable'}")

Related Pages

Principle:Protectai_Llm_guard_URL_Reachability_Validation
