Overview
URLReachability is an output scanner that validates whether URLs found in LLM responses are reachable via HTTP requests.
Description
URLReachability is a standalone implementation rather than a thin wrapper. It extracts every URL from the LLM output, sends an HTTP request to each one, and checks the response status code against success_status_codes (by default 200, 201, and 202). The timeout parameter sets how many seconds to wait for each URL before treating it as unreachable. If any URL in the output is unreachable, the output is flagged as invalid. The is_reachable method performs the same check for a single URL. Note that the source file name contains a typo (url_reachabitlity.py); the class itself is correctly named URLReachability.
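The flow described above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's actual code: the helper names find_urls and all_reachable, the simple URL regex, and the injectable fetch_status callable are all hypothetical, and the real scanner's URL extraction and HTTP client may differ.

```python
import re
from typing import Callable, Iterable

# Assumed defaults, taken from the description above.
DEFAULT_SUCCESS_CODES = (200, 201, 202)


def find_urls(text: str) -> list:
    """Return every http(s) URL in text, stripping trailing punctuation.

    Hypothetical, deliberately simple extraction for illustration only.
    """
    return [m.rstrip(".,;:!?)") for m in re.findall(r"https?://\S+", text)]


def all_reachable(
    text: str,
    fetch_status: Callable[[str], int],
    success_status_codes: Iterable[int] = DEFAULT_SUCCESS_CODES,
) -> bool:
    """Return True only if every URL in text yields an accepted status code.

    fetch_status is a hypothetical callable that returns the HTTP status
    code for a URL and raises OSError on timeouts or connection failures.
    """
    accepted = set(success_status_codes)
    for url in find_urls(text):
        try:
            status = fetch_status(url)
        except OSError:
            return False  # no usable response counts as unreachable
        if status not in accepted:
            return False
    return True
```

Passing the status lookup in as a callable keeps the sketch testable without network access; a real caller would supply a function that performs the HTTP request with a timeout.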
Usage
Use this scanner when your LLM generates responses containing URLs that users might click on. This ensures that recommended links, references, and resources actually exist and are accessible. This is particularly important for documentation bots, research assistants, and customer support tools that provide links to resources, knowledge base articles, or product pages.
Code Reference
Source Location
Signature
class URLReachability(Scanner):
    def __init__(
        self,
        *,
        success_status_codes: list[int] | None = None,
        timeout: int = 5,
    ) -> None: ...

    def scan(self, prompt: str, output: str) -> tuple[str, bool, float]: ...

    def is_reachable(self, url: str) -> bool: ...
Import
from llm_guard.output_scanners import URLReachability
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| prompt | str | Yes | The input prompt |
| output | str | Yes | The LLM output to scan for unreachable URLs |
Constructor Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| success_status_codes | list[int] or None | No | None | HTTP status codes considered successful (None resolves to [200, 201, 202]) |
| timeout | int | No | 5 | Timeout in seconds for each HTTP request |
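Using None as the default for success_status_codes is the usual Python idiom for avoiding a shared mutable default list. A minimal sketch of how such a default could be resolved follows; resolve_success_codes is a hypothetical helper written for illustration and is not part of llm_guard.

```python
from __future__ import annotations

# Default codes as stated in the parameter table above.
DEFAULT_SUCCESS_STATUS_CODES = [200, 201, 202]


def resolve_success_codes(success_status_codes: list[int] | None = None) -> list[int]:
    """Return the caller's codes, or a fresh copy of the defaults when None."""
    if success_status_codes is None:
        # Copy so that callers cannot mutate the module-level default.
        return list(DEFAULT_SUCCESS_STATUS_CODES)
    return list(success_status_codes)
```

Note that a custom list replaces the defaults rather than extending them, so any default code you still want accepted has to be listed again.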
Outputs
| Name | Type | Description |
|---|---|---|
| sanitized_output | str | The output (unmodified) |
| is_valid | bool | Whether all URLs in the output are reachable |
| risk_score | float | Risk score (-1.0 to 1.0) |
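Because the scanner returns the output unmodified, deciding what to do with an invalid response is left to the caller. One possible way to consume the result tuple is sketched below; the function name text_for_user and the fallback message are assumptions made for this example.

```python
from __future__ import annotations


def text_for_user(result: tuple[str, bool, float]) -> str:
    """Pick the text to show, given a (sanitized_output, is_valid, risk_score) tuple."""
    sanitized_output, is_valid, risk_score = result
    if is_valid:
        return sanitized_output
    # Hypothetical fallback: withhold the response instead of showing dead links.
    return f"[response withheld: unreachable URL detected, risk={risk_score:.1f}]"
```

In practice the tuple would come from scanner.scan(prompt, output), and an application might prefer to regenerate the response or strip the offending links rather than withhold the text.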
Usage Examples
Basic Usage
from llm_guard.output_scanners import URLReachability

scanner = URLReachability(timeout=5)

prompt = "Give me some useful links"
output = "Check out https://www.example.com and https://www.python.org for more information."

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)

if is_valid:
    print("All URLs are reachable")
else:
    print(f"Some URLs are unreachable (risk: {risk_score})")
Custom Status Codes
from llm_guard.output_scanners import URLReachability

# Accept redirects as valid
scanner = URLReachability(
    success_status_codes=[200, 201, 202, 301, 302],
    timeout=10,
)

prompt = "Where can I find the documentation?"
output = "The documentation is at https://docs.example.com/latest"

sanitized_output, is_valid, risk_score = scanner.scan(prompt, output)
print(f"URLs reachable: {is_valid}")
Individual URL Check
from llm_guard.output_scanners import URLReachability
scanner = URLReachability()
# Check a single URL
url = "https://www.example.com"
reachable = scanner.is_reachable(url)
print(f"{url} is {'reachable' if reachable else 'unreachable'}")
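When only a pass/fail flag is not enough, is_reachable can be applied to each URL in turn to find out which ones failed. The helper below is a sketch under that assumption; partition_urls is a hypothetical name, and it accepts any callable with the same shape as is_reachable so that it can be exercised without network access.

```python
from __future__ import annotations

from typing import Callable, Iterable


def partition_urls(
    urls: Iterable[str],
    is_reachable: Callable[[str], bool],
) -> tuple[list[str], list[str]]:
    """Split urls into (reachable, unreachable) lists, preserving input order."""
    reachable: list[str] = []
    unreachable: list[str] = []
    for url in urls:
        # Each URL is checked exactly once by the supplied callable.
        (reachable if is_reachable(url) else unreachable).append(url)
    return reachable, unreachable
```

With a real scanner this would be called as partition_urls(urls, scanner.is_reachable). The checks run sequentially, so the worst-case wall time grows with the number of URLs multiplied by the timeout.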