

Principle:Protectai Llm guard URL Reachability Validation

From Leeroopedia
Knowledge Sources
Domains: URL_Validation, Output_Quality
Last Updated: 2026-02-14 12:00 GMT

Overview

Verifying that URLs referenced in generated text are reachable and return successful HTTP status codes.

Description

Large language models frequently generate URLs that do not exist, have expired, or return error responses. This is a common form of hallucination where the model produces plausible-looking URLs that fail upon navigation. This principle provides a guardrail that extracts all URLs from the generated output and verifies their reachability through live HTTP requests.

Each extracted URL is tested with an HTTP GET request that uses a configurable timeout. The response status code is checked against a set of acceptable success codes, by default 200 (OK), 201 (Created), and 202 (Accepted). URLs that fail to respond within the timeout, cause connection errors, or return a status code outside the success set (such as a 4xx or 5xx error) are flagged as unreachable.
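A minimal sketch of the per-URL check, using only Python's standard library. The function name `check_url`, the constant `DEFAULT_SUCCESS_CODES`, and the 5-second default timeout are illustrative assumptions, not LLM Guard's API; the sketch only shows how a GET request, a timeout, and a success-code set combine into a reachable/unreachable verdict.

```python
import http.client
import urllib.error
import urllib.request
from typing import FrozenSet, Optional, Tuple

# Default success codes as described above: 200 OK, 201 Created, 202 Accepted.
DEFAULT_SUCCESS_CODES: FrozenSet[int] = frozenset({200, 201, 202})


def check_url(
    url: str,
    timeout: float = 5.0,  # assumed default; the timeout is configurable
    success_codes: FrozenSet[int] = DEFAULT_SUCCESS_CODES,
) -> Tuple[bool, Optional[int]]:
    """Issue a GET request and return (reachable, status_code).

    status_code is None when no HTTP response was received at all.
    """
    try:
        # Building the Request inside the try block means malformed URLs
        # (which raise ValueError) are also reported as unreachable.
        request = urllib.request.Request(url, method="GET")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, but the status code
        # is still available. HTTPError subclasses URLError, so it must be
        # caught before the generic handler below.
        status = err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        # Timeouts, DNS failures, TLS errors, refused connections, bad URLs.
        return False, None
    return status in success_codes, status
```

For example, `check_url("not a url")` returns `(False, None)` because the string has no URL scheme. Because `urlopen` follows redirects, a 301 or 302 response is judged by the status of the final response it redirects to.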

This validation ensures that every link presented to the user in the LLM output resolved to accessible content at the time the output was checked, improving the reliability and trustworthiness of the generated response.

Usage

Apply this principle when the generated output contains URLs that users are likely to follow:

  • Research assistants that provide reference links or citations.
  • Code generation tools that reference documentation URLs.
  • Content creation systems that embed hyperlinks in generated text.
  • Customer support bots that direct users to help pages or resources.
  • Any scenario where broken links would undermine user trust.

Theoretical Basis

The URL reachability validation pipeline operates as follows:

1. Apply a regex pattern to extract all URLs from the output text.
2. For each extracted URL:
   a. Issue an HTTP GET request with the configured timeout.
   b. Handle potential failure modes:
      - Connection timeout: flag as unreachable.
      - DNS resolution failure: flag as unreachable.
      - SSL/TLS errors: flag as unreachable.
      - Connection refused: flag as unreachable.
   c. If a response is received, check the HTTP status code:
      - If status_code is in the set of success codes (default: {200, 201, 202}),
        mark as reachable.
      - Otherwise, flag as unreachable.
3. Aggregate results across all extracted URLs.
4. If any URL is flagged as unreachable, mark the overall output as failing validation.
5. Return the list of URLs with their reachability status and HTTP status codes.
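The pipeline above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not LLM Guard's implementation: the regex, the `URLResult` record, and the names `extract_urls` and `validate_output` are hypothetical. The HTTP request is injected as a `check` callable returning `(reachable, status_code)`, so the extraction and aggregation steps can be exercised without network access.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical, deliberately simple pattern: an http(s) scheme followed by
# characters that are not whitespace, quotes, angle brackets, parentheses or
# square brackets (so Markdown links like [text](url) are split cleanly).
URL_PATTERN = re.compile(r"https?://[^\s<>\"'()\[\]]+")

# A checker maps a URL to (reachable, status_code); status_code is None when
# no HTTP response arrived (timeout, DNS, TLS or connection failure).
Checker = Callable[[str], Tuple[bool, Optional[int]]]


@dataclass
class URLResult:
    url: str
    reachable: bool
    status_code: Optional[int]


def extract_urls(text: str) -> List[str]:
    """Step 1: pull candidate URLs out of the model output."""
    # Drop trailing punctuation that usually ends a sentence, not a URL.
    urls = [match.rstrip(".,;:!?") for match in URL_PATTERN.findall(text)]
    # Deduplicate while keeping first-seen order, so each URL is fetched once.
    return list(dict.fromkeys(urls))


def validate_output(text: str, check: Checker) -> Tuple[bool, List[URLResult]]:
    """Steps 2-5: check every URL, aggregate, and report per-URL results."""
    results = [URLResult(url, *check(url)) for url in extract_urls(text)]
    # The output fails validation if any single URL is unreachable.
    is_valid = all(result.reachable for result in results)
    return is_valid, results
```

In a real deployment, `check` would wrap an HTTP GET with the configured timeout and compare the status against the success codes. Output with no URLs passes trivially, since no URL can be flagged. Because each check waits on the network, running the checks concurrently (for example with `concurrent.futures.ThreadPoolExecutor`) keeps total latency closer to the slowest single request than to the sum of all of them.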
