

Implementation:Kserve Kserve Verify Doc Links

Domains: CI/CD, Documentation, Quality Assurance
Last Updated: 2026-02-13 00:00 GMT

Overview

This Python script scans all Markdown files in the KServe repository for hyperlinks and verifies that each referenced resource exists, reporting any broken links.

Description

The script discovers Markdown files via glob patterns and extracts links with regular expressions, covering both Markdown-style [text](url) links and plain URLs. Relative links are resolved against the GitHub repository URL. Links to local files are verified by checking the filesystem, while remote URLs are validated concurrently using HTTP HEAD/GET requests. The script retries on rate limiting (HTTP 429) and transient errors, respects GitHub's limit of 60 requests per minute, and exits with a non-zero code if any 404 errors are found. This makes it suitable for CI pipelines, where it prevents broken documentation links from accumulating.
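
The extraction and resolution steps described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the two regular expressions, the function name extract_links, and the md_dir parameter are assumptions made for the example, and the real patterns in hack/verify-doc-links.py may differ.

```python
import posixpath
import re

# Hypothetical patterns; the script's real regexes are not shown on this page.
MD_LINK = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")   # [text](url)
PLAIN_URL = re.compile(r"https?://[^\s<>)\]]+")      # bare http(s) URLs


def extract_links(markdown_text, md_dir="",
                  repo="https://github.com/kserve/kserve/", branch="master"):
    """Return (line_number, link_text, url) tuples for one Markdown document.

    md_dir is the directory of the Markdown file relative to the repository
    root; relative targets are resolved against it and then mapped onto the
    repository's blob URL for the given branch.
    """
    base = f"{repo.rstrip('/')}/blob/{branch}"
    links = []
    for line_no, line in enumerate(markdown_text.splitlines(), start=1):
        for text, target in MD_LINK.findall(line):
            if target.startswith("#"):
                continue  # in-page anchor, nothing to resolve
            if not target.startswith(("http://", "https://")):
                path = posixpath.normpath(posixpath.join(md_dir, target))
                target = f"{base}/{path}"
            links.append((line_no, text, target))
        # Look for plain URLs only in what is left after removing [text](url).
        for url in PLAIN_URL.findall(MD_LINK.sub("", line)):
            links.append((line_no, "", url))
    return links
```

Calling extract_links("[up](../README.md)", md_dir="docs/samples") resolves the target to the docs/README.md path under the repository's blob/master URL.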

Usage

Run this script in CI pipelines or locally to verify that all documentation links in the KServe repository are valid. It is located in the hack/ directory alongside other development and CI utility scripts.

Code Reference

Source Location

hack/verify-doc-links.py in the KServe repository (https://github.com/kserve/kserve/)

Signature

#!/usr/bin/env python3

import concurrent.futures
import itertools
import re
from datetime import datetime, timedelta
from glob import glob
from os import environ as env
from os.path import abspath, dirname, exists, relpath
from time import sleep
from urllib.request import Request, urlopen
from urllib.parse import urlparse
from urllib.error import URLError, HTTPError

GITHUB_REPO = env.get("GITHUB_REPO", "https://github.com/kserve/kserve/")
BRANCH = "master"

def find_md_files() -> [str]: ...
def get_links_from_md_file(md_file_path: str) -> [(int, str, str)]: ...
def test_url(file: str, line: int, text: str, url: str) -> (str, int, str, str, int): ...
def wait_before_retry(retry_time: datetime) -> datetime: ...
def set_retry_time() -> datetime: ...
def request_url(url: str, method: str = "HEAD", headers: dict = None, timeout: int = 10) -> int: ...
def verify_urls_concurrently(md_files: [str]) -> [(str, int, str, str, int)]: ...
def verify_doc_links() -> int: ...

Import

python3 hack/verify-doc-links.py

I/O Contract

Inputs

Input | Type | Description
Markdown files | filesystem | All .md files found via glob patterns /**/*.md and /.github/**/*.md
GITHUB_REPO | env var | GitHub repository URL (default: https://github.com/kserve/kserve/)

Outputs

Output | Type | Description
stdout | text | Progress messages and broken link reports
exit code | int | 0 if all links are valid, non-zero if broken links are found

Excluded Paths

Path | Reason
/node_modules/ | Third-party dependencies
/temp/ | Temporary files
/.venv/ | Python virtual environment
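
File discovery with these exclusions might look like the sketch below. The glob patterns come from the Inputs table. The function name find_markdown_files and the choice to match excluded names against whole path components are assumptions for illustration, and the script's own matching rule may differ.

```python
import os
from glob import glob

MD_PATTERNS = ["/**/*.md", "/.github/**/*.md"]
EXCLUDED_PATHS = ["node_modules", "temp", ".venv"]


def find_markdown_files(root):
    """Return sorted Markdown paths under root, relative to root.

    Assumption: a file is skipped when any directory component of its
    relative path equals an excluded name.
    """
    found = set()
    for pattern in MD_PATTERNS:
        # With recursive=True, "**" matches zero or more directories but,
        # by default, not hidden ones; hence the separate .github pattern.
        for path in glob(root.rstrip("/") + pattern, recursive=True):
            rel = os.path.relpath(path, root)
            if not any(part in EXCLUDED_PATHS for part in rel.split(os.sep)):
                found.add(rel)
    return sorted(found)
```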

Excluded URL Patterns

Pattern | Reason
URLs with <, >, $, {, } | Placeholder/template URLs
0.0.0.0, localhost, :80, :90 | Local/non-public URLs
example.com, customdomain.com | Example domains
svc.cluster.local | Kubernetes internal DNS
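
A filter over the patterns in this table could be as small as the sketch below. It assumes plain substring matching, under which ":80" also covers ports such as :8080. The function name should_check is hypothetical, and the script may apply the patterns differently.

```python
# Patterns copied from the table above.
URL_EXCLUDES = [
    "<", ">", "$", "{", "}",
    "0.0.0.0", "localhost", ":80", ":90",
    "example.com", "customdomain.com",
    "svc.cluster.local",
]


def should_check(url):
    """Return True when the URL contains none of the excluded substrings."""
    return not any(token in url for token in URL_EXCLUDES)
```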

Key Functions

Function | Description
find_md_files() | Discovers all Markdown files using glob, excluding paths in excluded_paths
get_links_from_md_file() | Extracts links from a Markdown file using regex; resolves relative links to GitHub URLs
test_url() | Tests a single URL with HEAD, falling back to GET; handles 403, 405, 429, 503 with retries
request_url() | Makes an HTTP request with configurable method, headers, and timeout
verify_urls_concurrently() | Uses ThreadPoolExecutor with up to 60 parallel workers
verify_doc_links() | Main orchestrator: finds files, verifies URLs, reports broken links
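
The HEAD-then-GET behaviour attributed to test_url() and request_url() can be illustrated with the standard library alone. The sketch below is not the script's code: the names request_status and check_url, the User-Agent string, the 0 return value for connection failures, and the rule of falling back to GET only on 403 or 405 are all assumptions.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen


def request_status(url, method="HEAD", timeout=10):
    """Issue one request and return the HTTP status code.

    Returns 0 (a hypothetical sentinel) when no HTTP response was received.
    """
    req = Request(url, method=method,
                  headers={"User-Agent": "link-checker-sketch"})
    try:
        with urlopen(req, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:   # must precede URLError, its parent class
        return err.code
    except URLError:
        return 0


def check_url(url, fetch=request_status):
    """Try HEAD first; retry with GET if the server rejects HEAD."""
    status = fetch(url, method="HEAD")
    if status in (403, 405):   # assumed fallback condition
        status = fetch(url, method="GET")
    return status
```

Passing the fetch function as a parameter keeps the fallback logic testable without network access.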

Rate Limiting Configuration

Parameter | Value | Description
parallel_requests | 60 | Maximum concurrent requests
retry_wait | 60 seconds | Wait time after HTTP 429
extra_wait | 5 seconds | Additional buffer before retry
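
One way these three parameters could fit together is sketched below: a thread pool capped at parallel_requests, and a pause of retry_wait plus extra_wait before retrying a URL that returned 429. This is a simplified assumption. The script's wait_before_retry() and set_retry_time() work with datetime values and may coordinate waits differently. The names check_with_retry and check_all, the fetch callable, and the single-retry limit are hypothetical.

```python
import concurrent.futures
import time

PARALLEL_REQUESTS = 60  # values from the table above
RETRY_WAIT = 60         # seconds
EXTRA_WAIT = 5          # seconds


def check_with_retry(url, fetch, sleep_fn=time.sleep, max_retries=1):
    """Call fetch(url) and retry after a pause while the status is 429."""
    status = fetch(url)
    retries = 0
    while status == 429 and retries < max_retries:
        sleep_fn(RETRY_WAIT + EXTRA_WAIT)  # assumed: the two waits are summed
        status = fetch(url)
        retries += 1
    return status


def check_all(urls, fetch, sleep_fn=time.sleep):
    """Check URLs concurrently and return a {url: status} mapping."""
    with concurrent.futures.ThreadPoolExecutor(
            max_workers=PARALLEL_REQUESTS) as pool:
        statuses = pool.map(
            lambda url: check_with_retry(url, fetch, sleep_fn), urls)
        return dict(zip(urls, statuses))
```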

Usage Examples

# Run from the repository root
python3 hack/verify-doc-links.py

# Override the GitHub repository URL
GITHUB_REPO=https://github.com/my-fork/kserve/ python3 hack/verify-doc-links.py

# Use in CI: report the failure and still exit non-zero so the job fails
python3 hack/verify-doc-links.py || { echo "Broken links detected!"; exit 1; }
