Principle:Apache Spark YARN Web Proxy Security
| Knowledge Sources | |
|---|---|
| Domains | Security, YARN, Web_UI |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Security mechanism that restricts direct access to Spark application web UIs on YARN by routing all traffic through authorized YARN proxy hosts.
Description
YARN Web Proxy Security is the principle of isolating application web UIs in multi-tenant YARN clusters. When Spark runs on YARN, the Application Master's web UI should be reachable only through the YARN web proxy, never directly. This prevents unauthorized users from accessing application UIs, lets the proxy's URL rewriting guard against cross-site scripting, and supports YARN ResourceManager High Availability (RM HA) failover. The security model works by: (1) maintaining a list of authorized proxy IP addresses, (2) checking each incoming request's source IP against that list, (3) passing authorized requests through with the proxy user identity attached, and (4) redirecting all other requests to the active YARN proxy.
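Step (3) can be illustrated with a small Python sketch that pulls the proxy user out of a raw `Cookie` header. It is not Spark's implementation: the function name `extract_proxy_user` is hypothetical, and the cookie name `proxy-user` is taken from the pseudo-code on this page.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Cookie name as used in this page's pseudo-code.
PROXY_USER_COOKIE = "proxy-user"


def extract_proxy_user(cookie_header: Optional[str]) -> Optional[str]:
    """Return the proxy user carried in a raw Cookie header, or None if absent."""
    if not cookie_header:
        return None
    jar = SimpleCookie()
    jar.load(cookie_header)  # parse "name=value; name2=value2"
    morsel = jar.get(PROXY_USER_COOKIE)
    return morsel.value if morsel is not None else None
```

A request that arrives from an authorized proxy without this cookie would still pass through, only without a user principal attached.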
Usage
This principle is automatically enforced when running Spark on YARN. It is relevant when configuring YARN RM HA, customizing proxy behavior, or debugging web UI access issues in multi-tenant deployments.
Theoretical Basis
The security model follows an IP-based allowlist with redirect pattern:
- Proxy IP Resolution: Periodically resolve proxy hostnames to IP addresses (cached for 5 minutes)
- Request Validation: Compare incoming request's remote address against the allowlist
- Identity Injection: For authorized requests, extract the proxy user from the request cookies and set it as the request principal
- Redirect Unauthorized: Redirect non-proxy requests to the active RM proxy URL, with a /redirect path marker added
- HA Failover: Probe multiple RM URLs to find the active ResourceManager in HA configurations
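The first two items in the list above (cached resolution and allowlist validation) can be sketched in Python as follows. This is an illustration under stated assumptions, not the filter's actual code: `ProxyAllowlist` and `resolve_host` are hypothetical names, the resolver and clock are injectable so the cache can be exercised without DNS, and silently skipping unresolvable hosts is a choice made for this sketch.

```python
import socket
import time
from typing import Callable, FrozenSet, Iterable, Optional

REFRESH_SECONDS = 5 * 60  # the 5-minute cache lifetime described above


def resolve_host(host: str) -> FrozenSet[str]:
    """Resolve one hostname to all of its IP addresses via the system resolver."""
    return frozenset(info[4][0] for info in socket.getaddrinfo(host, None))


class ProxyAllowlist:
    """Hypothetical helper: caches resolved proxy IPs and checks membership."""

    def __init__(self, hosts: Iterable[str],
                 resolver: Callable[[str], FrozenSet[str]] = resolve_host,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._hosts = list(hosts)
        self._resolver = resolver
        self._clock = clock
        self._ips: FrozenSet[str] = frozenset()
        self._resolved_at: Optional[float] = None

    def addresses(self) -> FrozenSet[str]:
        """Return the cached IP set, re-resolving once the cache has expired."""
        now = self._clock()
        if self._resolved_at is None or now - self._resolved_at >= REFRESH_SECONDS:
            resolved = set()
            for host in self._hosts:
                try:
                    resolved |= self._resolver(host)
                except OSError:
                    # Assumption of this sketch: skip hosts that fail to resolve.
                    continue
            self._ips = frozenset(resolved)
            self._resolved_at = now
        return self._ips

    def is_authorized(self, remote_addr: str) -> bool:
        """True when the request's remote address belongs to a known proxy."""
        return remote_addr in self.addresses()
```

Re-resolving on a timer rather than once at startup lets the allowlist follow proxy hosts whose addresses change, at the cost of a short window in which a stale address may still be trusted.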
Pseudo-code Logic:
# Abstract algorithm description
proxy_ips = resolve_all(proxy_hosts)  # Cached, refreshed every 5 min
if request.remote_addr in proxy_ips:
    user = extract_cookie(request, "proxy-user")
    if user:
        request = wrap_with_principal(request, user)
    chain.doFilter(request, response)
else:
    redirect_url = find_active_rm(rm_urls)
    redirect(response, redirect_url + request.uri)
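
The redirect branch of the pseudo-code can be made concrete with a short Python sketch. Several details here are assumptions rather than facts from this page: the `/proxy` path prefix after which the `/redirect` marker is inserted, the shape of the proxy base URL, and the `is_active` probe, which is passed in as a callable because the page does not say how the active ResourceManager is detected.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit, urlunsplit

PROXY_PATH = "/proxy"          # assumed proxy path prefix; not stated on this page
REDIRECT_MARKER = "/redirect"  # path marker named in the list above


def find_active_rm(rm_urls: Iterable[str],
                   is_active: Callable[[str], bool]) -> Optional[str]:
    """Return the first RM proxy base URL whose probe succeeds, else None."""
    for url in rm_urls:
        if is_active(url):
            return url
    return None


def build_redirect_url(base_url: str, request_uri: str, query: str = "") -> str:
    """Append the request URI to the proxy base and mark the path as a redirect."""
    parts = urlsplit(base_url)
    path = parts.path.rstrip("/") + request_uri
    idx = path.find(PROXY_PATH)
    if idx >= 0:
        # Insert the marker right after the proxy prefix so the proxy can
        # tell that this request was bounced back by the application.
        cut = idx + len(PROXY_PATH)
        path = path[:cut] + REDIRECT_MARKER + path[cut:]
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))
```

For example, with the hypothetical base `http://rm1:8088/proxy/application_1_0001` and request URI `/jobs/`, `build_redirect_url` returns `http://rm1:8088/proxy/redirect/application_1_0001/jobs/`. Searching only the path component avoids matching a hostname that happens to begin with `proxy`.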