Principle:Apache Spark YARN Web Proxy Security

From Leeroopedia


Knowledge Sources
Domains: Security, YARN, Web_UI
Last Updated: 2026-02-08 22:00 GMT

Overview

Security mechanism that restricts direct access to Spark application web UIs on YARN by routing all traffic through authorized YARN proxy hosts.

Description

YARN Web Proxy Security is the principle of isolating application web UIs in multi-tenant YARN clusters. When Spark runs on YARN, the Application Master's web UI should be reachable only through the YARN web proxy, never directly. This prevents unauthorized users from accessing application UIs, protects against cross-site scripting via the proxy's URL rewriting, and supports YARN ResourceManager High Availability (RM HA) failover.

The security model works in four steps: (1) maintain a list of authorized proxy IP addresses, (2) validate each incoming request's source IP against that list, (3) pass authorized requests through with the proxy user identity attached, and (4) redirect unauthorized requests to the active YARN proxy.

Usage

This principle is automatically enforced when running Spark on YARN. It is relevant when configuring YARN RM HA, customizing proxy behavior, or debugging web UI access issues in multi-tenant deployments.

Theoretical Basis

The security model follows an IP-based allowlist-with-redirect pattern:

  1. Proxy IP Resolution: Periodically resolve the proxy hostnames to IP addresses (results are cached for 5 minutes)
  2. Request Validation: Compare the incoming request's remote address against the allowlist
  3. Identity Injection: For authorized requests, extract the proxy user from the request cookies and set it as the request principal
  4. Redirect Unauthorized: Redirect non-proxy requests to the active RM proxy URL, with a /redirect path marker
  5. HA Failover: Probe multiple RM URLs to find the active ResourceManager in HA configurations
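
Step 1 above (resolve the proxy hosts, then reuse the result for 5 minutes) can be sketched as a small time-bounded cache. This is a minimal illustration, not the Spark or Hadoop implementation: the names `ProxyAllowlist` and `resolve_host`, and the injectable `resolver` and `clock` parameters, are hypothetical and exist only to make the sketch testable. The 300-second default mirrors the 5-minute cache lifetime stated in the list.

```python
import socket
import time
from typing import Callable, Iterable, Optional, Set


def resolve_host(host: str) -> Set[str]:
    """Return every IP address the hostname currently resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(host, None)}


class ProxyAllowlist:
    """Hypothetical helper: caches resolved proxy IPs and refreshes after a TTL."""

    def __init__(
        self,
        proxy_hosts: Iterable[str],
        ttl_seconds: float = 300.0,  # 5 minutes, as in step 1
        resolver: Callable[[str], Iterable[str]] = resolve_host,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._hosts = list(proxy_hosts)
        self._ttl = ttl_seconds
        self._resolver = resolver
        self._clock = clock
        self._addresses: Set[str] = set()
        self._expires_at: Optional[float] = None  # None means "never resolved"

    def addresses(self) -> Set[str]:
        """Return the cached allowlist, re-resolving every host once it expires."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            resolved: Set[str] = set()
            for host in self._hosts:
                try:
                    resolved |= set(self._resolver(host))
                except OSError:
                    # Skip hosts that fail to resolve; the rest stay usable.
                    continue
            self._addresses = resolved
            self._expires_at = now + self._ttl
        return self._addresses

    def is_proxy(self, remote_addr: str) -> bool:
        """Step 2: check a request's remote address against the allowlist."""
        return remote_addr in self.addresses()
```

Injecting the resolver and clock keeps the cache logic independent of real DNS and wall-clock time, which is convenient when reproducing a web UI access problem in a test.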

Pseudo-code Logic:

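# Abstract algorithm description (pseudo-code, not a runnable implementation)
proxy_ips = resolve_all(proxy_hosts)  # Cached, refreshed every 5 min
if request.remote_addr in proxy_ips:
    # Request arrived through an authorized proxy: attach the proxied user, if any
    user = extract_cookie(request, "proxy-user")
    if user:
        request = wrap_with_principal(request, user)
    chain.doFilter(request, response)
else:
    # Direct access: send the client to the active RM's proxy instead
    redirect_url = find_active_rm(rm_urls)
    redirect(response, redirect_url + request.uri)  # /redirect marker from step 4 not shown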
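
For readers who want to experiment with the decision logic, the pseudo-code can be approximated in plain Python. This is a sketch under stated assumptions rather than the actual filter: `Decision`, `filter_request`, and the `is_active` probe callback are hypothetical names, the request is reduced to its remote address, URI, and raw `Cookie` header, and the `/redirect` path marker from step 4 is left out because this page does not specify where it is inserted. The `"error"` outcome for the case where no ResourceManager answers the probe is also an assumption.

```python
from dataclasses import dataclass
from http.cookies import SimpleCookie
from typing import Callable, Iterable, Optional, Set

PROXY_USER_COOKIE = "proxy-user"  # cookie name used in the pseudo-code


@dataclass
class Decision:
    action: str                      # "pass", "redirect", or "error"
    user: Optional[str] = None       # principal to attach when passing through
    location: Optional[str] = None   # redirect target when redirecting


def extract_cookie(cookie_header: str, name: str) -> Optional[str]:
    """Return the value of the named cookie, or None if it is absent."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    morsel = jar.get(name)
    return morsel.value if morsel is not None else None


def find_active_rm(rm_urls: Iterable[str],
                   is_active: Callable[[str], bool]) -> Optional[str]:
    """Return the first RM URL the (hypothetical) probe reports as active."""
    for url in rm_urls:
        if is_active(url):
            return url
    return None


def filter_request(remote_addr: str, uri: str, cookie_header: str,
                   proxy_ips: Set[str], rm_urls: Iterable[str],
                   is_active: Callable[[str], bool]) -> Decision:
    if remote_addr in proxy_ips:
        # Authorized proxy: pass through, carrying the proxied user if present.
        return Decision("pass", user=extract_cookie(cookie_header, PROXY_USER_COOKIE))
    # Direct access: the cookie is ignored and the client is redirected.
    active = find_active_rm(rm_urls, is_active)
    if active is None:
        return Decision("error")  # assumption: no active RM could be found
    return Decision("redirect", location=active.rstrip("/") + uri)
```

Note that the `proxy-user` cookie is read only on the allowlisted branch, so a client connecting directly cannot claim an identity by forging the cookie. In a real deployment `is_active` would be an HTTP probe against each RM URL; here it is a callback so the logic can be exercised offline.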
