Implementation:Apache Spark Gen Sql Api Docs

From Leeroopedia


Knowledge Sources
Domains: Documentation, SQL
Last Updated: 2026-02-08 22:00 GMT

Overview

Python script that auto-generates markdown documentation for all Spark SQL built-in functions by extracting metadata from the JVM.

Description

`gen-sql-api-docs.py` launches a PySpark JVM gateway and retrieves function metadata (`ExpressionInfo`) from `PythonSQLUtils.listBuiltinFunctionInfos()`. It adds virtual definitions for special operators such as `!=`, `<>`, `case`, and `||`. Functions are grouped by category (aggregation, string, math, etc.), and some groups are merged into others (e.g., `lambda_funcs` into `collection_funcs`). The script generates one Markdown file per category, with formatted usage, arguments, examples, notes, since versions, and deprecation info. It also creates an index page that links to all functions through a responsive CSS grid, along with an auto-generated `mkdocs.yml` navigation structure.
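
The grouping and merging step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `GROUP_MERGES`, `VIRTUAL_OPERATORS`, `make_info`, and `group_functions` are hypothetical names, the operator's usage text and its `predicate_funcs` group are placeholders, and the only field list taken from this page is the `ExpressionInfo` tuple.

```python
from collections import namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group",
)

# Hypothetical merge table: functions in the key group are filed under the value group.
GROUP_MERGES = {"lambda_funcs": "collection_funcs"}


def make_info(name, group, usage=""):
    """Build an ExpressionInfo with empty optional fields (illustration only)."""
    return ExpressionInfo("", name, usage, "", "", "", "", "", group)


# Hypothetical virtual operator entry; the usage text and group are placeholders.
VIRTUAL_OPERATORS = [
    make_info("!=", "predicate_funcs", "expr1 != expr2 - (illustrative usage text)"),
]


def group_functions(infos, merges=GROUP_MERGES):
    """Bucket infos by group, folding merged groups into their target group."""
    grouped = {}
    for info in infos:
        group = merges.get(info.group, info.group)
        grouped.setdefault(group, []).append(info)
    # Sort groups by name and functions by name within each group.
    return {
        group: sorted(funcs, key=lambda f: f.name)
        for group, funcs in sorted(grouped.items())
    }
```

With this sketch, an entry tagged `lambda_funcs` ends up in the `collection_funcs` bucket, and the virtual operators are simply appended to the list retrieved from the JVM before grouping.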

Usage

Use this script during the Spark documentation build process to regenerate the SQL function reference pages. It is invoked as part of the `sql/create-docs.sh` pipeline and requires a working Spark build with PySpark available.

Code Reference

Source Location

Signature

from collections import namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group"
)

def _list_function_infos(jvm):
    """Retrieve all built-in function metadata from the JVM gateway."""

def _make_anchor(name):
    """Convert function name to a valid HTML anchor."""

def _get_display_name(group):
    """Convert group name to display name."""

def _generate_function_md(func_info, anchor):
    """Generate markdown documentation for a single function."""

def _generate_group_page(group_name, functions, output_dir):
    """Generate a full markdown page for a function category."""

def _generate_index_page(groups, output_dir):
    """Generate the index page with CSS grid linking all functions."""

def _generate_mkdocs_nav(groups, output_dir):
    """Generate mkdocs.yml navigation structure."""
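
The signatures above list only docstrings. As a rough sketch of what the anchor and per-function rendering helpers might do, the following uses hypothetical bodies: the slug rules, the heading level, and the bold section labels are assumptions made for illustration, not the behavior of the real `_make_anchor` or `_generate_function_md`.

```python
import re
from collections import namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group",
)


def make_anchor(name):
    """Turn a function name into an anchor-safe slug (illustrative rules)."""
    # Operators such as != contain no word characters, so spell out each
    # character as its hexadecimal code point instead.
    if not re.search(r"\w", name):
        return "op-" + "-".join(format(ord(ch), "x") for ch in name)
    return re.sub(r"[^\w-]+", "-", name.lower()).strip("-")


def generate_function_md(info, anchor):
    """Render one function as a Markdown fragment, skipping empty fields."""
    lines = ['<a name="{}"></a>'.format(anchor), "", "### {}".format(info.name), ""]
    if info.usage:
        lines += [info.usage.strip(), ""]
    sections = (
        ("Arguments", info.arguments),
        ("Examples", info.examples),
        ("Note", info.note),
    )
    for title, body in sections:
        if body:
            lines += ["**{}:**".format(title), "", body.strip(), ""]
    if info.since:
        lines += ["**Since:** {}".format(info.since), ""]
    if info.deprecated:
        lines += ["**Deprecated:** {}".format(info.deprecated.strip()), ""]
    return "\n".join(lines)
```

Skipping empty fields keeps pages compact for functions that document only a usage line; whether the real script does the same is not stated on this page.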

Import

# Standalone CLI script - invoked directly
python sql/gen-sql-api-docs.py --output-dir /path/to/output

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| JVM Gateway | PySpark JVM | Yes | Running PySpark gateway for accessing ExpressionInfo |
| output-dir | CLI argument | Yes | Directory path for generated markdown files |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| Category markdown files | .md files | One file per function category (e.g., agg_funcs.md, string_funcs.md) |
| Index page | index.md | Overview page with CSS grid linking to all function categories |
| mkdocs.yml | YAML config | MkDocs navigation structure for the generated pages |
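
To make the three kinds of output concrete, here is a small sketch that writes them to a directory. It rests on several assumptions: `write_docs` is a hypothetical helper, the index is rendered as plain Markdown lists rather than the CSS grid the script produces, the display titles are derived by a made-up rule, and `mkdocs.yml` is written into the same directory only for simplicity.

```python
import os


def write_docs(groups, output_dir):
    """Write category pages, index.md, and a mkdocs.yml nav (illustration only).

    groups maps a group name to a list of (function_name, markdown) pairs.
    """
    os.makedirs(output_dir, exist_ok=True)
    index = ["# Built-in Functions", ""]
    nav = ["nav:", "  - Overview: index.md"]

    for group, functions in sorted(groups.items()):
        filename = group + ".md"
        # Made-up display-name rule: "string_funcs" -> "String Functions".
        title = group.replace("_funcs", "").replace("_", " ").title() + " Functions"

        # One Markdown page per category.
        with open(os.path.join(output_dir, filename), "w", encoding="utf-8") as f:
            f.write("# {}\n\n".format(title))
            f.write("\n".join(md for _, md in functions))

        # Index section linking to each function on its category page.
        index += ["## [{}]({})".format(title, filename), ""]
        index += ["- [{0}]({1}#{0})".format(name, filename) for name, _ in functions]
        index.append("")
        nav.append("  - {}: {}".format(title, filename))

    with open(os.path.join(output_dir, "index.md"), "w", encoding="utf-8") as f:
        f.write("\n".join(index))
    with open(os.path.join(output_dir, "mkdocs.yml"), "w", encoding="utf-8") as f:
        f.write("\n".join(nav) + "\n")
    return sorted(os.listdir(output_dir))
```

Calling `write_docs` with two groups yields two category files plus `index.md` and `mkdocs.yml`, which mirrors the output table above.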

Usage Examples

Generate SQL Function Docs

# Typically invoked via the doc build pipeline
cd "$SPARK_HOME"
sql/create-docs.sh

# Or directly:
python sql/gen-sql-api-docs.py --output-dir docs/sql-ref-functions/
