Implementation:Apache Spark Gen Sql Api Docs
| Knowledge Sources | |
|---|---|
| Domains | Documentation, SQL |
| Last Updated | 2026-02-08 22:00 GMT |
Overview
Python script that auto-generates markdown documentation for all Spark SQL built-in functions by extracting metadata from the JVM.
Description
`gen-sql-api-docs.py` launches a PySpark JVM gateway and retrieves function metadata (`ExpressionInfo`) from `PythonSQLUtils.listBuiltinFunctionInfos()`. It also includes virtual operator definitions for special operators such as `!=`, `<>`, `case`, and `||`. Functions are grouped by category (aggregation, string, math, etc.), and groups can be merged (e.g., `lambda_funcs` into `collection_funcs`). For each category the script writes one markdown file containing formatted usage, arguments, examples, notes, the version in which each function was introduced ("since"), and deprecation info. It also creates an index page with a responsive CSS grid linking to all functions, along with an auto-generated `mkdocs.yml` navigation structure.
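The grouping and merging step described above can be sketched as follows. This is a minimal illustration, not the script's actual code: `GROUP_MERGES`, `group_functions`, and `_info` are hypothetical names, sorting by function name is an assumption, and only the `lambda_funcs` into `collection_funcs` merge comes from the description.

```python
from collections import defaultdict, namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group",
)

# Hypothetical merge table: source group -> target group.
GROUP_MERGES = {"lambda_funcs": "collection_funcs"}


def group_functions(infos, merges=GROUP_MERGES):
    """Bucket function records by category, applying group merges."""
    grouped = defaultdict(list)
    for info in infos:
        # A merged group contributes its functions to the target group.
        target = merges.get(info.group, info.group)
        grouped[target].append(info)
    # Assumption: sort groups and functions by name for stable output.
    return {
        group: sorted(funcs, key=lambda f: f.name)
        for group, funcs in sorted(grouped.items())
    }


def _info(name, group):
    """Demo helper: fill only the fields this sketch uses."""
    return ExpressionInfo("", name, "", "", "", "", "", "", group)


sample = [
    _info("transform", "lambda_funcs"),
    _info("array_contains", "collection_funcs"),
    _info("upper", "string_funcs"),
]
grouped = group_functions(sample)
```

With this sample, `transform` ends up on the `collection_funcs` page and no separate `lambda_funcs` page is produced.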
Usage
Use this script during the Spark documentation build process to regenerate the SQL function reference pages. It is invoked as part of the `sql/create-docs.sh` pipeline and requires a working Spark build with PySpark available.
Code Reference
Source Location
- Repository: Apache_Spark
- File: sql/gen-sql-api-docs.py
- Lines: 1-550
Signature
from collections import namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group"
)

def _list_function_infos(jvm):
    """Retrieve all built-in function metadata from the JVM gateway."""

def _make_anchor(name):
    """Convert function name to a valid HTML anchor."""

def _get_display_name(group):
    """Convert group name to display name."""

def _generate_function_md(func_info, anchor):
    """Generate markdown documentation for a single function."""

def _generate_group_page(group_name, functions, output_dir):
    """Generate a full markdown page for a function category."""

def _generate_index_page(groups, output_dir):
    """Generate the index page with CSS grid linking all functions."""

def _generate_mkdocs_nav(groups, output_dir):
    """Generate mkdocs.yml navigation structure."""
Import
# Standalone CLI script - invoked directly
python sql/gen-sql-api-docs.py --output-dir /path/to/output
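A minimal sketch of how the `--output-dir` option shown above could be parsed with `argparse`. The function name `parse_args`, the help text, and the sample path are illustrative assumptions; the script's real argument handling may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(
        description="Generate SQL function reference pages (sketch)."
    )
    parser.add_argument(
        "--output-dir",
        required=True,
        help="Directory for the generated markdown files.",
    )
    return parser.parse_args(argv)


# argparse exposes --output-dir as the attribute output_dir.
args = parse_args(["--output-dir", "/tmp/sql-docs"])
```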
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| JVM Gateway | PySpark JVM | Yes | Running PySpark gateway for accessing ExpressionInfo |
| output-dir | CLI argument | Yes | Directory path for generated markdown files |
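A minimal sketch of how one record from the JVM Gateway input might be copied into the `ExpressionInfo` namedtuple. The getter names are assumed to mirror the tuple's fields (`getClassName()`, `getName()`, and so on). `to_expression_info` and `FakeJavaInfo` are hypothetical, and the stub stands in for a Py4J proxy object so the example runs without a Spark build.

```python
from collections import namedtuple

ExpressionInfo = namedtuple(
    "ExpressionInfo",
    "className name usage arguments examples note since deprecated group",
)


def to_expression_info(jinfo):
    """Copy one JVM-side record into a plain Python namedtuple."""
    return ExpressionInfo(
        className=jinfo.getClassName(),
        name=jinfo.getName(),
        usage=jinfo.getUsage(),
        arguments=jinfo.getArguments(),
        examples=jinfo.getExamples(),
        note=jinfo.getNote(),
        since=jinfo.getSince(),
        deprecated=jinfo.getDeprecated(),
        group=jinfo.getGroup(),
    )


class FakeJavaInfo:
    """Stand-in for a Py4J proxy; all values are made up for the demo."""

    def getClassName(self):
        return "com.example.MyFunc"

    def getName(self):
        return "my_func"

    def getUsage(self):
        return "my_func(expr) - Illustrative usage string."

    def getArguments(self):
        return ""

    def getExamples(self):
        return ""

    def getNote(self):
        return ""

    def getSince(self):
        return "0.0.0"

    def getDeprecated(self):
        return ""

    def getGroup(self):
        return "string_funcs"


info = to_expression_info(FakeJavaInfo())
```

Copying the values into a namedtuple once means the rest of the generator can work with plain Python strings instead of gateway proxies.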
Outputs
| Name | Type | Description |
|---|---|---|
| Category markdown files | .md files | One file per function category (e.g., agg_funcs.md, string_funcs.md) |
| Index page | index.md | Overview page with CSS grid linking to all function categories |
| mkdocs.yml | YAML config | MkDocs navigation structure for the generated pages |
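One way the navigation output could be assembled is sketched below. The layout of the real `mkdocs.yml` is not shown on this page, so the `nav:` entries, the display-name rule, and the names `display_name` and `build_mkdocs_nav` are assumptions for illustration only.

```python
def display_name(group):
    """Hypothetical rule: 'string_funcs' -> 'String Functions'."""
    suffix = "_funcs"
    base = group[:-len(suffix)] if group.endswith(suffix) else group
    return base.replace("_", " ").title() + " Functions"


def build_mkdocs_nav(groups):
    """Return a YAML nav section: the index first, then one page per group."""
    lines = ["nav:", "  - 'Overview': 'index.md'"]
    for group in sorted(groups):
        lines.append("  - '%s': '%s.md'" % (display_name(group), group))
    return "\n".join(lines) + "\n"


nav_yaml = build_mkdocs_nav(["string_funcs", "agg_funcs"])
```

Each category file named in the outputs table gets exactly one navigation entry, so the page list and the navigation stay in sync.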
Usage Examples
Generate SQL Function Docs
# Typically invoked via the doc build pipeline
cd "$SPARK_HOME"
sql/create-docs.sh
# Or directly:
python sql/gen-sql-api-docs.py --output-dir docs/sql-ref-functions/