Implementation:Apache Spark Build Api Docs
| Knowledge Sources | |
|---|---|
| Domains | Documentation |
| Type | API Doc |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Jekyll plugin and associated scripts that orchestrate building API documentation for all Spark languages.
Description
build_api_docs.rb is a Jekyll plugin that orchestrates API documentation generation during the Jekyll build. It calls a separate method for each documentation component: build_scala_and_java_docs (Scaladoc + Javadoc via SBT unidoc), build_python_docs (Sphinx + mkdocs), build_r_docs (pkgdown), build_sql_docs (custom Python scripts), and build_error_docs (error code reference). Environment-variable skip controls (SKIP_API, SKIP_SCALADOC, SKIP_PYTHONDOC, SKIP_RDOC, SKIP_SQLDOC, SKIP_ERRORDOC) allow selective generation.
The plugin integrates into the Jekyll build lifecycle, ensuring that API documentation is generated as part of the overall site build. Each documentation method handles:
- Tool invocation: Calling the appropriate language-specific documentation generator with the correct configuration.
- Output placement: Placing generated documentation in the correct subdirectory under docs/_site/api/.
- Error handling: Detecting and reporting build failures for individual documentation components.
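As a rough illustration of those three responsibilities, the shell function below runs a generator command, stops and reports if it exits non-zero, and otherwise copies the generated tree into an API subdirectory. The name run_doc_step and its arguments are hypothetical, and the real methods are Ruby code inside build_api_docs.rb, so treat this as a sketch of the pattern rather than the plugin's implementation.

```shell
# Requires bash. Hypothetical helper illustrating the per-method pattern;
# it is not part of the Spark repository.
run_doc_step() {
  if [ $# -lt 4 ]; then
    echo "usage: run_doc_step <name> <src_dir> <dest_dir> <command...>" >&2
    return 2
  fi
  local name="$1" src_dir="$2" dest_dir="$3"
  shift 3
  # Tool invocation: run the generator command given as the remaining args.
  echo "Building $name docs: $*"
  # Error handling: report the failure and stop before copying anything.
  if ! "$@"; then
    echo "$name doc generation failed" >&2
    return 1
  fi
  # Output placement: copy the generated tree into its API subdirectory.
  mkdir -p "$dest_dir" && cp -r "$src_dir/." "$dest_dir"
}
```

Passing the generator as trailing arguments keeps one helper usable for every language; only the command and the two directories change between components.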
The SQL documentation generator (sql/create-docs.sh) is a separate shell script that invokes custom Python scripts to generate SQL function reference documentation from Spark's SQL function registry.
Usage
Invoked automatically by Jekyll during documentation builds. Use skip flags for faster development iterations when only one language's documentation needs to be regenerated.
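Regenerating a single component means setting every other per-component skip flag. A small helper can print those assignments for you. The function name skip_all_except and its component keywords are hypothetical, and the sketch assumes each flag takes effect when set to 1, as in the usage examples on this page.

```shell
# Requires bash. Hypothetical helper (not part of the Spark repository):
# print SKIP_* assignments that disable every API doc component except one.
# Example (from the docs/ directory, with SKIP_API unset):
#   env $(skip_all_except python) bundle exec jekyll build
skip_all_except() {
  local keep var
  local -a flags=()
  # Map a component keyword to the flag that must stay unset.
  case "$1" in
    scala|java) keep=SKIP_SCALADOC ;;   # one flag covers Scala and Java
    python)     keep=SKIP_PYTHONDOC ;;
    r)          keep=SKIP_RDOC ;;
    sql)        keep=SKIP_SQLDOC ;;
    error)      keep=SKIP_ERRORDOC ;;
    *) echo "unknown component: $1" >&2; return 1 ;;
  esac
  # Collect NAME=1 for every other per-component flag.
  for var in SKIP_SCALADOC SKIP_PYTHONDOC SKIP_RDOC SKIP_SQLDOC SKIP_ERRORDOC; do
    [ "$var" = "$keep" ] || flags+=("$var=1")
  done
  echo "${flags[*]}"
}
```

Emitting plain NAME=1 words lets env apply them to just one build command, so nothing leaks into the rest of the shell session.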
Code Reference
Source Location
- Repository: apache/spark
- Files:
  - docs/_plugins/build_api_docs.rb (lines 1-238)
  - sql/create-docs.sh (lines 1-59)
Methods
| Method | Lines | Description | Tool Used |
|---|---|---|---|
| build_scala_and_java_docs | 135-155 | Generates Scala and Java API documentation | SBT unidoc |
| build_python_docs | 157-172 | Generates PySpark API documentation | Sphinx + mkdocs |
| build_r_docs | 174-187 | Generates SparkR API documentation | R pkgdown |
| build_sql_docs | 189-204 | Generates SQL function reference | Custom Python scripts |
| build_error_docs | 206-215 | Generates error code reference | Custom scripts |
Skip Environment Variables
| Variable | Effect |
|---|---|
| SKIP_API | Skip all API documentation generation |
| SKIP_SCALADOC | Skip Scala and Java API docs |
| SKIP_PYTHONDOC | Skip Python API docs |
| SKIP_RDOC | Skip R API docs |
| SKIP_SQLDOC | Skip SQL reference docs |
| SKIP_ERRORDOC | Skip error code docs |
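The precedence in this table is simple: SKIP_API disables everything, and each remaining flag removes one method. The shell function below transcribes that behavior purely for illustration, since the plugin itself is written in Ruby. The name planned_builds is hypothetical, treating a flag as set only when it equals 1 is an assumption consistent with the examples on this page, and the output follows the order of the Methods table.

```shell
# Requires bash. Illustrative transcription of the skip precedence; the real
# checks live in docs/_plugins/build_api_docs.rb. Assumes "set" means "1".
planned_builds() {
  local -a plan=()
  # SKIP_API=1 overrides every per-component flag: nothing is built.
  if [ "${SKIP_API:-}" = "1" ]; then
    return 0
  fi
  # Each per-component flag removes exactly one method from the plan.
  [ "${SKIP_SCALADOC:-}" = "1" ]  || plan+=(build_scala_and_java_docs)
  [ "${SKIP_PYTHONDOC:-}" = "1" ] || plan+=(build_python_docs)
  [ "${SKIP_RDOC:-}" = "1" ]      || plan+=(build_r_docs)
  [ "${SKIP_SQLDOC:-}" = "1" ]    || plan+=(build_sql_docs)
  [ "${SKIP_ERRORDOC:-}" = "1" ]  || plan+=(build_error_docs)
  # Print one method per line, or nothing when every component is skipped.
  if [ "${#plan[@]}" -gt 0 ]; then
    printf '%s\n' "${plan[@]}"
  fi
}
```

For example, SKIP_PYTHONDOC=1 planned_builds lists the four remaining methods, while SKIP_API=1 planned_builds prints nothing regardless of the other flags.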
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| Compiled Spark source | build artifacts | Yes | Spark must be compiled before API docs can be generated |
| Documentation source | docs/ directory | Yes | Jekyll site source, Markdown pages, configuration |
| Skip environment variables | environment | No | Controls which documentation components to generate |
Outputs
| Name | Location | Description |
|---|---|---|
| Complete site | docs/_site/ | Full documentation site including all guides and references |
| Scala API docs | docs/_site/api/scala/ | Scaladoc-generated Scala API reference |
| Java API docs | docs/_site/api/java/ | Javadoc-generated Java API reference |
| Python API docs | docs/_site/api/python/ | Sphinx/mkdocs-generated PySpark API reference |
| R API docs | docs/_site/api/R/ | pkgdown-generated SparkR API reference |
| SQL docs | docs/_site/api/sql/ | SQL function reference documentation |
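After a build, a quick sanity check is to confirm that each API directory from this table exists. The helper check_api_outputs below is hypothetical; it only tests that the directories are present, not that their contents are complete, and it checks all five API directories unless you name specific ones (useful when some components were skipped).

```shell
# Requires bash. Hypothetical post-build check (not part of the Spark
# repository). Examples, run from the Spark repository root:
#   check_api_outputs docs/_site              # all API directories
#   check_api_outputs docs/_site scala java   # after skipping the others
check_api_outputs() {
  if [ $# -lt 1 ]; then
    echo "usage: check_api_outputs <site_dir> [subdir...]" >&2
    return 2
  fi
  local site_dir="$1"
  shift
  local -a subdirs=("$@")
  local missing=0 d
  # With no explicit list, check every API directory from the Outputs table.
  if [ "${#subdirs[@]}" -eq 0 ]; then
    subdirs=(scala java python R sql)
  fi
  # Report each missing directory and remember that at least one was absent.
  for d in "${subdirs[@]}"; do
    if [ ! -d "$site_dir/api/$d" ]; then
      echo "missing: $site_dir/api/$d"
      missing=1
    fi
  done
  return "$missing"
}
```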
Usage Examples
Full Documentation Build
```shell
# Build the complete documentation site (production mode)
cd docs && PRODUCTION=1 bundle exec jekyll build
```
Skip Python Documentation
```shell
# Build everything except Python API docs
cd docs && SKIP_PYTHONDOC=1 bundle exec jekyll build
```
Build Only SQL Documentation
```shell
# From the Spark repository root, generate the SQL function reference documentation
sql/create-docs.sh
```
Development Iteration (Skip All API Docs)
```shell
# Build site pages only, skipping all API documentation
cd docs && SKIP_API=1 bundle exec jekyll build
```