
Implementation:Apache Spark Build Api Docs

From Leeroopedia


Knowledge Sources
Domains Documentation
Type API Doc
Last Updated 2026-02-08 12:00 GMT

Overview

Jekyll plugin and associated scripts that orchestrate building API documentation for all Spark languages.

Description

build_api_docs.rb is a Jekyll plugin that orchestrates documentation generation during the Jekyll build. It calls a separate method for each component:

  • build_scala_and_java_docs: Scaladoc + Javadoc via SBT unidoc.
  • build_python_docs: Sphinx.
  • build_r_docs: pkgdown.
  • build_sql_docs: custom Python scripts rendered with mkdocs, via sql/create-docs.sh.
  • build_error_docs.

Skip environment variables (SKIP_API, SKIP_SCALADOC, SKIP_PYTHONDOC, SKIP_RDOC, SKIP_SQLDOC, SKIP_ERRORDOC) allow selective generation.

The plugin integrates into the Jekyll build lifecycle, ensuring that API documentation is generated as part of the overall site build. Each documentation method handles:

  • Tool invocation: Calling the appropriate language-specific documentation generator with the correct configuration.
  • Output placement: Placing generated documentation in the correct subdirectory under docs/_site/api/.
  • Error handling: Detecting and reporting build failures for individual documentation components.
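Each builder follows the same invoke, check, copy pattern. The shell sketch below illustrates that pattern under stated assumptions. It is not code taken from build_api_docs.rb, and the function name run_doc_step and its argument order are hypothetical.

```shell
# Hypothetical sketch of the per-language builder pattern described above:
# invoke a generator, stop on failure, then copy its output into place.
# Usage: run_doc_step NAME SOURCE_DIR DEST_DIR COMMAND [ARGS...]
run_doc_step() {
  local name="$1" src="$2" dest="$3"
  shift 3

  # Tool invocation: run the language-specific generator command.
  echo "Building ${name} docs"
  if ! "$@"; then
    # Error handling: report which component failed and propagate the failure.
    echo "${name} doc generation failed" >&2
    return 1
  fi

  # Output placement: copy the generated files into the API directory.
  mkdir -p "$dest"
  cp -R "$src/." "$dest"
}
```

Returning a non-zero status on failure, instead of continuing, mirrors the idea that a broken component should be reported rather than silently leaving a stale directory behind. Nothing is copied when the generator fails.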

The SQL documentation generator (sql/create-docs.sh) is a separate shell script. It runs custom Python scripts to generate SQL function reference pages from Spark's SQL function registry, then renders them to HTML with mkdocs.

Usage

Invoked automatically by Jekyll during documentation builds. Set the skip environment variables to shorten development iterations when only one language's documentation needs to be regenerated.
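To rebuild only one language's API docs, every other per-language skip variable has to be set. The helper below is a hypothetical convenience, not part of Spark. It assumes each variable is honored when set to 1, as in the usage examples on this page.

```shell
# Hypothetical helper: print the SKIP_* assignments that leave only one
# API doc component enabled. The argument names the variable to leave unset.
skip_all_except() {
  local keep="$1" var
  for var in SKIP_SCALADOC SKIP_PYTHONDOC SKIP_RDOC SKIP_SQLDOC SKIP_ERRORDOC; do
    # Emit NAME=1 for every component except the one being kept.
    if [ "$var" != "$keep" ]; then
      printf '%s=1 ' "$var"
    fi
  done
}

# Example: rebuild the site, regenerating only the PySpark API docs.
# cd docs && env $(skip_all_except SKIP_PYTHONDOC) bundle exec jekyll build
```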

Code Reference

Source Location

  • Repository: apache/spark
  • Files:
    • docs/_plugins/build_api_docs.rb (lines 1-238)
    • sql/create-docs.sh (lines 1-59)

Methods

Method                     Lines    Description                                  Tool Used
build_scala_and_java_docs  135-155  Generates Scala and Java API documentation   SBT unidoc
build_python_docs          157-172  Generates PySpark API documentation          Sphinx
build_r_docs               174-187  Generates SparkR API documentation           R pkgdown
build_sql_docs             189-204  Generates SQL function reference             Custom Python scripts + mkdocs (sql/create-docs.sh)
build_error_docs           206-215  Generates error code reference               Custom scripts

Skip Environment Variables

Variable        Effect
SKIP_API        Skip all API documentation generation
SKIP_SCALADOC   Skip Scala and Java API docs
SKIP_PYTHONDOC  Skip Python API docs
SKIP_RDOC       Skip R API docs
SKIP_SQLDOC     Skip SQL reference docs
SKIP_ERRORDOC   Skip error code docs
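The interaction between SKIP_API and the per-language variables can be sketched in shell. This is a hypothetical approximation of the plugin's checks, not its actual Ruby code. It assumes that each variable takes effect only when set to exactly 1, and that SKIP_API overrides all the others.

```shell
# Hypothetical sketch of the skip gating in build_api_docs.rb.
# Assumption: a component is skipped only when its variable equals "1".
# Prints the builder methods that would run, one per line.
planned_components() {
  # SKIP_API=1 short-circuits every API doc builder.
  if [ "${SKIP_API:-}" = "1" ]; then
    return 0
  fi
  [ "${SKIP_SCALADOC:-}" = "1" ] || echo "build_scala_and_java_docs"
  [ "${SKIP_PYTHONDOC:-}" = "1" ] || echo "build_python_docs"
  [ "${SKIP_RDOC:-}" = "1" ] || echo "build_r_docs"
  [ "${SKIP_SQLDOC:-}" = "1" ] || echo "build_sql_docs"
  [ "${SKIP_ERRORDOC:-}" = "1" ] || echo "build_error_docs"
}

# Example: preview which builders run with Python and R docs skipped.
# SKIP_PYTHONDOC=1 SKIP_RDOC=1 planned_components
```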

I/O Contract

Inputs

Name                        Type             Required  Description
Compiled Spark source       build artifacts  Yes       Spark must be compiled before API docs can be generated
Documentation source        docs/ directory  Yes       Jekyll site source, Markdown pages, configuration
Skip environment variables  environment      No        Controls which documentation components to generate

Outputs

Name             Location             Description
Complete site    docs/_site/          Full documentation site including all guides and references
Scala API docs   docs/_site/api/scala/   Scaladoc-generated Scala API reference
Java API docs    docs/_site/api/java/    Javadoc-generated Java API reference
Python API docs  docs/_site/api/python/  Sphinx-generated PySpark API reference
R API docs       docs/_site/api/R/       pkgdown-generated SparkR API reference
SQL docs         docs/_site/api/sql/     SQL function reference documentation (mkdocs-rendered)
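After a full build, a quick check that each API reference landed in the directories listed above can catch a component that was skipped or failed quietly. The function below is a hypothetical post-build check, not part of Spark's tooling. It only tests that the directories exist, not that their contents are complete.

```shell
# Hypothetical post-build check: report any missing API doc directory
# under the given site root (for example, docs/_site).
check_api_outputs() {
  local site="$1" missing=0 dir
  for dir in scala java python R sql; do
    if [ ! -d "$site/api/$dir" ]; then
      echo "missing: $site/api/$dir" >&2
      missing=1
    fi
  done
  # Non-zero exit status if any directory was absent.
  return "$missing"
}

# Example (from the repository root, after a full build):
# check_api_outputs docs/_site
```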

Usage Examples

Full Documentation Build

# Build complete documentation site (production mode)
cd docs && PRODUCTION=1 bundle exec jekyll build

Skip Python Documentation

# Build everything except Python API docs
cd docs && SKIP_PYTHONDOC=1 bundle exec jekyll build

Build Only SQL Documentation

# Generate SQL reference documentation (requires a compiled Spark build and mkdocs)
sql/create-docs.sh

Development Iteration (Skip All API Docs)

# Build site pages only, skipping all API documentation
cd docs && SKIP_API=1 bundle exec jekyll build

Related Pages

Implements Principle
