
Implementation:Apache Spark Run Tests

From Leeroopedia


Field | Value
Source Repository | Apache Spark
Domains | Testing, CI_CD
Last Updated | 2026-02-08 14:00 GMT

Overview

The dev/run-tests.py script is the central CI/CD test orchestrator for running the full Apache Spark test suite across all supported languages.

Description

The dev/run-tests.py script is the main entry point for Spark's CI pipeline. It determines which modules to test based on git changes, runs style checks (Apache RAT, Scala, Java, Python, R), builds Spark, performs binary compatibility checks (MiMa), and executes test suites for Scala, Python, and R.

The script orchestrates the following stages in order:

  • License checking via Apache RAT to ensure all files carry proper license headers
  • Style checking for Scala, Java, Python, and R source files
  • Building Spark using Maven or SBT depending on configuration
  • Binary compatibility checking via MiMa (Migration Manager)
  • Module-based test execution for Scala/Java tests via SBT or Maven
  • PySpark test execution by delegating to python/run-tests.py
  • SparkR test execution by delegating to R/run-tests.sh
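The staged flow above can be sketched as a simple fail-fast pipeline. This is a hedged illustration only: the stage functions below are hypothetical stubs, not Spark's actual code, which lives in dev/run-tests.py.

```python
# Minimal fail-fast orchestration sketch. Stage names mirror the list
# above; the callables are hypothetical stand-ins for the real stages.
def run_pipeline(stages):
    """Run each (name, stage) pair in order; stop at the first failure."""
    for name, stage in stages:
        if not stage():
            print(f"[error] stage failed: {name}")
            return 1
        print(f"[ok] {name}")
    return 0

stages = [
    ("license check (Apache RAT)", lambda: True),
    ("style checks", lambda: True),
    ("build (Maven or SBT)", lambda: True),
    ("MiMa compatibility check", lambda: True),
    ("module tests", lambda: True),
]
exit_code = run_pipeline(stages)  # 0 here, since every stub stage succeeds
```

The real script follows the same fail-fast principle: a failure in an early stage (for example, a style violation) aborts the run before any expensive test execution begins.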

The script uses the Module class defined in dev/sparktestsupport/modules.py to determine which modules are affected by a given set of file changes. It then applies a topological sort from dev/sparktestsupport/toposort.py to order the modules by their dependencies before executing their associated test commands.
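The dependency ordering can be illustrated with the standard library's graphlib; Spark ships its own toposort.py, so this is an analogous sketch with an invented dependency map, not Spark's real module graph.

```python
# Ordering modules so that dependencies are tested before dependents,
# using stdlib graphlib (Python 3.9+). The dependency map is illustrative.
from graphlib import TopologicalSorter

# Each module maps to the set of modules it depends on.
deps = {
    "core": set(),
    "catalyst": {"core"},
    "sql": {"catalyst"},
    "hive": {"sql"},
}

order = list(TopologicalSorter(deps).static_order())
# "core" always precedes "catalyst", which precedes "sql", then "hive".
```

A topological order guarantees that if a change touches a low-level module, its dependents are scheduled after it, so failures surface in the most fundamental module first.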

Usage

Use this script to run the CI test suite locally or in automated CI environments like GitHub Actions. It is the canonical way to validate Spark changes before submitting pull requests.

Code Reference

Attribute | Details
Source Repository | apache/spark, file dev/run-tests.py, lines 465-656 (main function)
Supporting Files | dev/sparktestsupport/modules.py (Module class, L27-48); dev/sparktestsupport/utils.py (determine_modules_for_files, L32-167); dev/sparktestsupport/toposort.py (toposort_flatten, L41-84)
Signature | python3 dev/run-tests.py [--modules=<list>] [--parallelism=N] [--excluded-tags=<tags>] [--included-tags=<tags>]
Import | N/A (standalone script)
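The signature above can be mirrored with a minimal argparse sketch. This is an assumption-laden reconstruction: run-tests.py's real parser may differ in defaults, help text, and validation.

```python
# Hedged sketch of an option parser matching the documented signature.
# Defaults and dest names are assumptions, not Spark's actual parser.
import argparse

def build_parser():
    p = argparse.ArgumentParser(prog="dev/run-tests.py")
    p.add_argument("--modules", help="comma-separated module names to test")
    p.add_argument("--parallelism", type=int, default=4,
                   help="number of parallel test processes")
    p.add_argument("--excluded-tags", dest="excluded_tags",
                   help="test tags to exclude")
    p.add_argument("--included-tags", dest="included_tags",
                   help="test tags to include")
    return p

args = build_parser().parse_args(["--modules=core,sql", "--parallelism=8"])
# args.modules == "core,sql", args.parallelism == 8
```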

I/O Contract

Inputs

Parameter | Type | Required | Description
--modules | str | No | Comma-separated module names to test
--parallelism | int | No | Number of parallel test processes (default 4)
--excluded-tags | str | No | Test tags to exclude from execution
--included-tags | str | No | Test tags to include in execution
Compiled Spark source | implicit | Yes | Spark must be built first (the script handles this)
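When --modules is not given, the script infers affected modules from changed file paths. The mapping below is a simplified sketch in the spirit of determine_modules_for_files; the path prefixes are illustrative, not Spark's actual rules.

```python
# Sketch: map changed file paths to test modules by path prefix.
# Prefixes and module names are hypothetical examples.
MODULE_PREFIXES = {
    "core/": "core",
    "sql/": "sql",
    "python/": "pyspark",
    "R/": "sparkr",
}

def modules_for_files(changed_files):
    """Return the set of modules whose source trees contain the files."""
    mods = set()
    for path in changed_files:
        for prefix, mod in MODULE_PREFIXES.items():
            if path.startswith(prefix):
                mods.add(mod)
    return mods

# A change under sql/ and python/ selects the "sql" and "pyspark" modules.
print(modules_for_files(["sql/x.scala", "python/y.py"]))
```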

Outputs

Output | Description
Test results | Pass/fail status per module, printed to stdout
Exit code | 0 on success; nonzero error codes defined in sparktestsupport/__init__.py
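A CI wrapper typically keys off the exit code alone. The snippet below is a generic sketch of invoking the script and propagating its exit status; the error-code constants themselves live in sparktestsupport/__init__.py and are not reproduced here.

```python
# Sketch: run a command and surface its exit code (0 means success).
import subprocess
import sys

def run_and_report(cmd):
    """Run cmd as a subprocess and return its exit code."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        print(f"tests failed with exit code {result.returncode}",
              file=sys.stderr)
    return result.returncode

# Example invocation (commented out: requires a Spark checkout):
# sys.exit(run_and_report(["python3", "dev/run-tests.py", "--modules=core"]))
```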

Usage Examples

Run all tests:

python3 dev/run-tests.py

Run specific modules:

python3 dev/run-tests.py --modules=core,sql

Run with increased parallelism:

python3 dev/run-tests.py --parallelism=8

Run with tag filtering:

python3 dev/run-tests.py --excluded-tags=org.apache.spark.tags.SlowTest
