Implementation: Apache Spark Run Tests
| Field | Value |
|---|---|
| Source Repository | Apache Spark |
| Domains | Testing, CI_CD |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Central CI/CD test orchestrator script for running the full Apache Spark test suite across all languages.
Description
The dev/run-tests.py script is the main entry point for Spark's CI pipeline. It determines which modules to test based on git changes, runs style checks (Apache RAT, Scala, Java, Python, R), builds Spark, performs binary compatibility checks (MiMa), and executes test suites for Scala, Python, and R.
The script orchestrates the following stages in order:
- License checking via Apache RAT to ensure all files carry proper license headers
- Style checking for Scala, Java, Python, and R source files
- Building Spark using Maven or SBT depending on configuration
- Binary compatibility checking via MiMa (Migration Manager)
- Module-based test execution for Scala/Java tests via SBT or Maven
- PySpark test execution by delegating to python/run-tests.py
- SparkR test execution by delegating to R/run-tests.sh
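The staged flow above can be sketched as an ordered pipeline that aborts on the first failing stage. This is an illustrative outline, not the script's actual API; the stage names and commands are placeholders:

```python
import subprocess
import sys

# Illustrative stage commands; the real script builds these dynamically
# from the detected build tool and affected modules.
STAGES = [
    ("license check (Apache RAT)", ["echo", "rat"]),
    ("style checks", ["echo", "lint"]),
    ("build", ["echo", "build"]),
    ("binary compatibility (MiMa)", ["echo", "mima"]),
    ("module tests", ["echo", "tests"]),
]


def run_pipeline(stages):
    """Run each stage in order; stop and return its exit code on failure."""
    for name, cmd in stages:
        result = subprocess.run(cmd, capture_output=True)
        if result.returncode != 0:
            print(f"Stage failed: {name}")
            return result.returncode
    return 0


if __name__ == "__main__":
    sys.exit(run_pipeline(STAGES))
```

Aborting on the first failure keeps CI feedback fast: a license or style violation surfaces before any expensive build or test stage runs.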
The script uses the Module class defined in dev/sparktestsupport/modules.py to determine which modules are affected by a given set of file changes. It then applies a topological sort from dev/sparktestsupport/toposort.py so that each module runs only after the modules it depends on, before executing their associated test commands.
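A dependency-respecting module ordering can be sketched with Kahn's algorithm. This is a minimal stand-in, not the actual implementation in dev/sparktestsupport/toposort.py, and the module graph below is illustrative rather than Spark's real one:

```python
from collections import deque


def toposort_flatten(deps):
    """Flatten a {node: set-of-dependencies} graph into a list in which
    every node appears after all of its dependencies (Kahn's algorithm)."""
    # Count unresolved dependencies per node and invert the edges.
    indegree = {node: len(d) for node, d in deps.items()}
    dependents = {node: [] for node in deps}
    for node, d in deps.items():
        for dep in d:
            dependents[dep].append(node)
    # Start from nodes with no dependencies (sorted for determinism).
    ready = deque(sorted(n for n, c in indegree.items() if c == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dependents[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("cyclic dependency among modules")
    return order


# Hypothetical module dependency graph for illustration.
modules = {"core": set(), "sql": {"core"}, "mllib": {"core", "sql"}}
```

With this graph, `toposort_flatten(modules)` places `core` before `sql` and `sql` before `mllib`, so a module's tests never run before those of the modules it builds on.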
Usage
Use this script to run the CI test suite locally or in automated CI environments like GitHub Actions. It is the canonical way to validate Spark changes before submitting pull requests.
Code Reference
| Attribute | Details |
|---|---|
| Source | Repository apache/spark, File dev/run-tests.py, Lines 465-656 (main function) |
| Supporting Files | dev/sparktestsupport/modules.py (Module class L27-48), dev/sparktestsupport/utils.py (determine_modules_for_files L32-167), dev/sparktestsupport/toposort.py (toposort_flatten L41-84) |
| Signature | python3 dev/run-tests.py [--modules=<list>] [--parallelism=N] [--excluded-tags=<tags>] [--included-tags=<tags>] |
| Import | N/A (standalone script) |
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| --modules | str | No | Comma-separated module names to test |
| --parallelism | int | No | Number of parallel test processes (default 4) |
| --excluded-tags | str | No | Test tags to exclude from execution |
| --included-tags | str | No | Test tags to include in execution |
| Compiled Spark source | implicit | Yes | Spark must be built first (the script handles this) |
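The documented flags could be parsed with argparse roughly as follows. This is a sketch reconstructed from the table above; the script's real option handling (it uses optparse-style helpers in places) may differ in defaults and wording:

```python
import argparse


def parse_opts(argv):
    """Parse the CLI flags documented in the I/O contract (sketch only)."""
    parser = argparse.ArgumentParser(description="Run Spark's test suite.")
    parser.add_argument("--modules", default=None,
                        help="Comma-separated module names to test")
    parser.add_argument("--parallelism", type=int, default=4,
                        help="Number of parallel test processes")
    parser.add_argument("--excluded-tags", default=None,
                        help="Test tags to exclude from execution")
    parser.add_argument("--included-tags", default=None,
                        help="Test tags to include in execution")
    return parser.parse_args(argv)


opts = parse_opts(["--modules=core,sql", "--parallelism=8"])
```

Note that argparse converts `--excluded-tags` to the attribute `excluded_tags`, and an unset flag stays `None` so the script can distinguish "not given" from an empty value.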
Outputs
| Output | Description |
|---|---|
| Test results | Pass/fail status per module printed to stdout |
| Exit code | 0 on success; error codes defined in sparktestsupport/__init__.py |
Usage Examples
Run all tests:
```shell
python3 dev/run-tests.py
```
Run specific modules:
```shell
python3 dev/run-tests.py --modules=core,sql
```
Run with increased parallelism:
```shell
python3 dev/run-tests.py --parallelism=8
```
Run with tag filtering:
```shell
python3 dev/run-tests.py --excluded-tags=org.apache.spark.tags.SlowTest
```