Principle:Rapidsai Cuml Build And CI Infrastructure
| Knowledge Sources | |
|---|---|
| Domains | Software_Engineering, Build_Systems, Continuous_Integration |
| Last Updated | 2026-02-08 12:00 GMT |
Overview
Build and CI infrastructure encompasses the build system configuration, dependency management, continuous integration workflows, and code quality enforcement tools that ensure reproducible compilation, testing, and release of a GPU-accelerated machine learning library.
Description
A GPU-accelerated C++/Python machine learning library requires a sophisticated build and CI infrastructure to manage the complexity of multi-language compilation, CUDA kernel compilation, dependency resolution, and cross-platform packaging.
Build System (build.sh + CMake): The build process is orchestrated by a top-level shell script that coordinates multiple build targets: the C++ library (libcuml), the Python package (cuml), multi-GPU tests (cpp-mgtests), primitive tests (prims), benchmarks (bench), and documentation (cppdocs, pydocs). The script supports numerous flags for customization: debug mode, GPU architecture selection (single vs. all architectures), NVTX profiling instrumentation, code coverage with Cython line tracing, ccache for compilation caching, and configure-only mode for IDE integration. CMake is used as the underlying build system for C++ compilation, handling CUDA toolkit detection, library linking, and target configuration.
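The way script-level flags end up as CMake settings can be sketched as follows. This is a minimal illustration, not the project's actual build.sh logic: the `BuildOptions` class and `cmake_configure_args` function are hypothetical names, and the real script may pass different values. Only the CMake variables themselves (`CMAKE_BUILD_TYPE`, `CMAKE_CUDA_ARCHITECTURES`, `CMAKE_<LANG>_COMPILER_LAUNCHER`, `CMAKE_EXPORT_COMPILE_COMMANDS`) are standard; the `all` and `native` architecture values assume CMake 3.24 or newer.

```python
from dataclasses import dataclass


@dataclass
class BuildOptions:
    """Hypothetical options mirroring the kinds of flags described above."""
    debug: bool = False       # debug build instead of release
    all_archs: bool = False   # all GPU architectures instead of the local one
    use_ccache: bool = False  # wrap the compilers with ccache


def cmake_configure_args(opts: BuildOptions) -> list[str]:
    """Translate the options into standard CMake cache definitions."""
    build_type = "Debug" if opts.debug else "Release"
    args = [f"-DCMAKE_BUILD_TYPE={build_type}"]

    # "all" and "native" are values CMake itself understands (3.24+);
    # the real build script may use project-specific values instead.
    archs = "all" if opts.all_archs else "native"
    args.append(f"-DCMAKE_CUDA_ARCHITECTURES={archs}")

    if opts.use_ccache:
        # Compiler launchers make CMake prefix each compile command with ccache.
        for lang in ("C", "CXX", "CUDA"):
            args.append(f"-DCMAKE_{lang}_COMPILER_LAUNCHER=ccache")

    # Emit compile_commands.json so IDEs and static analyzers can reuse the flags.
    args.append("-DCMAKE_EXPORT_COMPILE_COMMANDS=ON")
    return args
```

Keeping the translation in one function makes it easy to see which script flag influences which CMake variable, which is usually the first thing to check when a build behaves differently than expected.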
Dependency Management (dependencies.yaml): Dependencies are declared in a centralized YAML file that serves as the single source of truth and is consumed by the RAPIDS dependency file generator. Dependencies are organized by function: common build tools, CUDA toolkit versions (several CUDA versions are supported side by side), Python build and runtime dependencies, and test dependencies. From this file the generator produces conda environment specifications for different build configurations, including development, testing, documentation, and clang-tidy analysis.
PR and CI Workflows: The CI pipeline includes scripts for building C++ and Python components, running C++ tests (ctests), running Python tests (single-GPU, multi-GPU/Dask, integration, scikit-learn compatibility), notebook validation, wheel building and validation, and documentation generation. Each CI stage is encapsulated in a dedicated shell script under the ci/ directory.
Code Quality Tools:
- Include Checker: A Python script that enforces the project's #include style across C++ source files. It validates the formatting of each include directive, in particular whether a header is referenced with angle brackets or with quotes.
- Clang-Tidy Runner: Runs clang-tidy static analysis on the C++ codebase in a dedicated conda environment that provides the appropriate clang-tidy version. Its configuration is read from pyproject.toml, and the analysis runs against the CMake compilation database (compile_commands.json).
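The kind of check an include checker performs can be sketched in a few lines. The rule used here is an assumption made for illustration, not the project's documented convention: quoted includes are treated as reserved for headers in the same directory, so a quoted path containing a `/` is reported. The function name `check_includes` is likewise hypothetical.

```python
import re

# Matches `#include <path>` or `#include "path"`, capturing the opening
# delimiter and the path.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*([<"])([^>"]+)[>"]')


def check_includes(source: str) -> list[tuple[int, str]]:
    """Return (line_number, message) pairs for includes that break the assumed rule."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = INCLUDE_RE.match(line)
        if match is None:
            continue  # not an include directive
        delimiter, path = match.groups()
        # Assumed rule: quotes are only for same-directory headers.
        if delimiter == '"' and "/" in path:
            problems.append((lineno, f'use <{path}> instead of "{path}"'))
    return problems
```

A checker like this is cheap enough to run before compilation, so style violations surface before any GPU build time is spent.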
Usage
The build and CI infrastructure is used when:
- Building the library from source for development, testing, or deployment.
- Adding new C++ or Python components that must integrate with the existing build targets.
- Managing dependency versions across CUDA toolkit versions and CPU architectures.
- Submitting pull requests that must pass automated quality checks (style, static analysis, tests).
- Creating release artifacts (conda packages, Python wheels) for distribution.
- Debugging build failures or test regressions in the CI pipeline.
Theoretical Basis
Build Target Dependency Graph:
libcuml (C++ library)
  |-> cuml (Python package, depends on libcuml)
  |-> cpp-mgtests (multi-GPU tests, adds MPI dependency)
  |-> bench (C++ benchmarks)
  |-> prims (primitive library tests)
Documentation:
cppdocs -> Doxygen on C++ headers
pydocs -> Sphinx on Python docstrings
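The target graph above implies a build order: whatever is requested, its prerequisites must be built first. A minimal sketch of that resolution, reading each arrow as "is built after libcuml", uses the standard library's `graphlib`. The `TARGET_DEPS` table and the `build_order` function are illustrative names, not part of the project's build script.

```python
from graphlib import TopologicalSorter

# Each target maps to the targets it depends on, following the graph above.
TARGET_DEPS = {
    "libcuml": set(),
    "cuml": {"libcuml"},
    "cpp-mgtests": {"libcuml"},
    "bench": {"libcuml"},
    "prims": {"libcuml"},
}


def build_order(requested):
    """Return the requested targets plus their prerequisites, prerequisites first."""
    # Collect the transitive closure of the requested targets.
    needed = set()
    stack = list(requested)
    while stack:
        target = stack.pop()
        if target not in needed:
            needed.add(target)
            stack.extend(TARGET_DEPS[target])

    # Sort only the needed part of the graph topologically.
    graph = {target: TARGET_DEPS[target] & needed for target in needed}
    return list(TopologicalSorter(graph).static_order())
```

For example, `build_order(["cuml"])` returns `["libcuml", "cuml"]`: asking for the Python package pulls in the C++ library ahead of it.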
CI Pipeline Stages:
1. Style/Lint checks (clang-tidy, include_checker, black_lists)
2. C++ build (build_cpp.sh)
3. Python build (build_python.sh)
4. C++ tests (test_cpp.sh -> run_ctests.sh)
5. Python single-GPU tests (test_python_singlegpu.sh)
6. Python Dask/multi-GPU tests (test_python_dask.sh)
7. Integration tests (test_python_integration.sh)
8. Wheel build and validation (build_wheel.sh, validate_wheel.sh)
9. Documentation build (build_docs.sh)
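The stage list above can be read as a fail-fast sequence: once a stage fails, later stages are not worth running. The sketch below models that behavior in a strictly sequential form, which is a simplification, since a real CI system may run independent stages in parallel. The `run_pipeline` function is a hypothetical helper; each stage is represented by a callable that returns an exit code.

```python
from typing import Callable, Iterable


def run_pipeline(stages: Iterable[tuple[str, Callable[[], int]]]) -> list[tuple[str, str]]:
    """Run stages in order; after the first failure, mark the rest as skipped."""
    results = []
    failed = False
    for name, run in stages:
        if failed:
            results.append((name, "skipped"))
            continue
        exit_code = run()  # 0 means success, as with shell scripts
        if exit_code == 0:
            results.append((name, "passed"))
        else:
            results.append((name, "failed"))
            failed = True
    return results
```

In an actual run, each callable could wrap one of the scripts under ci/, for instance `lambda: subprocess.run(["bash", "ci/build_cpp.sh"]).returncode`, so the ordering logic stays separate from what each stage does.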
Dependency Resolution:
dependencies.yaml
  -> rapids-dependency-file-generator
    -> conda environment YAML (per CUDA version, per architecture)
      -> conda/mamba environment creation
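The "per CUDA version, per architecture" step amounts to expanding a small matrix into one environment file per combination. The sketch below illustrates only that expansion; it does not call or reimplement rapids-dependency-file-generator. The `environment_files` function and the file naming pattern are assumptions for illustration, and the versions passed in are placeholders.

```python
from itertools import product


def environment_files(name: str, cuda_versions, architectures) -> list[str]:
    """Enumerate one environment file name per (CUDA version, architecture) pair."""
    files = []
    for cuda, arch in product(cuda_versions, architectures):
        # Illustrative naming pattern: drop the dot from the CUDA version.
        cuda_tag = cuda.replace(".", "")
        files.append(f"{name}_cuda-{cuda_tag}_arch-{arch}.yaml")
    return files
```

Each resulting file can then be handed to conda or mamba to create the matching environment, so every supported combination is derived from the same declared dependencies.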