Principle: Allenai open-instruct CI/CD Integration Testing
| Knowledge Sources | |
|---|---|
| Domains | CI_CD, Infrastructure |
| Last Updated | 2026-02-07 02:00 GMT |
Overview
Principle of validating code changes through automated GPU experiment execution before promotion to production, ensuring training code correctness via real hardware integration tests.
Description
CI/CD Integration Testing in the context of ML training systems goes beyond standard unit testing. Because training code interacts with GPUs, distributed computing frameworks (Ray, DeepSpeed), and external services (Beaker, Docker), changes must be validated by running actual training experiments on GPU hardware. This principle ensures that code entering the main branch has been verified to produce successful training runs across all affected training algorithms (GRPO, DPO, SFT). The approach uses change detection to selectively run only the experiments affected by code modifications, balancing thoroughness with resource efficiency.
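As an illustration, the change-to-experiment mapping can be sketched as a set of path patterns. The glob patterns, experiment names, and the `SHARED_PATTERNS` escape hatch below are hypothetical, not the repository's actual configuration:

```python
import fnmatch

# Hypothetical mapping from file-path globs to the training
# experiments they can affect; a real project would keep this
# next to its CI configuration.
EXPERIMENT_TRIGGERS = {
    "grpo": ["open_instruct/grpo*"],
    "dpo": ["open_instruct/dpo*", "open_instruct/model_utils*"],
    "sft": ["open_instruct/finetune*", "open_instruct/dataset*"],
}

# Changes to shared infrastructure trigger every experiment.
SHARED_PATTERNS = ["Dockerfile", "requirements*", "open_instruct/utils*"]

def map_changes_to_experiments(changed_files):
    """Return the set of experiments affected by a change set."""
    if any(fnmatch.fnmatch(f, p)
           for f in changed_files for p in SHARED_PATTERNS):
        return set(EXPERIMENT_TRIGGERS)  # shared code: run everything
    affected = set()
    for name, patterns in EXPERIMENT_TRIGGERS.items():
        if any(fnmatch.fnmatch(f, p)
               for f in changed_files for p in patterns):
            affected.add(name)
    return affected
```

For example, a change set of only `open_instruct/dpo_tune.py` would trigger just the DPO experiment, while touching the `Dockerfile` would trigger all three.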
Usage
Apply this principle when building CI/CD pipelines for ML training codebases that require GPU validation. It is the appropriate choice when unit tests alone cannot verify correctness due to hardware-dependent behavior, distributed training interactions, or Docker container compatibility.
Theoretical Basis
The core logic follows a detect-build-test-promote pipeline:
Pseudo-code Logic:

```python
# Abstract algorithm: detect-build-test-promote
changed_files = detect_changes(pull_request)
experiments_to_run = map_changes_to_experiments(changed_files)
docker_image = build_and_push_image(codebase)
results = run_experiments_on_gpu_cluster(experiments_to_run, docker_image)
if all_succeeded(results):
    promote_image(docker_image, "production_tag")
else:
    notify_team(results)
    block_merge()
```
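The promote-or-block branch at the end of the pipeline can be made concrete as a small gate function. This is a minimal sketch under assumed conventions: the `ExperimentResult` structure, the image/tag naming, and returning a message instead of actually retagging the image or posting to a channel are all illustrative choices.

```python
from dataclasses import dataclass

@dataclass
class ExperimentResult:
    name: str       # e.g. "grpo", "dpo", "sft"
    succeeded: bool
    log_url: str = ""

def gate(results, image, production_tag="latest"):
    """Decide whether the CI image is promoted or the merge is blocked.

    Returns (promote, message). A real pipeline would retag the image
    in the registry and send the message to the team's channel.
    """
    failed = [r for r in results if not r.succeeded]
    if not failed:
        repo = image.split(":")[0]
        return True, f"promote {image} -> {repo}:{production_tag}"
    lines = [f"{r.name} failed ({r.log_url})" for r in failed]
    return False, "blocking merge:\n" + "\n".join(lines)
```

For example, `gate([ExperimentResult("dpo", True)], "ci/open-instruct:abc123")` yields a promotion decision, while any failed result blocks the merge and lists the failing experiments with their logs.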
The selective experiment triggering is important: running all experiments for every change would be wasteful, but running none would miss regressions. The mapping from file changes to affected experiments provides an efficient middle ground.
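The `detect_changes` step of the pipeline is commonly implemented by diffing the pull-request head against its merge base with the main branch. A sketch, assuming a local git checkout and illustrative ref names:

```python
import subprocess

def detect_changes(base_ref="origin/main", head_ref="HEAD"):
    """List files changed between the PR head and the base branch.

    Uses `git diff --name-only base...head`, which diffs against the
    merge base rather than the tip of the base branch, so unrelated
    commits on the base branch are not reported as changes.
    """
    out = subprocess.run(
        ["git", "diff", "--name-only", f"{base_ref}...{head_ref}"],
        capture_output=True, text=True, check=True,
    )
    return [line for line in out.stdout.splitlines() if line]
```

Hosted CI systems usually expose the same information directly (for example, as a list of changed paths in the pull-request event), in which case the git invocation can be skipped.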