Principle:Allenai Open instruct CI CD Integration Testing

From Leeroopedia


Knowledge Sources
Domains: CI_CD, Infrastructure
Last Updated: 2026-02-07 02:00 GMT

Overview

Code changes are validated by automatically running experiments on GPU hardware before they are promoted to production, so that the correctness of training code is confirmed by integration tests on real hardware rather than by unit tests alone.

Description

CI/CD Integration Testing in the context of ML training systems goes beyond standard unit testing. Because training code interacts with GPUs, distributed computing frameworks (Ray, DeepSpeed), and external services (Beaker, Docker), changes must be validated by running actual training experiments on GPU hardware. This principle ensures that code entering the main branch has been verified to produce successful training runs across all affected training algorithms (GRPO, DPO, SFT). The approach uses change detection to selectively run only the experiments affected by code modifications, balancing thoroughness with resource efficiency.

Usage

Apply this principle when building CI/CD pipelines for ML training codebases that require GPU validation. It is the appropriate choice when unit tests alone cannot verify correctness due to hardware-dependent behavior, distributed training interactions, or Docker container compatibility.

Theoretical Basis

The core logic follows a detect-build-test-promote pipeline:

Pseudo-code Logic:

# Abstract algorithm description
changed_files = detect_changes(pull_request)
experiments_to_run = map_changes_to_experiments(changed_files)
docker_image = build_and_push_image(codebase)
results = run_experiments_on_gpu_cluster(experiments_to_run, docker_image)
if all_succeeded(results):
    promote_image(docker_image, "production_tag")
else:
    notify_team(results)
    block_merge()
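
The test-and-promote step of the pseudo-code above can be sketched as runnable Python. This is a minimal illustration, not the project's actual pipeline: `run_pipeline`, `PipelineResult`, and the injected `run_experiment` and `promote` callables are hypothetical names standing in for whatever launches GPU jobs and retags the Docker image.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class PipelineResult:
    """Outcome of one pipeline run (hypothetical container type)."""
    promoted: bool
    failed: list[str]


def run_pipeline(
    experiments: Iterable[str],
    image: str,
    run_experiment: Callable[[str, str], bool],
    promote: Callable[[str], None],
) -> PipelineResult:
    """Run each experiment against the image; promote only if all pass."""
    # Collect the names of experiments that did not succeed.
    failed = [name for name in experiments if not run_experiment(name, image)]
    if failed:
        # Any failure blocks promotion; the caller can notify and block the merge.
        return PipelineResult(promoted=False, failed=failed)
    promote(image)
    return PipelineResult(promoted=True, failed=[])
```

Passing the launcher and the promotion step in as callables keeps the gating logic testable without a GPU cluster. Note that in this sketch an empty experiment list promotes the image; a real pipeline might prefer to treat that case explicitly.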

Selective experiment triggering matters: running every experiment for every change wastes GPU resources, while running none lets regressions through. Mapping changed files to the experiments they affect is the middle ground, since each change is validated only by the training runs it could break.
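
One way to express such a mapping is a table of path patterns per experiment. The sketch below is an assumption for illustration only: the directory layout, the patterns in `EXPERIMENT_TRIGGERS`, and the function signature are hypothetical and are not taken from the open-instruct repository.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from experiment name to the path patterns that trigger it.
# Shared code is listed under every experiment so a change there runs all three.
EXPERIMENT_TRIGGERS = {
    "grpo": ["training/grpo*.py", "training/common/*"],
    "dpo": ["training/dpo*.py", "training/common/*"],
    "sft": ["training/sft*.py", "training/common/*"],
}


def map_changes_to_experiments(changed_files, triggers=EXPERIMENT_TRIGGERS):
    """Return the sorted names of experiments affected by the changed files."""
    affected = set()
    for name, patterns in triggers.items():
        # An experiment is affected if any changed path matches any of its patterns.
        if any(fnmatchcase(path, pat) for path in changed_files for pat in patterns):
            affected.add(name)
    return sorted(affected)
```

With this table, a change to `training/grpo_trainer.py` selects only the GRPO experiment, a change under `training/common/` selects all three, and a documentation-only change selects none.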
