

Principle:Ggml org Ggml CI Build Testing

From Leeroopedia
Revision as of 17:43, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/Ggml_org_Ggml_CI_Build_Testing.md)


Knowledge Sources
Domains CI_CD, Testing
Last Updated 2026-02-10

Overview

CI Build Testing is the continuous integration pipeline that validates multi-backend builds, runs test suites, and executes example model inference across hardware configurations.

Description

GGML supports numerous hardware backends (CPU, CUDA, ROCm/HIP, SYCL, Vulkan, Metal, WebGPU, and MUSA), and a change can introduce a regression in any combination of backend, build type, and platform. CI Build Testing codifies a reproducible, automated pipeline, implemented as a Bash script at ci/run.sh, that:

  1. Configures the build by setting CMake flags based on environment variables (GG_BUILD_CUDA, GG_BUILD_SYCL, GG_BUILD_VULKAN, GG_BUILD_ROCM, GG_BUILD_METAL, GG_BUILD_WEBGPU, GG_BUILD_MUSA). Each flag activates the corresponding GGML backend and applies any required vendor-specific settings (e.g., querying nvidia-smi for CUDA architecture, requiring ONEAPI_ROOT for SYCL).
  2. Builds in both Debug and Release modes using CMake and Make, exercising compiler warnings and assertions (Debug) as well as optimization correctness (Release).
  3. Runs the CTest suite to validate unit tests and backend operation tests. In Debug mode, expensive tests like test-opt and test-backend-ops are excluded; in Release mode, the full suite runs unless GG_BUILD_LOW_PERF is set.
  4. Executes example model inference for GPT-2, SAM (Segment Anything Model), and YOLO (object detection). These integration tests download pre-trained weights, run inference with fixed seeds, and validate output against expected patterns using grep assertions. This catches regressions in the full stack from model loading through graph construction to backend computation.
  5. Produces structured logs and exit codes for each stage, written to an output directory with a summary README in Markdown format.
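
The first step above, translating environment variables into CMake arguments, can be sketched as follows. This is a minimal illustration, not the code in ci/run.sh: the helper names add_if_set and gg_cmake_flags are hypothetical, and the -DGGML_* option names are assumptions that should be checked against the project's CMakeLists.txt. Vendor-specific settings such as the CUDA architecture query are left out.

```shell
# Sketch: map GG_BUILD_* environment variables to CMake arguments.
# add_if_set and gg_cmake_flags are hypothetical helpers for illustration.
# The -DGGML_* option names are assumptions; verify them in CMakeLists.txt.

# Print " -D<option>=ON" when the given value is non-empty.
# $1 = value of the environment variable, $2 = assumed CMake option name
add_if_set() {
    if [ -n "$1" ]; then
        printf ' -D%s=ON' "$2"
    fi
}

# Print the CMake arguments for every backend whose variable is set.
gg_cmake_flags() {
    flags="$(
        add_if_set "${GG_BUILD_CUDA:-}"   GGML_CUDA
        add_if_set "${GG_BUILD_SYCL:-}"   GGML_SYCL
        add_if_set "${GG_BUILD_VULKAN:-}" GGML_VULKAN
        add_if_set "${GG_BUILD_ROCM:-}"   GGML_HIP
        add_if_set "${GG_BUILD_METAL:-}"  GGML_METAL
        add_if_set "${GG_BUILD_WEBGPU:-}" GGML_WEBGPU
        add_if_set "${GG_BUILD_MUSA:-}"   GGML_MUSA
    )"
    # Drop the leading space left by the first printf.
    echo "${flags# }"
}

# Example: show the arguments for the current environment.
echo "cmake args: $(gg_cmake_flags)"
```

Keeping the mapping in one function means a new backend needs only one extra line, and the resulting string can be passed to cmake for both the Debug and the Release build.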

The pipeline also establishes a Python virtual environment and installs project dependencies from requirements.txt, ensuring that Python-dependent conversion scripts and model preparation steps work correctly.

Usage

Apply this principle when contributing changes to GGML that affect the build system, backend implementations, or core tensor operations. Run bash ./ci/run.sh ./tmp/results ./tmp/mnt locally before submitting changes to catch build failures and test regressions. Set the appropriate GG_BUILD_* environment variables to test specific backend configurations. Use GG_BUILD_LOW_PERF=1 on resource-constrained machines to skip long-running optimization tests.
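
A local run as described above might be wrapped like this. The run_ci function and its DRY_RUN switch are hypothetical conveniences, not part of GGML; only the ci/run.sh command line and the GG_BUILD_* variables come from the text above. The sketch assumes it is run from the repository root.

```shell
# Sketch: wrap the local CI invocation. run_ci and DRY_RUN are hypothetical.
# $1 = output directory (default ./tmp/results)
# $2 = mount directory  (default ./tmp/mnt)
run_ci() {
    out_dir="${1:-./tmp/results}"
    mnt_dir="${2:-./tmp/mnt}"

    # With DRY_RUN=1, only print the command that would be executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "bash ./ci/run.sh ${out_dir} ${mnt_dir}"
        return 0
    fi

    # Creating the directories up front is harmless if they already exist.
    mkdir -p "${out_dir}" "${mnt_dir}"
    bash ./ci/run.sh "${out_dir}" "${mnt_dir}"
}

# Print the default command without running it.
DRY_RUN=1 run_ci

# A Vulkan build on a slow machine would be started like this (not executed here):
#   GG_BUILD_VULKAN=1 GG_BUILD_LOW_PERF=1 run_ci
```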

Theoretical Basis

The CI pipeline embodies several core principles of continuous integration for multi-platform native software:

  1. Backend-Conditional Compilation -- Each hardware backend is gated behind a compile-time flag that maps to a CMake option. The CI script mirrors this structure by translating environment variables to CMake arguments. This makes each backend configuration testable in isolation, so a failure can be traced to the backend that was enabled.
  2. Two-Tier Testing Strategy -- The pipeline distinguishes between unit tests (CTest) and integration tests (model inference). Unit tests validate individual operations and APIs in isolation; integration tests validate the end-to-end path from model files to inference output. Both tiers are necessary because unit-test passes do not guarantee correct behavior when components are composed, and integration tests alone cannot pinpoint the source of failures.
  3. Deterministic Reproducibility -- Model inference tests use fixed random seeds (-s 1234) and known input prompts, so their outputs are repeatable from run to run on a given configuration. This allows output validation with simple pattern matching rather than statistical thresholds.
  4. Output Validation by Assertion -- Integration tests validate results using grep against expected output patterns (e.g., checking that YOLO detects a "dog" with 55-59% confidence). This is a form of golden-output testing where known-good outputs serve as regression baselines.
  5. Graceful Degradation -- The pipeline supports a GG_BUILD_LOW_PERF flag to skip expensive tests on slower machines, and individual test stages are sequenced with early exit on failure (test $ret -eq 0 && gg_run ...). This prevents wasting compute on downstream stages when an earlier stage has already failed.
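
Two of the ideas above, golden-output assertion with grep and stage sequencing that stops after the first failure, can be illustrated together. This is a sketch under stated assumptions: gg_stage and check_output are hypothetical stand-ins for the helpers in ci/run.sh, and the "dog: 57%" log line is an assumed shape for a detection result, not captured output.

```shell
# Sketch only: gg_stage, check_output, and the sample log line are hypothetical
# stand-ins, not the helpers defined in ci/run.sh.

ret=0

# Run a stage only while no earlier stage has failed; remember the first failure.
# $1 = stage name, remaining arguments = command to run
gg_stage() {
    name="$1"
    shift
    if [ "$ret" -ne 0 ]; then
        echo "${name}: skipped"
        return 0
    fi
    if "$@"; then
        echo "${name}: ok"
    else
        ret=$?
        echo "${name}: failed (${ret})"
    fi
}

# Golden-output assertion: succeed if any line of the log matches the
# extended regular expression.
# $1 = log file, $2 = pattern
check_output() {
    grep -q -E "$2" "$1"
}

log="$(mktemp)"
echo "dog: 57%" > "$log"   # assumed shape of a detection line

gg_stage build     true
gg_stage yolo      check_output "$log" 'dog: 5[5-9]%'
gg_stage bad_check check_output "$log" 'cat: 9[0-9]%'
gg_stage later     true

rm -f "$log"
echo "final exit code: ${ret}"
```

In this run the third stage fails because the log has no matching line, the fourth stage is skipped, and ret stays at 1. The character class 5[5-9] accepts confidences from 55% to 59%, which tolerates small numerical differences without resorting to statistical thresholds.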

