Implementation:Sgl project Sglang PR Test Workflow

Knowledge Sources	Sgl_project_Sglang
Domains	CI/CD, Testing
Last Updated	2026-02-10 00:00 GMT

Overview

The main GitHub Actions CI workflow for the SGLang project. It tests pull requests, runs scheduled tests every 6 hours, and manages targeted stage reruns across multiple GPU architectures.

Description

pr-test.yml is a 1600+ line GitHub Actions workflow that orchestrates the entire test pipeline for the SGLang project. It implements a multi-stage pipeline with three sequential stages (A, B, C) that run across diverse GPU hardware including 5090, H100, H200, H20, B200, and GB200.

The workflow supports four trigger modes:

pull_request: Runs change-detected tests on PRs targeting main
schedule: Runs every 6 hours with full test parallelism (max_parallel=14)
workflow_dispatch: Supports targeted stage reruns (the target_stage input, used by the /rerun-stage command) and a selectable FlashInfer version
workflow_call: Allows other workflows to invoke it with custom refs
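
As an illustration of the workflow_call mode, the sketch below shows how another workflow might invoke pr-test.yml as a reusable workflow. The caller's name, trigger, job name, and the `secrets: inherit` line are assumptions made for the example; only the `ref` and `run_all_tests` inputs come from the signature documented on this page.

```yaml
# Hypothetical caller workflow; its name and trigger are illustrative only.
name: Release Gate (example)

on:
  workflow_dispatch:

jobs:
  run-pr-tests:
    # Reusable-workflow call to a file in the same repository.
    uses: ./.github/workflows/pr-test.yml
    with:
      ref: ${{ github.sha }}   # Git ref the called workflow should test
      run_all_tests: true      # skip change detection and run everything
    secrets: inherit           # assumption: the called workflow needs repo secrets
```

Passing `run_all_tests: true` is the documented way for a caller to bypass change detection, which makes sense when the caller is not associated with a PR diff.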

Key architectural features include:

Change detection via dorny/paths-filter (for PRs) and GitHub API comparison (for workflow_dispatch with target_stage)
sgl-kernel wheel builds on both x64 and ARM when kernel code changes are detected
Sequential stage execution for PRs using wait jobs that poll the GitHub API
Parallel execution for scheduled runs to enable easier individual stage retries
Concurrency groups that cancel previous runs on the same branch
Continue-on-error mode for scheduled/full test runs
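
The last two items can be sketched with standard GitHub Actions syntax. This is a minimal fragment under stated assumptions, not a copy of pr-test.yml: the group key, the job name `example-stage`, and the exact condition are hypothetical, and the real workflow may compute them differently.

```yaml
# Illustrative only: the real group key and conditions in pr-test.yml may differ.
concurrency:
  # One group per workflow and branch; a newer run cancels the older one.
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

jobs:
  example-stage:   # hypothetical job name
    runs-on: ubuntu-latest
    # Tolerate failures on scheduled runs, or when the dispatch input forces it.
    continue-on-error: ${{ github.event_name == 'schedule' || inputs.force_continue_on_error == true }}
    steps:
      - run: echo "tests would run here"
```

With `continue-on-error` set, a failing job does not fail the run, so later stages still execute. That is consistent with the description of scheduled runs as full test runs.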

Usage

This workflow is automatically triggered on pull requests to the main branch and via a cron schedule every 6 hours. It can also be manually dispatched to target specific stages or test specific FlashInfer versions. Contributors with CI permissions can use /rerun-stage and /tag-and-rerun-ci commands.

Code Reference

Source Location

Repository: Sgl_project_Sglang
File: .github/workflows/pr-test.yml
Lines: 1-1617

Signature

name: PR Test

on:
  schedule:
    - cron: '0 */6 * * *'
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      version:
        description: "FlashInfer version"
        type: choice
        options: [release, nightly]
        default: "release"
      target_stage:
        description: "Specific stage to run"
        type: string
      force_continue_on_error:
        type: boolean
      pr_head_sha:
        description: "PR head SHA for /rerun-stage on fork PRs"
        type: string
      test_parallel_dispatch:
        type: boolean
  workflow_call:
    inputs:
      ref:
        type: string
      run_all_tests:
        type: boolean

Import

N/A -- This is a GitHub Actions YAML workflow definition.

I/O Contract

Inputs

Name	Trigger	Type	Required	Description
version	workflow_dispatch	choice (release/nightly)	No	FlashInfer version to test against
target_stage	workflow_dispatch	string	No	Specific stage name to run (e.g., stage-b-test-large-1-gpu)
force_continue_on_error	workflow_dispatch	boolean	No	Force continue-on-error behavior for all stages
pr_head_sha	workflow_dispatch	string	No	PR head SHA for /rerun-stage on fork PRs
test_parallel_dispatch	workflow_dispatch	boolean	No	Simulate scheduled parallel dispatch behavior
ref	workflow_call	string	No	Git ref for workflow_call invocations
run_all_tests	workflow_call	boolean	No	Run all tests regardless of change detection

Outputs

Name	Type	Description
main_package	boolean	Whether main package changes were detected
sgl_kernel	boolean	Whether sgl-kernel changes were detected
jit_kernel	boolean	Whether JIT kernel changes were detected
multimodal_gen	boolean	Whether multimodal gen changes were detected
max_parallel	integer	Maximum parallel job count (3 for PRs, 14 for scheduled)
b200_runner	string	B200 runner tag based on kernel changes
pr-test-finish result	success/failure	Overall CI pass/fail status
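
A downstream job might consume these outputs roughly as follows. The job name `example-kernel-tests`, the placeholder runner, and the two-element matrix are hypothetical; the `check-changes` job name and the `fromJson` call on `max_parallel` match the Stage B excerpt on this page. The fragment assumes a `check-changes` job is defined elsewhere in the same workflow.

```yaml
jobs:
  example-kernel-tests:   # hypothetical job name
    needs: check-changes
    # Job outputs are always strings, so compare against 'true', not a boolean.
    if: needs.check-changes.outputs.sgl_kernel == 'true'
    runs-on: ubuntu-latest   # placeholder; the real jobs use GPU runner labels
    strategy:
      fail-fast: false
      # fromJson converts the string output (e.g. "3" or "14") into a number.
      max-parallel: ${{ fromJson(needs.check-changes.outputs.max_parallel) }}
      matrix:
        partition: [0, 1]
    steps:
      - run: echo "running partition ${{ matrix.partition }}"
```

Although the table lists the change flags as boolean and `max_parallel` as integer, GitHub Actions passes job outputs between jobs as strings, which is why the comparison and the `fromJson` conversion are needed.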

Usage Examples

Change Detection Filter Paths

filters: |
  main_package:
    - "python/sglang/!(multimodal_gen)/**"
    - "python/pyproject.toml"
    - "scripts/ci/cuda/*"
    - "test/**"
    - ".github/workflows/pr-test.yml"
  sgl_kernel:
    - "sgl-kernel/**"
  jit_kernel:
    - "python/sglang/jit_kernel/**"
  multimodal_gen:
    - "python/sglang/multimodal_gen/**"
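
To show where such a filter block sits, here is a minimal sketch of a change-detection job that exposes filter results as job outputs. The step ids `filter` and `parallel`, the action versions, and the shortened filter list are assumptions for illustration; the 3 and 14 values follow the `max_parallel` description in the Outputs table.

```yaml
check-changes:
  runs-on: ubuntu-latest
  outputs:
    main_package: ${{ steps.filter.outputs.main_package }}
    sgl_kernel: ${{ steps.filter.outputs.sgl_kernel }}
    max_parallel: ${{ steps.parallel.outputs.max_parallel }}
  steps:
    - uses: actions/checkout@v4
    # Each filter name becomes an output set to 'true' or 'false'.
    - uses: dorny/paths-filter@v3
      id: filter
      with:
        filters: |
          main_package:
            - "test/**"
          sgl_kernel:
            - "sgl-kernel/**"
    # Hypothetical step: choose the matrix parallelism from the trigger type.
    - id: parallel
      run: |
        if [ "${{ github.event_name }}" = "schedule" ]; then
          echo "max_parallel=14" >> "$GITHUB_OUTPUT"
        else
          echo "max_parallel=3" >> "$GITHUB_OUTPUT"
        fi
```

Keeping detection in one job means every later stage can read the same flags through `needs.check-changes.outputs` instead of re-running the filter.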

Stage B Large 1-GPU Test Job

stage-b-test-large-1-gpu:
  needs: [check-changes, call-gate, wait-for-stage-a, sgl-kernel-build-wheels]
  runs-on: 1-gpu-runner
  timeout-minutes: 240
  strategy:
    fail-fast: false
    max-parallel: ${{ fromJson(needs.check-changes.outputs.max_parallel) }}
    matrix:
      partition: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
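
The `wait-for-stage-a` dependency above refers to one of the wait jobs that poll the GitHub API. The sketch below shows one plausible shape for such a job. It is not the project's implementation: the `stage-a` job-name prefix, the polling interval, the attempt limit, and the failure handling are all assumptions.

```yaml
wait-for-stage-a:
  runs-on: ubuntu-latest
  permissions:
    actions: read
  steps:
    - name: Poll stage A jobs until they complete
      env:
        GH_TOKEN: ${{ github.token }}
        REPO: ${{ github.repository }}
        RUN_ID: ${{ github.run_id }}
      run: |
        # Poll once a minute for up to two hours (both limits are arbitrary).
        for attempt in $(seq 1 120); do
          # List this run's jobs and keep those whose name starts with "stage-a".
          jobs=$(gh api "repos/$REPO/actions/runs/$RUN_ID/jobs?per_page=100" \
            --jq '[.jobs[] | select(.name | startswith("stage-a"))]')
          total=$(echo "$jobs" | jq 'length')
          pending=$(echo "$jobs" | jq '[.[] | select(.status != "completed")] | length')
          failed=$(echo "$jobs" | jq '[.[] | select(.conclusion == "failure")] | length')
          if [ "$failed" -gt 0 ]; then
            echo "A stage A job failed"; exit 1
          fi
          # Require at least one matching job so an empty list is not read as success.
          if [ "$total" -gt 0 ] && [ "$pending" -eq 0 ]; then
            echo "Stage A finished"; exit 0
          fi
          sleep 60
        done
        echo "Timed out waiting for stage A"; exit 1
```

Because the wait job only observes other jobs through the API, the stage jobs themselves need no direct `needs` link to one another.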
