Implementation:Sgl project Sglang PR Test Workflow
| Knowledge Sources | |
|---|---|
| Domains | CI/CD, Testing |
| Last Updated | 2026-02-10 00:00 GMT |
Overview
The main GitHub Actions CI workflow for testing pull requests, running scheduled nightly tests, and managing targeted stage reruns across multiple GPU architectures.
Description
pr-test.yml is a 1600+ line GitHub Actions workflow that orchestrates the entire test pipeline for the SGLang project. It implements a three-stage pipeline (A, B, C) whose stages run sequentially for pull requests and in parallel for scheduled runs, across diverse GPU hardware including 5090, H100, H200, H20, B200, and GB200.
The workflow supports four trigger modes:
- pull_request: Runs change-detected tests on PRs targeting main
- schedule: Runs every 6 hours with full test parallelism (max_parallel=14)
- workflow_dispatch: Supports targeted stage reruns via /rerun-stage and configurable FlashInfer versions
- workflow_call: Allows other workflows to invoke it with custom refs
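As an illustration of the workflow_call mode, a caller workflow might look like the sketch below. The caller's name, trigger, and job id are hypothetical, and passing secrets through is an assumption. Only the ref and run_all_tests inputs come from the interface documented on this page.

```yaml
# Hypothetical caller workflow; the name, trigger, and job id are illustrative.
name: Release Gate
on:
  workflow_dispatch:

jobs:
  run-pr-test:
    # Reusable workflows in the same repository are referenced by path.
    uses: ./.github/workflows/pr-test.yml
    with:
      ref: ${{ github.sha }}   # custom git ref to test
      run_all_tests: true      # run everything, regardless of change detection
    # Assumption: the called workflow needs the repository's secrets.
    secrets: inherit
```

A job that calls a reusable workflow cannot define its own steps. It only passes inputs and secrets.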
Key architectural features include:
- Change detection via dorny/paths-filter (for PRs) and GitHub API comparison (for workflow_dispatch with target_stage)
- sgl-kernel wheel builds on both x64 and ARM when kernel code changes are detected
- Sequential stage execution for PRs using wait jobs that poll the GitHub API
- Parallel execution for scheduled runs to enable easier individual stage retries
- Concurrency groups that cancel previous runs on the same branch
- Continue-on-error mode for scheduled/full test runs
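The last two features can be sketched with standard GitHub Actions keys. This is a minimal illustration, not the workflow's actual configuration: the concurrency group expression, the job id, and the exact continue-on-error condition are assumptions.

```yaml
# Sketch only: the group key and the job below are assumptions.
concurrency:
  # One group per workflow and branch, so a new push cancels the in-flight run.
  group: ${{ github.workflow }}-${{ github.head_ref || github.ref }}
  cancel-in-progress: true

jobs:
  example-stage:   # hypothetical job id
    runs-on: ubuntu-latest
    # Tolerate failures on scheduled runs, or when the dispatch input forces it.
    continue-on-error: ${{ github.event_name == 'schedule' || inputs.force_continue_on_error == true }}
    steps:
      - run: echo "run tests here"
```

With continue-on-error enabled, a failing job does not fail the run, which suits scheduled runs where every partition should report results.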
Usage
This workflow is automatically triggered on pull requests to the main branch and via a cron schedule every 6 hours. It can also be manually dispatched to target specific stages or test specific FlashInfer versions. Contributors with CI permissions can use /rerun-stage and /tag-and-rerun-ci commands.
Code Reference
Source Location
- Repository: Sgl_project_Sglang
- File: .github/workflows/pr-test.yml
- Lines: 1-1617
Signature
name: PR Test
on:
  schedule:
    - cron: '0 */6 * * *'
  pull_request:
    branches: [main]
  workflow_dispatch:
    inputs:
      version:
        description: "FlashInfer version"
        type: choice
        default: "release"
      target_stage:
        description: "Specific stage to run"
        type: string
      force_continue_on_error:
        type: boolean
      pr_head_sha:
        description: "PR head SHA for /rerun-stage on fork PRs"
        type: string
      test_parallel_dispatch:
        type: boolean
  workflow_call:
    inputs:
      ref:
        type: string
      run_all_tests:
        type: boolean
Import
N/A -- This is a GitHub Actions YAML workflow definition.
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| version | choice (release/nightly) | No | FlashInfer version to test against |
| target_stage | string | No | Specific stage name to run (e.g., stage-b-test-large-1-gpu) |
| force_continue_on_error | boolean | No | Force continue-on-error behavior for all stages |
| pr_head_sha | string | No | PR head SHA for /rerun-stage on fork PRs |
| test_parallel_dispatch | boolean | No | Simulate scheduled parallel dispatch behavior |
| ref | string | No | Git ref for workflow_call invocations |
| run_all_tests | boolean | No | Run all tests regardless of change detection |
Outputs
| Name | Type | Description |
|---|---|---|
| main_package | boolean | Whether main package changes were detected |
| sgl_kernel | boolean | Whether sgl-kernel changes were detected |
| jit_kernel | boolean | Whether JIT kernel changes were detected |
| multimodal_gen | boolean | Whether multimodal gen changes were detected |
| max_parallel | integer | Maximum parallel job count (3 for PRs, 14 for scheduled) |
| b200_runner | string | B200 runner tag based on kernel changes |
| pr-test-finish result | success/failure | Overall CI pass/fail status |
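A job could expose outputs like these roughly as follows. This is a sketch under stated assumptions: the step ids (filter, parallel), the runner label, and the shell logic for max_parallel are hypothetical, and only two of the four filters are shown. The values 3 and 14 come from the table above.

```yaml
jobs:
  check-changes:
    runs-on: ubuntu-latest   # assumption: any lightweight runner
    outputs:
      main_package: ${{ steps.filter.outputs.main_package }}
      sgl_kernel: ${{ steps.filter.outputs.sgl_kernel }}
      max_parallel: ${{ steps.parallel.outputs.max_parallel }}
    steps:
      - uses: actions/checkout@v4
      # dorny/paths-filter sets one output per filter name: 'true' or 'false'.
      - id: filter
        uses: dorny/paths-filter@v3
        with:
          filters: |
            main_package:
              - "python/sglang/!(multimodal_gen)/**"
            sgl_kernel:
              - "sgl-kernel/**"
      # Hypothetical step: choose the parallelism from the trigger type.
      - id: parallel
        env:
          EVENT_NAME: ${{ github.event_name }}
        run: |
          if [ "$EVENT_NAME" = "schedule" ]; then
            echo "max_parallel=14" >> "$GITHUB_OUTPUT"
          else
            echo "max_parallel=3" >> "$GITHUB_OUTPUT"
          fi
```

GitHub Actions job outputs are always strings, so the boolean values in the table arrive as 'true' or 'false', and a numeric output such as max_parallel must be converted (for example with fromJson) before use as a number.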
Usage Examples
Change Detection Filter Paths
filters: |
  main_package:
    - "python/sglang/!(multimodal_gen)/**"
    - "python/pyproject.toml"
    - "scripts/ci/cuda/*"
    - "test/**"
    - ".github/workflows/pr-test.yml"
  sgl_kernel:
    - "sgl-kernel/**"
  jit_kernel:
    - "python/sglang/jit_kernel/**"
  multimodal_gen:
    - "python/sglang/multimodal_gen/**"
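A downstream job can be gated on one of these filter results. The fragment below sketches how the sgl-kernel wheel build might run on both x64 and ARM only when kernel code changed. The matrix layout, runner labels, and build step are placeholders, and the fragment assumes a check-changes job that exposes an sgl_kernel output.

```yaml
jobs:
  sgl-kernel-build-wheels:
    needs: [check-changes]
    # Filter outputs are strings, so compare against 'true'.
    if: needs.check-changes.outputs.sgl_kernel == 'true'
    strategy:
      matrix:
        include:
          - arch: x64
            runner: ubuntu-latest      # placeholder runner label
          - arch: arm64
            runner: ubuntu-24.04-arm   # placeholder runner label
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      # Placeholder for the real wheel build commands.
      - run: echo "build sgl-kernel wheel for ${{ matrix.arch }}"
```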
Stage B Large 1-GPU Test Job
stage-b-test-large-1-gpu:
  needs: [check-changes, call-gate, wait-for-stage-a, sgl-kernel-build-wheels]
  runs-on: 1-gpu-runner
  timeout-minutes: 240
  strategy:
    fail-fast: false
    max-parallel: ${{ fromJson(needs.check-changes.outputs.max_parallel) }}
    matrix:
      partition: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
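The needs list above includes wait-for-stage-a, one of the wait jobs that poll the GitHub API. A minimal sketch of such a job follows. The "stage-a" job-name prefix, the 30-second interval, the 60-minute limit, and the single page of 100 jobs are all assumptions, not details taken from pr-test.yml.

```yaml
jobs:
  wait-for-stage-a:
    runs-on: ubuntu-latest   # assumption: any runner with gh and jq installed
    permissions:
      actions: read          # needed to list the jobs of the current run
    steps:
      - name: Poll until all stage A jobs have completed
        env:
          GH_TOKEN: ${{ github.token }}
          REPO: ${{ github.repository }}
          RUN_ID: ${{ github.run_id }}
        run: |
          # Assumption: stage A job names start with "stage-a".
          sel='.jobs[] | select(.name | startswith("stage-a"))'
          for attempt in $(seq 1 120); do
            # Fetch one page of jobs for this run (assumes at most 100 jobs).
            gh api "repos/$REPO/actions/runs/$RUN_ID/jobs?per_page=100" > jobs.json
            total=$(jq "[$sel] | length" jobs.json)
            pending=$(jq "[$sel | select(.status != \"completed\")] | length" jobs.json)
            failed=$(jq "[$sel | select(.conclusion == \"failure\")] | length" jobs.json)
            # Proceed only once stage A jobs exist and none are still running.
            if [ "$total" -gt 0 ] && [ "$pending" -eq 0 ]; then
              if [ "$failed" -eq 0 ]; then
                echo "stage A passed"
                exit 0
              fi
              echo "stage A failed"
              exit 1
            fi
            sleep 30
          done
          echo "timed out waiting for stage A"
          exit 1
```

Requiring at least one matching job before proceeding guards against the case where the API is polled before any stage A job has been created.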