Implementation:Ray project Ray Repro CI Tool
| Knowledge Sources | |
|---|---|
| Domains | CI, Debugging, AWS, Infrastructure |
| Last Updated | 2026-02-13 16:00 GMT |
Overview
This Python script creates an AWS EC2 instance to reproduce Buildkite CI build failures by replicating the exact runner environment, Docker container, and build commands from a failed build.
Description
The ci/repro-ci.py tool takes a Buildkite job URL as input and automates the reproduction of a CI failure. It fetches environment variables and plugin configurations via the Buildkite API, then provisions an EC2 instance matching the original runner (same AMI and instance type, with a 500 GB volume). On that instance it installs Docker, logs into ECR, and pulls the same container image via Buildkite plugin hooks. It can optionally execute the build commands, with regex-based filters to skip specific steps (e.g., bazel build). Finally, the user is attached over SSH to an interactive session inside the Docker container.
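The regex-based step filtering can be illustrated with a small standalone sketch. This is not the tool's own code: filter_commands is a hypothetical helper, the sample commands and Bazel targets are made up, and the use of re.search (match anywhere in the command) is an assumption about how filters are applied.

```python
import re
from typing import Iterable, List


def filter_commands(commands: Iterable[str], filters: Iterable[str]) -> List[str]:
    """Return the commands that match none of the skip patterns."""
    patterns = [re.compile(f) for f in filters]
    kept = []
    for command in commands:
        # Assumption: a filter matches anywhere in the command, not only at the start.
        if any(p.search(command) for p in patterns):
            continue
        kept.append(command)
    return kept


# Hypothetical build commands, for illustration only.
commands = [
    "pip install -r requirements.txt",
    "bazel build //:ray_pkg",
    "bazel test //python/ray/tests:test_basic",
]
remaining = filter_commands(commands, ["bazel build"])
print(remaining)
```

Passing several patterns skips a command as soon as any one of them matches, which mirrors repeating the -f option on the command line.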
Usage
Developers use this tool when a CI build fails on Buildkite but cannot be reproduced locally. It is invoked from the command line with a Buildkite job URL. The tool requires AWS credentials, a Buildkite API token (auto-fetched from AWS Secrets Manager if not set), and an SSH key pair (~/.ssh/buildkite-repro-env.pem).
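The prerequisites above amount to a token lookup with a fallback and a key-file check. The following is a minimal sketch under stated assumptions, not the tool's implementation: resolve_buildkite_token, ssh_key_present, and fake_fetch are hypothetical names, and the real AWS Secrets Manager call is replaced by an injected function because its secret name is not given here.

```python
import os
from typing import Callable, Mapping

KEY_PATH = "~/.ssh/buildkite-repro-env.pem"  # path named in this document


def resolve_buildkite_token(
    env: Mapping[str, str], fetch_secret: Callable[[], str]
) -> str:
    """Prefer BUILDKITE_TOKEN from the environment, else call the fetcher."""
    token = env.get("BUILDKITE_TOKEN")
    if token:
        return token
    # Hypothetical hook: the real tool reads from AWS Secrets Manager here.
    return fetch_secret()


def ssh_key_present(path: str = KEY_PATH) -> bool:
    """Report whether the SSH private key exists at the expanded path."""
    return os.path.isfile(os.path.expanduser(path))


def fake_fetch() -> str:
    # Placeholder standing in for a secrets lookup.
    return "token-from-secret-store"


print(resolve_buildkite_token({"BUILDKITE_TOKEN": "abc"}, fake_fetch))
print(resolve_buildkite_token({}, fake_fetch))
print(ssh_key_present())
```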
Code Reference
Source Location
- Repository: Ray
- File: ci/repro-ci.py
- Lines: 1-683
Signature
"""Create an AWS instance to reproduce Buildkite CI builds."""
class ReproSession:
    plugin_default_env = {
        "docker": {"BUILDKITE_PLUGIN_DOCKER_MOUNT_BUILDKITE_AGENT": False}
    }

    def __init__(
        self,
        buildkite_token: str,
        instance_name: Optional[str] = None,
        logger: Optional[logging.Logger] = None,
    ):
        ...

    def set_session(self, session_url: str): ...
    def fetch_env_variables(self, overwrite=None): ...
    def aws_start_instance(self): ...
    def aws_wait_for_instance(self): ...
    def prepare_instance(self): ...
    def run_buildkite_command(self, command_filter=None): ...
    def attach_to_container(self): ...

@click.command()
@click.argument("session_url", required=False)
@click.option("-n", "--instance-name", default=None)
@click.option("-c", "--commands", is_flag=True, default=False)
@click.option("-f", "--filters", multiple=True, default=[])
def main(session_url, instance_name, commands, filters): ...
Import
import base64, json, logging, os, random, re, shlex, subprocess, threading, time
from numbers import Number
from typing import Any, Callable, Dict, List, Optional
import boto3
import click
import paramiko
import yaml
from pybuildkite.buildkite import Buildkite
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| session_url | string (CLI arg) | yes | Buildkite job URL in format https://buildkite.com/{org}/{pipeline}/builds/{id}#{job_id} |
| -n / --instance-name | string (CLI option) | no | Custom name for the EC2 instance; defaults to repro_ci_{build_id}_{job_id[:8]} |
| -c / --commands | flag (CLI option) | no | If set, automatically execute the build commands after setup |
| -f / --filters | string list (CLI option) | no | Regex patterns for commands to skip (requires -c flag) |
| BUILDKITE_TOKEN | env var | yes | Buildkite API token; auto-fetched from AWS Secrets Manager if not set |
| ~/.ssh/buildkite-repro-env.pem | file | yes | SSH private key for connecting to the EC2 instance |
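The session_url format and the default instance name listed in the inputs above can be made concrete with a short parsing sketch. It only assumes the URL shape shown in the table; JOB_URL_RE, BuildkiteJob, parse_job_url, and default_instance_name are hypothetical names, and the real tool may parse the URL differently.

```python
import re
from typing import NamedTuple


class BuildkiteJob(NamedTuple):
    org: str
    pipeline: str
    build_id: str
    job_id: str


# Hypothetical pattern derived from the documented URL format:
# https://buildkite.com/{org}/{pipeline}/builds/{id}#{job_id}
JOB_URL_RE = re.compile(
    r"^https://buildkite\.com/(?P<org>[^/]+)/(?P<pipeline>[^/]+)"
    r"/builds/(?P<build_id>\d+)#(?P<job_id>[0-9a-f-]+)$"
)


def parse_job_url(url: str) -> BuildkiteJob:
    """Split a Buildkite job URL into its four components."""
    match = JOB_URL_RE.match(url.strip())
    if match is None:
        raise ValueError(f"Not a Buildkite job URL: {url!r}")
    return BuildkiteJob(**match.groupdict())


def default_instance_name(job: BuildkiteJob) -> str:
    """Build the documented default name repro_ci_{build_id}_{job_id[:8]}."""
    return f"repro_ci_{job.build_id}_{job.job_id[:8]}"


job = parse_job_url(
    "https://buildkite.com/ray-project/ray-builders-pr/builds/19635"
    "#55a0d71a-831e-4f68-b668-2b10c6f65ee6"
)
print(job)
print(default_instance_name(job))
```

A URL without the #{job_id} fragment raises ValueError in this sketch, since the job ID is needed to identify a single job within the build.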
Outputs
| Name | Type | Description |
|---|---|---|
| EC2 instance | AWS resource | Running EC2 instance with Docker and the CI container |
| Interactive SSH session | terminal | Attached shell session inside the Docker container |
| Instance termination command | stdout | Printed command to terminate the instance when done |
Usage Examples
Basic usage to reproduce a failing CI build:
# Reproduce a specific Buildkite job
python ci/repro-ci.py "https://buildkite.com/ray-project/ray-builders-pr/builds/19635#55a0d71a-831e-4f68-b668-2b10c6f65ee6"
# With a custom instance name
python ci/repro-ci.py -n my-debug-instance "https://buildkite.com/ray-project/..."
# Auto-execute commands, skipping bazel build steps
python ci/repro-ci.py -c -f "bazel build" "https://buildkite.com/ray-project/..."
# Auto-execute with multiple filters
python ci/repro-ci.py -c -f "bazel build" -f "pip install" "https://buildkite.com/ray-project/..."
The tool follows this workflow:
- Parses the Buildkite URL to extract org, pipeline, build ID, and job ID
- Fetches environment variables from the Buildkite API
- Creates or reuses an EC2 instance matching the original runner's AMI and instance type
- Installs Docker and logs into ECR on the instance
- Installs Buildkite plugins (dind, docker) and runs their hooks to pull the Docker image
- Optionally runs the build commands with filtering
- Attaches the user to an interactive session inside the container
- Prompts to terminate the instance on exit
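The workflow above maps onto the ReproSession methods listed under Signature. The sketch below shows one plausible call order using a recording stand-in, so it runs without AWS or Buildkite access. RecordingSession and run_repro are hypothetical names, the call order is inferred from the workflow list rather than copied from the script, and passing the filters as a list to command_filter is an assumption.

```python
from typing import List, Sequence


class RecordingSession:
    """Stand-in for ReproSession that records method names instead of acting."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def __getattr__(self, name: str):
        # Only reached for attributes not set in __init__, i.e. the method calls.
        def record(*args, **kwargs) -> None:
            self.calls.append(name)

        return record


def run_repro(
    session,
    session_url: str,
    commands: bool = False,
    filters: Sequence[str] = (),
) -> None:
    """Drive a session through the documented workflow steps in order."""
    session.set_session(session_url)        # parse org, pipeline, build, job
    session.fetch_env_variables()           # environment from the Buildkite API
    session.aws_start_instance()            # create or reuse the EC2 instance
    session.aws_wait_for_instance()
    session.prepare_instance()              # Docker, ECR login, plugin hooks
    if commands:
        # Assumption: filters are handed over as a list of regex strings.
        session.run_buildkite_command(command_filter=list(filters))
    session.attach_to_container()           # interactive session in the container


session = RecordingSession()
run_repro(session, "https://buildkite.com/ray-project/...", commands=True,
          filters=["bazel build"])
print(session.calls)
```

Keeping the command step behind the commands flag reflects the -c option: without it, the session goes straight from instance preparation to the interactive shell.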