Implementation:Ray project Ray Repro CI Tool

From Leeroopedia
Knowledge Sources
Domains: CI, Debugging, AWS, Infrastructure
Last Updated: 2026-02-13 16:00 GMT

Overview

This Python script creates an AWS EC2 instance to reproduce Buildkite CI build failures by replicating the exact runner environment, Docker container, and build commands from a failed build.

Description

The ci/repro-ci.py tool takes a Buildkite job URL as input and automates the reproduction of a CI failure. It fetches the job's environment variables and plugin configuration via the Buildkite API, then provisions an EC2 instance matching the original runner (same AMI and instance type, with a 500 GB volume). On that instance it installs Docker, logs into ECR, and pulls the same container image by running the Buildkite plugin hooks. Optionally, it executes the build commands, with regex-based filters to skip specific steps (e.g., bazel build). Finally, the user is attached over SSH to an interactive session inside the Docker container.

Usage

Developers use this tool when a CI build fails on Buildkite but the failure cannot be reproduced locally. It is invoked from the command line with a Buildkite job URL. The tool requires AWS credentials, a Buildkite API token (read from the BUILDKITE_TOKEN environment variable, or auto-fetched from AWS Secrets Manager if that is not set), and an SSH private key at ~/.ssh/buildkite-repro-env.pem.
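
The token lookup order described above (environment variable first, secrets store second) can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name resolve_buildkite_token and the injected fetch_secret callable are hypothetical, and the real Secrets Manager call is deliberately left out so the sketch runs without AWS access.

```python
from typing import Callable, Mapping


def resolve_buildkite_token(
    env: Mapping[str, str],
    fetch_secret: Callable[[], str],
) -> str:
    """Return the Buildkite token, preferring the environment variable.

    `fetch_secret` is a hypothetical stand-in for the AWS Secrets Manager
    lookup; it is only called when BUILDKITE_TOKEN is missing or empty.
    """
    token = env.get("BUILDKITE_TOKEN")
    if token:
        return token
    return fetch_secret()


if __name__ == "__main__":
    import os

    # The lambda below is a placeholder for the real secrets lookup.
    print(resolve_buildkite_token(os.environ, lambda: "token-from-secrets-store"))
```

Passing the fallback in as a callable keeps the expensive or credential-dependent lookup lazy, so it is skipped entirely when the environment variable is present.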

Code Reference

Source Location

  • Repository: Ray
  • File: ci/repro-ci.py
  • Lines: 1-683

Signature

"""Create an AWS instance to reproduce Buildkite CI builds."""

class ReproSession:
    plugin_default_env = {
        "docker": {"BUILDKITE_PLUGIN_DOCKER_MOUNT_BUILDKITE_AGENT": False}
    }

    def __init__(
        self,
        buildkite_token: str,
        instance_name: Optional[str] = None,
        logger: Optional[logging.Logger] = None,
    ):
        ...

    def set_session(self, session_url: str): ...
    def fetch_env_variables(self, overwrite=None): ...
    def aws_start_instance(self): ...
    def aws_wait_for_instance(self): ...
    def prepare_instance(self): ...
    def run_buildkite_command(self, command_filter=None): ...
    def attach_to_container(self): ...

@click.command()
@click.argument("session_url", required=False)
@click.option("-n", "--instance-name", default=None)
@click.option("-c", "--commands", is_flag=True, default=False)
@click.option("-f", "--filters", multiple=True, default=[])
def main(session_url, instance_name, commands, filters): ...

Import

import base64, json, logging, os, random, re, shlex, subprocess, threading, time
from numbers import Number
from typing import Any, Callable, Dict, List, Optional
import boto3
import click
import paramiko
import yaml
from pybuildkite.buildkite import Buildkite

I/O Contract

Inputs

Name | Type | Required | Description
session_url | string (CLI arg) | yes | Buildkite job URL in format https://buildkite.com/{org}/{pipeline}/builds/{id}#{job_id}
-n / --instance-name | string (CLI option) | no | Custom name for the EC2 instance; defaults to repro_ci_{build_id}_{job_id[:8]}
-c / --commands | flag (CLI option) | no | If set, automatically execute the build commands after setup
-f / --filters | string list (CLI option) | no | Regex patterns for commands to skip (requires -c flag)
BUILDKITE_TOKEN | env var | yes | Buildkite API token; auto-fetched from AWS Secrets Manager if not set
~/.ssh/buildkite-repro-env.pem | file | yes | SSH private key for connecting to the EC2 instance
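
To make the session_url format and the default instance name concrete, here is a minimal sketch of how such a URL could be split into its parts. The names JobRef, parse_job_url, and default_instance_name are hypothetical, and the regular expression assumes a numeric build ID and a hex-and-dash job ID, as in the example URL on this page; the tool's own parsing may differ.

```python
import re
from typing import NamedTuple


class JobRef(NamedTuple):
    org: str
    pipeline: str
    build_id: str
    job_id: str


# Assumed shape: https://buildkite.com/{org}/{pipeline}/builds/{id}#{job_id}
_JOB_URL_RE = re.compile(
    r"^https://buildkite\.com/"
    r"(?P<org>[^/]+)/(?P<pipeline>[^/]+)/builds/"
    r"(?P<build_id>\d+)#(?P<job_id>[0-9a-fA-F-]+)$"
)


def parse_job_url(url: str) -> JobRef:
    """Split a Buildkite job URL into org, pipeline, build ID, and job ID."""
    match = _JOB_URL_RE.match(url.strip())
    if match is None:
        raise ValueError(f"Not a Buildkite job URL: {url!r}")
    return JobRef(**match.groupdict())


def default_instance_name(ref: JobRef) -> str:
    """Build the documented default name: repro_ci_{build_id}_{job_id[:8]}."""
    return f"repro_ci_{ref.build_id}_{ref.job_id[:8]}"


if __name__ == "__main__":
    ref = parse_job_url(
        "https://buildkite.com/ray-project/ray-builders-pr/builds/19635"
        "#55a0d71a-831e-4f68-b668-2b10c6f65ee6"
    )
    print(ref)
    print(default_instance_name(ref))
```

A URL without the #{job_id} fragment is rejected in this sketch, since the job ID is needed to identify which job of the build to reproduce.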

Outputs

Name | Type | Description
EC2 instance | AWS resource | Running EC2 instance with Docker and the CI container
Interactive SSH session | terminal | Attached shell session inside the Docker container
Instance termination command | stdout | Printed command to terminate the instance when done

Usage Examples

Basic usage to reproduce a failing CI build:

# Reproduce a specific Buildkite job
python ci/repro-ci.py "https://buildkite.com/ray-project/ray-builders-pr/builds/19635#55a0d71a-831e-4f68-b668-2b10c6f65ee6"

# With a custom instance name
python ci/repro-ci.py -n my-debug-instance "https://buildkite.com/ray-project/..."

# Auto-execute commands, skipping bazel build steps
python ci/repro-ci.py -c -f "bazel build" "https://buildkite.com/ray-project/..."

# Auto-execute with multiple filters
python ci/repro-ci.py -c -f "bazel build" -f "pip install" "https://buildkite.com/ray-project/..."
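
The effect of the -f patterns can be illustrated with a small sketch. It assumes that a command is skipped when any pattern matches anywhere in it (re.search semantics); the source does not state whether the tool matches this way, and filter_commands is a hypothetical helper, not a function from ci/repro-ci.py. The sample commands are made up for illustration.

```python
import re
from typing import Iterable, List


def filter_commands(commands: Iterable[str], skip_patterns: Iterable[str]) -> List[str]:
    """Return the commands that match none of the skip patterns.

    Assumption: a pattern matching anywhere in the command skips it.
    """
    compiled = [re.compile(pattern) for pattern in skip_patterns]
    return [
        command
        for command in commands
        if not any(regex.search(command) for regex in compiled)
    ]


if __name__ == "__main__":
    # Illustrative commands only; not taken from a real Buildkite job.
    commands = [
        "bazel build //:example_target",
        "pip install -r requirements.txt",
        "pytest tests/",
    ]
    print(filter_commands(commands, ["bazel build", "pip install"]))
```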

The tool follows this workflow:

  1. Parses the Buildkite URL to extract org, pipeline, build ID, and job ID
  2. Fetches environment variables from the Buildkite API
  3. Creates or reuses an EC2 instance matching the original runner's AMI and instance type
  4. Installs Docker and logs into ECR on the instance
  5. Installs Buildkite plugins (dind, docker) and runs their hooks to pull the Docker image
  6. Optionally runs the build commands with filtering
  7. Attaches the user to an interactive session inside the container
  8. Prompts to terminate the instance on exit
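
The workflow above maps naturally onto the ReproSession methods listed in the Signature section. The sketch below shows one plausible call order using a recording stand-in instead of the real class, so it runs without AWS or Buildkite access. The driver function run_repro and the RecordingSession stub are hypothetical, and the expected type of command_filter is not documented on this page, so the value passed here is an assumption.

```python
from typing import List, Optional, Sequence


class RecordingSession:
    """Stand-in exposing the ReproSession method names; it only records calls."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def set_session(self, session_url: str) -> None:
        self.calls.append("set_session")

    def fetch_env_variables(self, overwrite=None) -> None:
        self.calls.append("fetch_env_variables")

    def aws_start_instance(self) -> None:
        self.calls.append("aws_start_instance")

    def aws_wait_for_instance(self) -> None:
        self.calls.append("aws_wait_for_instance")

    def prepare_instance(self) -> None:
        self.calls.append("prepare_instance")

    def run_buildkite_command(self, command_filter=None) -> None:
        self.calls.append("run_buildkite_command")

    def attach_to_container(self) -> None:
        self.calls.append("attach_to_container")


def run_repro(
    session,
    session_url: str,
    run_commands: bool = False,
    filters: Optional[Sequence[str]] = None,
) -> None:
    """Drive the documented steps in order (hypothetical driver)."""
    session.set_session(session_url)      # step 1: parse the URL
    session.fetch_env_variables()         # step 2: Buildkite API
    session.aws_start_instance()          # step 3: create or reuse EC2
    session.aws_wait_for_instance()
    session.prepare_instance()            # steps 4-5: Docker, ECR, plugins
    if run_commands:                      # step 6: only with -c
        # Assumption: the filter argument's real type is not shown in the source.
        session.run_buildkite_command(command_filter=filters)
    session.attach_to_container()         # step 7: interactive session


if __name__ == "__main__":
    stub = RecordingSession()
    run_repro(stub, "https://buildkite.com/...", run_commands=True, filters=["bazel build"])
    print(stub.calls)
```

Step 8, the termination prompt, is omitted because the signature listing does not show which method handles it.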
