Implementation:Allenai Open instruct Build Image And Launch
| Knowledge Sources | |
|---|---|
| Domains | MLOps, DevOps, Distributed Systems |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Concrete tool for building a Docker image from the current code, uploading it to Beaker, and launching a training script provided by the Open Instruct repository.
Description
The build_image_and_launch.sh script automates the complete workflow of preparing and launching a training job on AI2's Beaker cluster. It performs the following steps:
- Git validation: Verifies the current directory is a git repo and has no uncommitted changes (staged, unstaged, or untracked). This ensures the Docker image matches the exact git commit.
- Commit identification: Extracts the short git hash and branch name. Sanitizes the branch name for use in Beaker image naming (replacing invalid characters).
- Image reuse check: Checks if a Beaker image already exists for the current commit hash by comparing the image description. If it matches, skips the Docker build entirely.
- Docker build: If no existing image matches, builds a Docker image for linux/amd64 with the git hash and branch as build arguments. Uses registry-based layer caching (--cache-from/--cache-to) for speed. Falls back to --cache-from only if the cache push fails (e.g., due to permissions).
- Beaker image upload: Renames any existing Beaker image with the same name, then creates a new image with the git commit in the description.
- Dependency sync: Installs uv if not present and runs uv sync to ensure local dependencies are up to date.
- Script execution: Runs the specified training script with the Beaker image name as the first argument and any additional CLI arguments forwarded.
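The git validation and commit identification steps above can be sketched as follows. This is a minimal illustration rather than the script's actual code: the helper names (tree_is_clean, sanitize_branch, current_commit, current_branch) are hypothetical, and the set of characters treated as valid in a Beaker image name is an assumption.

```shell
#!/bin/sh
# Hypothetical helpers; names and the allowed character set are assumptions.

# Succeeds only inside a git work tree with no staged, unstaged, or
# untracked changes. `git status --porcelain` prints one line per such path.
tree_is_clean() {
  git rev-parse --is-inside-work-tree >/dev/null 2>&1 &&
    [ -z "$(git status --porcelain)" ]
}

# Replace every character outside A-Z, a-z, 0-9, '_', '.', '-' with a hyphen.
sanitize_branch() {
  printf '%s' "$1" | tr -c 'A-Za-z0-9_.-' '-'
}

# Short commit hash and current branch name.
current_commit() { git rev-parse --short HEAD; }
current_branch() { git rev-parse --abbrev-ref HEAD; }
```

With this sketch, sanitize_branch "feature/new tokenizer" prints feature-new-tokenizer.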
Usage
Run this script from the repository root to build and launch training jobs on Beaker. The first argument is the path to the training script, and any additional arguments are passed through. The working tree must be clean (all changes committed).
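The argument contract described above (image name first, remaining arguments passed through unchanged) can be illustrated with a small sketch. The function name run_with_image is hypothetical and the real script may invoke the training script differently.

```shell
#!/bin/sh
# Hypothetical helper: run a launch script with the Beaker image name as its
# first argument, followed by every extra argument exactly as given.
run_with_image() {
  script_path="$1"
  image_name="$2"
  shift 2
  # "$@" keeps each remaining argument as a single word, spaces included.
  bash "$script_path" "$image_name" "$@"
}
```

A training script launched this way reads the image name from $1 and its own options from $2 onward.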
Code Reference
Source Location
- Repository: Open Instruct
- File: scripts/train/build_image_and_launch.sh
- Lines: L1-74
Signature
#!/bin/bash
set -euo pipefail
# Usage:
#   ./scripts/train/build_image_and_launch.sh <script_path> [additional_args...]
# Examples:
#   ./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh
#   ./scripts/train/build_image_and_launch.sh scripts/train/debug/tools/olmo_3_parser_multigpu.sh
#   ./scripts/train/build_image_and_launch.sh scripts/train/debug/dpo/multi_node.sh
Import
# This is a shell script, invoked directly:
bash scripts/train/build_image_and_launch.sh <script_path>
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| $1 (script_path) | string (file path) | Yes | Path to the training launch script to execute (e.g., scripts/train/debug/single_gpu_on_beaker.sh). |
| $@ (additional args) | string | No | Any additional arguments are forwarded to the training script. |
| DOCKER_CACHE_REPO (env) | string | No | Docker cache registry URL. Defaults to ghcr.io/allenai/open-instruct:buildcache. |
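How the DOCKER_CACHE_REPO default and the cache fallback from the Description fit together can be sketched as below. This is an illustration under assumptions, not the script's code: the function name build_image, the build-arg names GIT_COMMIT and GIT_BRANCH, the mode=max cache option, the --load flag, and the build context . are all assumed. Note that this sketch retries on any build failure, not only on a failed cache push.

```shell
#!/bin/sh
# Sketch only: build_image, GIT_COMMIT, and GIT_BRANCH are assumed names.
build_image() {
  image_tag="$1"
  git_commit="$2"
  git_branch="$3"
  # Use the caller's cache registry if set, otherwise the documented default.
  cache_repo="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"

  # First attempt: read from and write to the registry cache.
  docker buildx build --platform linux/amd64 \
    --build-arg GIT_COMMIT="$git_commit" \
    --build-arg GIT_BRANCH="$git_branch" \
    --cache-from "type=registry,ref=$cache_repo" \
    --cache-to "type=registry,ref=$cache_repo,mode=max" \
    -t "$image_tag" --load . && return 0

  # Fallback: read the cache but do not push to it (e.g., no write access).
  echo "Build with cache push failed; retrying with --cache-from only" >&2
  docker buildx build --platform linux/amd64 \
    --build-arg GIT_COMMIT="$git_commit" \
    --build-arg GIT_BRANCH="$git_branch" \
    --cache-from "type=registry,ref=$cache_repo" \
    -t "$image_tag" --load .
}
```

Setting DOCKER_CACHE_REPO before calling the function points both attempts at a different cache registry.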
Outputs
| Name | Type | Description |
|---|---|---|
| (side effects) | None | Builds and uploads a Docker image to Beaker (if needed), then executes the specified training script with the Beaker image name as its first argument. |
Prerequisites
| Requirement | Description |
|---|---|
| Clean git working tree | No uncommitted or untracked changes allowed. |
| docker | Docker with buildx support for building the training image. |
| beaker | Beaker CLI for image upload and management. |
| git | Git for commit hash and branch detection. |
| uv | Python dependency manager (auto-installed if missing). |
| Beaker authentication | Must be logged in to Beaker (beaker account whoami must succeed). |
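Before running the script, the prerequisites in the table can be checked with a short preflight sketch. The helper names missing_tools and check_prerequisites are hypothetical and are not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print the name of each listed tool not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical preflight: fail if a tool is missing or Beaker login fails.
# uv is left out because the script installs it when absent.
check_prerequisites() {
  missing=$(missing_tools git docker beaker)
  if [ -n "$missing" ]; then
    echo "Missing required tools: $missing" >&2
    return 1
  fi
  if ! beaker account whoami >/dev/null 2>&1; then
    echo "Not logged in to Beaker" >&2
    return 1
  fi
}
```

Calling check_prerequisites from the repository root returns non-zero and prints the reason when a requirement is not met.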
Workflow Diagram
[Clean Git Repo?] --No--> ERROR: Commit changes first
        |
       Yes
        |
[Image exists for commit?] --Yes--> [Skip Docker build]
        |                                    |
        No                                   |
        |                                    |
[Docker buildx build]                        |
        |                                    |
[Upload to Beaker]                           |
        |                                    |
        +------------------------------------+
        |
[uv sync (install deps)]
        |
[Run training script with image name]
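The "Image exists for commit?" decision in the diagram reduces to a string comparison once the existing image's description has been fetched. A minimal sketch follows, assuming the description contains the short git hash somewhere in its text. The helper name image_is_current is hypothetical, and fetching the description from Beaker is left out.

```shell
#!/bin/sh
# Hypothetical helper: decide whether the Docker build can be skipped.
#   $1 = description of the existing Beaker image (empty if none was found)
#   $2 = short git hash of the current commit
# Assumption: the description embeds the hash; the exact format is not known.
image_is_current() {
  # An empty hash would match any description, so reject it explicitly.
  [ -n "$2" ] || return 1
  case "$1" in
    *"$2"*) return 0 ;;  # description mentions this commit: reuse the image
    *) return 1 ;;       # no image, or built from a different commit: rebuild
  esac
}
```

A caller would skip the build and upload steps when image_is_current succeeds and run them otherwise.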
Usage Examples
Basic Usage
# Launch a single-GPU GRPO debug run
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh
# Launch a multi-GPU tool use experiment
./scripts/train/build_image_and_launch.sh scripts/train/debug/tools/olmo_3_parser_multigpu.sh
# Launch a multi-node DPO experiment
./scripts/train/build_image_and_launch.sh scripts/train/debug/dpo/multi_node.sh
# Launch GPU tests
./scripts/train/build_image_and_launch.sh scripts/train/debug/run_gpu_tests.sh
Dependencies
- docker -- container building and image management
- beaker -- AI2's cluster management and job scheduling system
- git -- version control for commit tracking
- uv -- fast Python dependency management
- jq -- JSON parsing for Beaker CLI output
Related Pages
Implements Principle