Environment:Allenai Open instruct Docker Container
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Containerization |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
Docker container environment based on `nvidia/cuda:12.9.0-devel-ubuntu22.04`, used for reproducible training on Beaker.
Description
The Dockerfile defines the complete training container: CUDA 12.9 on Ubuntu 22.04, NVIDIA DOCA OFED networking drivers (for InfiniBand), the Beaker CLI, the uv package manager, and all Python dependencies. The container is built and launched via `scripts/train/build_image_and_launch.sh`, which requires a clean git state (all changes committed).
Usage
Use this environment for all Beaker-based training and evaluation jobs. The `build_image_and_launch.sh` script builds the Docker image, pushes it to the container registry, and submits it as a Beaker experiment. Local development can use the Python environment directly without Docker.
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Container Runtime | Docker with BuildKit (buildx) | For building multi-stage images |
| Registry | ghcr.io/allenai/open-instruct | Container registry for caching and distribution |
| Git | Clean working tree required | `build_image_and_launch.sh` checks for uncommitted changes |
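The clean-tree requirement can be verified before invoking the launch script. The following is a minimal sketch and not the check the script itself performs: `is_clean_tree` is a hypothetical helper name, and it assumes that untracked files should count as uncommitted changes.

```shell
# Hypothetical helper (not part of the repository).
# Returns 0 if the working tree in directory $1 (default: current directory)
# has no staged, unstaged, or untracked changes, 1 if it is dirty,
# and 2 if git cannot inspect the directory at all.
is_clean_tree() {
    local repo_dir="${1:-.}"
    local status
    # `git status --porcelain` prints one line per changed or untracked path;
    # a non-zero exit means the directory is not a usable git repository.
    status="$(git -C "$repo_dir" status --porcelain 2>/dev/null)" || return 2
    # Empty output means there is nothing to commit.
    [ -z "$status" ]
}
```

A caller could run `is_clean_tree . || echo "commit your changes first"` before starting a build.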
Dependencies
Base Image
- `nvidia/cuda:12.9.0-devel-ubuntu22.04`
System Packages (in container)
- NVIDIA DOCA OFED (version 2.10.0) with Mellanox networking
- Mellanox Firmware Tools (MFT version 4.31.0-149)
- Beaker CLI (version 1.5.235)
- uv package manager
- git, curl, wget, vim
Build Args
- `GIT_COMMIT`: Current git commit hash (injected at build time)
- `GIT_BRANCH`: Current git branch name (injected at build time)
Credentials
No credentials are required for building. At runtime, Beaker injects the following secrets:
- `HF_TOKEN`: HuggingFace API token
- `WANDB_API_KEY`: Weights & Biases API key
- `BEAKER_TOKEN`: Beaker authentication token
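Inside a running job, the presence of these secrets can be confirmed before training starts. The sketch below assumes the secrets are exposed as exported environment variables; `missing_secrets` is a hypothetical helper name, not something the repository provides.

```shell
# Hypothetical helper (not part of the repository).
# Prints the names of the given environment variables that are unset or empty,
# separated by single spaces. Prints an empty line if all are set.
missing_secrets() {
    local name
    local missing=""
    for name in "$@"; do
        # printenv only sees exported variables, which is how injected
        # secrets are assumed to appear inside the container.
        if [ -z "$(printenv "$name")" ]; then
            missing="$missing $name"
        fi
    done
    # Drop the leading space left by the accumulation above.
    echo "${missing# }"
}
```

For example, `missing_secrets HF_TOKEN WANDB_API_KEY BEAKER_TOKEN` prints whichever of the three names still needs to be configured in the Beaker workspace.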
Quick Install
# Build and launch on Beaker (requires clean git state)
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh
Code Evidence
Base image from `Dockerfile:1`:
FROM nvidia/cuda:12.9.0-devel-ubuntu22.04
OFED networking drivers from `Dockerfile:26-38`:
ENV MFT_VER=4.31.0-149
ENV DOFED_VER=2.10.0
Git commit, branch, and image name derivation from `scripts/train/build_image_and_launch.sh:19-24`:
git_hash=$(git rev-parse --short HEAD)
git_branch=$(git rev-parse --abbrev-ref HEAD)
sanitized_branch=$(echo "$git_branch" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]' | sed 's/^-//')
image_name=open-instruct-integration-test-${sanitized_branch}
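To see what the sanitization pipeline does to a branch name, it can be wrapped in a function and run on sample inputs. The pipeline is copied from the excerpt above; the function name `sanitize_branch` and the sample branch names are illustrative assumptions.

```shell
# Hypothetical wrapper around the pipeline from the excerpt above.
sanitize_branch() {
    # 1. Replace every character outside [a-zA-Z0-9._-] with a dash.
    # 2. Lowercase the result.
    # 3. Strip a single leading dash, if present.
    echo "$1" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]' | sed 's/^-//'
}

# Sample inputs (made up for illustration).
sanitize_branch "Feature/Add-GRPO"   # prints: feature-add-grpo
sanitize_branch "user/Fix_Bug#12"    # prints: user-fix_bug-12
```

With the first sample, the resulting image name would be `open-instruct-integration-test-feature-add-grpo`.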
Docker cache configuration from `scripts/train/build_image_and_launch.sh:34,40-41`:
CACHE_REPO="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
--cache-from "type=registry,ref=$CACHE_REPO"
--cache-to "type=registry,ref=$CACHE_REPO,mode=max"
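The cache flags and build args fit together roughly as follows. This is a dry-run sketch that only prints a command: the exact invocation in the script is not shown in the excerpt, so the function name `print_build_cmd`, the `-t` tag, and the `.` build context are assumptions. The `--build-arg`, `--cache-from`, `--cache-to`, and `-t` flags are standard `docker buildx build` options.

```shell
# Hypothetical dry-run helper (not part of the repository).
# Prints, without executing, a buildx command that combines the documented
# build args with the registry cache settings.
print_build_cmd() {
    local image_name="$1"
    local git_hash="$2"
    local git_branch="$3"
    # Same default as in the excerpt; override via DOCKER_CACHE_REPO.
    local cache_repo="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
    echo "docker buildx build" \
        "--build-arg GIT_COMMIT=$git_hash" \
        "--build-arg GIT_BRANCH=$git_branch" \
        "--cache-from type=registry,ref=$cache_repo" \
        "--cache-to type=registry,ref=$cache_repo,mode=max" \
        "-t $image_name ."
}
```

Setting `DOCKER_CACHE_REPO` to a personal registry reference redirects both the cache read and the cache write, which can be useful when write access to the default cache location is unavailable.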
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `ERROR: uncommitted changes detected` | Git working tree is dirty | Commit all changes before running `build_image_and_launch.sh` |
| Docker build cache miss (slow build) | First build on a new branch | No action needed: the build takes longer, and subsequent builds use the registry cache |
| Beaker experiment fails to start | Missing secrets in Beaker workspace | Ensure `HF_TOKEN`, `WANDB_API_KEY`, and `BEAKER_TOKEN` are configured |
Compatibility Notes
- Local development: Docker is only needed for Beaker experiments. Local training uses the Python environment directly.
- Dirty tree builds: Use `build_image_and_launch_dirty.sh` for testing with uncommitted changes (not recommended for production).
- Cache strategy: Uses Docker BuildKit registry-based caching for faster rebuilds.