Environment:Allenai Open instruct Docker Container

Knowledge Sources	Open Instruct NVIDIA CUDA Docker
Domains	Infrastructure, Containerization
Last Updated	2026-02-07 00:00 GMT

Overview

Docker container environment based on nvidia/cuda:12.9.0-devel-ubuntu22.04 for reproducible training on Beaker.

Description

The Dockerfile defines the complete training container: CUDA 12.9 on Ubuntu 22.04, with NVIDIA DOCA OFED networking drivers (for InfiniBand), the Beaker CLI, the uv package manager, and all Python dependencies. The container is built and launched via `scripts/train/build_image_and_launch.sh`, which requires a clean git state (all changes committed).

Usage

Use this environment for all Beaker-based training and evaluation jobs. The `build_image_and_launch.sh` script builds the Docker image, pushes it to the container registry, and submits it as a Beaker experiment. Local development can use the Python environment directly without Docker.

System Requirements

Category	Requirement	Notes
Container Runtime	Docker with BuildKit (buildx)	For building multi-stage images
Registry	ghcr.io/allenai/open-instruct	Container registry for caching and distribution
Git	Clean working tree required	build_image_and_launch.sh checks for uncommitted changes
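
The requirements in the table can be verified before starting a build. The following is a minimal sketch, not code from the repository: the helper names `is_clean_status`, `have_command`, and `preflight` are hypothetical, and using `git status --porcelain` to detect a dirty tree is an assumption about how such a check could be written, not a description of what `build_image_and_launch.sh` actually does.

```shell
# Hypothetical preflight helpers; not part of open-instruct.

# Succeeds when the given `git status --porcelain` output is empty,
# i.e. there are no modified, staged, or untracked files.
is_clean_status() {
    [ -z "$1" ]
}

# Succeeds when the named command is available on PATH.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Runs all checks; reports the first problem on stderr and returns non-zero.
preflight() {
    have_command docker || { echo "docker not found on PATH" >&2; return 1; }
    docker buildx version >/dev/null 2>&1 || { echo "docker buildx is unavailable" >&2; return 1; }
    is_clean_status "$(git status --porcelain)" || { echo "working tree has uncommitted changes" >&2; return 1; }
}
```

Calling `preflight` from the repository root before the launch script would surface a missing buildx plugin or a dirty tree early. Note that this sketch treats untracked files as dirty, which may be stricter than the real script.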

Dependencies

Base Image

`nvidia/cuda:12.9.0-devel-ubuntu22.04`

System Packages (in container)

NVIDIA DOCA OFED (version 2.10.0) with Mellanox networking
Mellanox Firmware Tools (MFT version 4.31.0-149)
Beaker CLI (version 1.5.235)
uv package manager
git, curl, wget, vim

Build Args

`GIT_COMMIT`: Current git commit hash (injected at build time)
`GIT_BRANCH`: Current git branch name (injected at build time)
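
As an illustration of how these two values could reach the build, the sketch below prints `--build-arg` flags for them. The function name `build_arg_flags` is hypothetical, and the exact way the launch script passes the arguments is not shown in this page, so treat the wiring as an assumption.

```shell
# Hypothetical helper; not part of open-instruct.
# Prints one flag or value per line for the two documented build args.
build_arg_flags() {
    commit=$1
    branch=$2
    printf '%s\n' \
        "--build-arg" "GIT_COMMIT=$commit" \
        "--build-arg" "GIT_BRANCH=$branch"
}

# Possible usage (assumed, not taken from the repository):
#   docker buildx build $(build_arg_flags "$(git rev-parse --short HEAD)" \
#       "$(git rev-parse --abbrev-ref HEAD)") .
```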

Credentials

No credentials required for building. At runtime, Beaker injects secrets:

`HF_TOKEN`: HuggingFace API token
`WANDB_API_KEY`: Weights & Biases API key
`BEAKER_TOKEN`: Beaker authentication token
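
A job can fail late if one of these secrets is missing, so it may help to check for them at container start. The function below is a hypothetical sketch, not part of the repository; it only assumes that the three secrets arrive as environment variables with the names listed above.

```shell
# Hypothetical runtime check; not part of open-instruct.
# Prints the name of every required secret that is unset or empty.
missing_secrets() {
    for name in HF_TOKEN WANDB_API_KEY BEAKER_TOKEN; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}

# Possible usage at the top of an entrypoint script:
#   missing=$(missing_secrets)
#   [ -z "$missing" ] || { echo "missing secrets: $missing" >&2; exit 1; }
```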

Quick Install

# Build and launch on Beaker (requires clean git state)
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh

Code Evidence

Base image from `Dockerfile:1`:

FROM nvidia/cuda:12.9.0-devel-ubuntu22.04

OFED networking drivers from `Dockerfile:26-38`:

ENV MFT_VER=4.31.0-149
ENV DOFED_VER=2.10.0

Git commit, branch, and image name derivation from `scripts/train/build_image_and_launch.sh:19-24`:

git_hash=$(git rev-parse --short HEAD)
git_branch=$(git rev-parse --abbrev-ref HEAD)
sanitized_branch=$(echo "$git_branch" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]' | sed 's/^-//')
image_name=open-instruct-integration-test-${sanitized_branch}
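
To see what the sanitization step does to a branch name, the pipeline above can be wrapped in a function and tried on a few inputs. The wrapper name `sanitize_branch` and the sample branch names are illustrative only; the pipeline itself is copied from the snippet.

```shell
# Hypothetical wrapper around the sanitization pipeline shown above.
sanitize_branch() {
    # 1. Replace anything outside [a-zA-Z0-9._-] with a hyphen.
    # 2. Lowercase the result.
    # 3. Drop a single leading hyphen, if present.
    echo "$1" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]' | sed 's/^-//'
}

sanitize_branch "Feature/My_Branch"   # prints: feature-my_branch
sanitize_branch "user/fix#42"         # prints: user-fix-42
sanitize_branch "/Hotfix"             # prints: hotfix
```

With the first input, the resulting image name would be `open-instruct-integration-test-feature-my_branch`.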

Docker cache configuration from `scripts/train/build_image_and_launch.sh:34,40-41`:

CACHE_REPO="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
--cache-from "type=registry,ref=$CACHE_REPO"
--cache-to "type=registry,ref=$CACHE_REPO,mode=max"
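
The three lines above combine a default-with-override variable and two BuildKit cache flags. The sketch below shows how they resolve; the function name `cache_flags` and the override value in the usage comment are hypothetical, and the surrounding `docker buildx build` invocation in the real script is not reproduced here.

```shell
# Hypothetical helper; mirrors the cache settings shown above.
# Prints one flag or value per line.
cache_flags() {
    # Fall back to the shared cache ref unless DOCKER_CACHE_REPO is set.
    CACHE_REPO="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
    printf '%s\n' \
        "--cache-from" "type=registry,ref=$CACHE_REPO" \
        "--cache-to" "type=registry,ref=$CACHE_REPO,mode=max"
}

# Possible usage with a personal cache location (example value only):
#   DOCKER_CACHE_REPO=example.com/me/cache:dev cache_flags
```

Setting `mode=max` asks BuildKit to export cache for intermediate build steps as well as the final image layers, which generally makes later rebuilds hit the cache more often at the cost of a larger cache upload.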

Common Errors

Error Message	Cause	Solution
`ERROR: uncommitted changes detected`	Git working tree is dirty	Commit all changes before running build_image_and_launch.sh
Docker build cache miss	First build on a new branch	Build takes longer; subsequent builds use registry cache
Beaker experiment fails to start	Missing secrets in Beaker workspace	Ensure HF_TOKEN, WANDB_API_KEY, BEAKER_TOKEN are configured

Compatibility Notes

Local development: Docker is only needed for Beaker experiments. Local training uses the Python environment directly.
Dirty tree builds: Use `build_image_and_launch_dirty.sh` for testing with uncommitted changes (not recommended for production).
Cache strategy: Uses Docker BuildKit registry-based caching for faster rebuilds.
