Environment:Allenai Open instruct Docker Container

From Leeroopedia


Knowledge Sources
Domains: Infrastructure, Containerization
Last Updated: 2026-02-07 00:00 GMT

Overview

Docker container environment based on nvidia/cuda:12.9.0-devel-ubuntu22.04 for reproducible training on Beaker.

Description

The Dockerfile defines the complete training container: CUDA 12.9 on Ubuntu 22.04, NVIDIA DOCA OFED networking drivers (for InfiniBand), the Beaker CLI, the uv package manager, and all Python dependencies. The container is built and launched via `scripts/train/build_image_and_launch.sh`, which requires a clean git state (all changes committed).

Usage

Use this environment for all Beaker-based training and evaluation jobs. The `build_image_and_launch.sh` script builds the Docker image, pushes it to the container registry, and submits it as a Beaker experiment. Local development can use the Python environment directly without Docker.

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Container Runtime | Docker with BuildKit (buildx) | For building multi-stage images |
| Registry | `ghcr.io/allenai/open-instruct` | Container registry for caching and distribution |
| Git | Clean working tree required | `build_image_and_launch.sh` checks for uncommitted changes |
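
The Git requirement above can be illustrated with a small guard. This is a minimal sketch and not the project's actual check: the function names `tree_is_clean` and `require_clean_tree` are hypothetical, and the use of `git status --porcelain` (which also reports untracked files) is an assumption about how such a check might be written.

```shell
#!/bin/sh
# Sketch of a clean-working-tree guard (hypothetical helper names).

# tree_is_clean STATUS
# STATUS is the output of `git status --porcelain`, which prints one line per
# modified, staged, or untracked path and nothing at all for a clean tree.
tree_is_clean() {
    [ -z "$1" ]
}

# require_clean_tree
# Runs git in the current directory and returns 1 with an error message if
# anything is uncommitted. The message text follows the Common Errors entry.
require_clean_tree() {
    if ! tree_is_clean "$(git status --porcelain)"; then
        echo "ERROR: uncommitted changes detected" >&2
        return 1
    fi
}
```

Calling `require_clean_tree || exit 1` at the top of a launch script would stop the build before any image is produced from uncommitted code.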

Dependencies

Base Image

  • `nvidia/cuda:12.9.0-devel-ubuntu22.04`

System Packages (in container)

  • NVIDIA DOCA OFED (version 2.10.0) with Mellanox networking
  • Mellanox Firmware Tools (MFT version 4.31.0-149)
  • Beaker CLI (version 1.5.235)
  • uv package manager
  • git, curl, wget, vim

Build Args

  • `GIT_COMMIT`: Current git commit hash (injected at build time)
  • `GIT_BRANCH`: Current git branch name (injected at build time)
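
As a rough illustration of how these two values could reach the build, the sketch below assembles the corresponding `--build-arg` flags. The helper name `build_arg_flags` is hypothetical, and the commented invocation assumes the Dockerfile declares matching `ARG GIT_COMMIT` and `ARG GIT_BRANCH` instructions.

```shell
#!/bin/sh
# build_arg_flags COMMIT BRANCH
# Prints the --build-arg flags that pass git metadata into the image build.
# (Hypothetical helper, shown only to illustrate the two build args.)
build_arg_flags() {
    printf '%s\n' "--build-arg GIT_COMMIT=$1 --build-arg GIT_BRANCH=$2"
}

# Possible use inside a repository (needs git and Docker, so it is left
# commented out here). The unquoted substitution is deliberate so that the
# flags are split into separate words:
# docker build $(build_arg_flags "$(git rev-parse --short HEAD)" "$(git rev-parse --abbrev-ref HEAD)") .
```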

Credentials

No credentials are required to build the image. At runtime, Beaker injects the following secrets:

  • `HF_TOKEN`: HuggingFace API token
  • `WANDB_API_KEY`: Weights & Biases API key
  • `BEAKER_TOKEN`: Beaker authentication token
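
A job can verify these secrets before starting expensive work. The sketch below assumes the secrets are exposed to the container as environment variables with exactly the names listed above; the helper name `missing_secrets` is hypothetical.

```shell
#!/bin/sh
# missing_secrets
# Prints the names of required variables that are unset or empty in the
# environment, separated by single spaces (prints nothing if all are set).
missing_secrets() {
    missing=""
    for name in HF_TOKEN WANDB_API_KEY BEAKER_TOKEN; do
        # printenv prints nothing for an unset variable, so the -z test
        # catches both unset and empty values.
        if [ -z "$(printenv "$name")" ]; then
            missing="${missing:+$missing }$name"
        fi
    done
    printf '%s' "$missing"
}
```

A training entrypoint could call `missing_secrets`, and if the result is non-empty, print the names and exit early instead of failing later during a model download or logging call.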

Quick Install

# Build and launch on Beaker (requires clean git state)
./scripts/train/build_image_and_launch.sh scripts/train/debug/single_gpu_on_beaker.sh

Code Evidence

Base image from `Dockerfile:1`:

FROM nvidia/cuda:12.9.0-devel-ubuntu22.04

OFED networking drivers from `Dockerfile:26-38`:

ENV MFT_VER=4.31.0-149
ENV DOFED_VER=2.10.0

Git commit, branch, and image name derivation from `scripts/train/build_image_and_launch.sh:19-24`:

git_hash=$(git rev-parse --short HEAD)
git_branch=$(git rev-parse --abbrev-ref HEAD)
sanitized_branch=$(echo "$git_branch" | sed 's/[^a-zA-Z0-9._-]/-/g' | tr '[:upper:]' '[:lower:]' | sed 's/^-//')
image_name=open-instruct-integration-test-${sanitized_branch}
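
To make the effect of the sanitization pipeline concrete, the sketch below wraps the same `sed`/`tr` steps in two functions. The function names `sanitize_branch` and `image_name_for` are hypothetical, and `printf` is used in place of `echo` only to avoid surprises with unusual input.

```shell
#!/bin/sh
# sanitize_branch BRANCH
# Applies the same three steps as the excerpt above: replace every character
# outside [a-zA-Z0-9._-] with "-", lowercase the result, then drop a single
# leading "-" if one is present.
sanitize_branch() {
    printf '%s\n' "$1" \
        | sed 's/[^a-zA-Z0-9._-]/-/g' \
        | tr '[:upper:]' '[:lower:]' \
        | sed 's/^-//'
}

# image_name_for BRANCH
# Builds the image name using the prefix shown in the excerpt.
image_name_for() {
    printf 'open-instruct-integration-test-%s\n' "$(sanitize_branch "$1")"
}
```

For example, `sanitize_branch "Feature/My_Branch"` yields `feature-my_branch`: the slash becomes a hyphen, the underscore is kept because it is in the allowed set, and the letters are lowercased.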

Docker cache configuration from `scripts/train/build_image_and_launch.sh:34,40-41`:

CACHE_REPO="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
--cache-from "type=registry,ref=$CACHE_REPO"
--cache-to "type=registry,ref=$CACHE_REPO,mode=max"
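
The three lines above are fragments of a larger build command. As a hedged sketch of how they might fit together, the helper below (hypothetical name `cache_flags`) resolves the cache repository with the same default and prints both flags. The commented `docker buildx build` line is illustrative only, and its tag variable and `--push` flag are assumptions rather than a copy of the script.

```shell
#!/bin/sh
# cache_flags
# Prints the BuildKit registry-cache flags, honoring an optional
# DOCKER_CACHE_REPO override and otherwise using the default from the excerpt.
cache_flags() {
    cache_repo="${DOCKER_CACHE_REPO:-ghcr.io/allenai/open-instruct:buildcache}"
    printf '%s\n' "--cache-from type=registry,ref=$cache_repo --cache-to type=registry,ref=$cache_repo,mode=max"
}

# Hypothetical invocation (requires Docker with buildx, so it is commented out):
# docker buildx build $(cache_flags) --tag "$image_name" --push .
```

With `mode=max`, BuildKit exports cache for the layers of all intermediate stages rather than only the final image, which is what lets later builds reuse the expensive early layers.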

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `ERROR: uncommitted changes detected` | Git working tree is dirty | Commit all changes before running `build_image_and_launch.sh` |
| Docker build cache miss | First build on a new branch | Build takes longer; subsequent builds use registry cache |
| Beaker experiment fails to start | Missing secrets in Beaker workspace | Ensure `HF_TOKEN`, `WANDB_API_KEY`, and `BEAKER_TOKEN` are configured |

Compatibility Notes

  • Local development: Docker is only needed for Beaker experiments. Local training uses the Python environment directly.
  • Dirty tree builds: Use `build_image_and_launch_dirty.sh` for testing with uncommitted changes (not recommended for production).
  • Cache strategy: Uses Docker BuildKit registry-based caching for faster rebuilds.
