Implementation: OpenBMB UltraFeedback Inference Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | DevOps, ML_Infrastructure |
| Last Updated | 2023-10-02 00:00 GMT |
Overview
Shell launcher scripts that set up the inference environment and invoke the generation pipeline, covering both the HuggingFace and vLLM backends.
Description
This is an External Tool Doc documenting shell scripts rather than Python APIs. Two launcher scripts configure the environment and invoke the generation pipeline:
run.sh (HuggingFace backend): Installs pinned dependency versions and launches main.py with model_type and shard ID arguments.
run_vllm.sh (vLLM backend): Sets NCCL and Ray environment variables, installs latest package versions, and launches main_vllm_batch.py with model_type argument. Note: the script references main_vllm_batch.py but the repository contains main_vllm.py — this may be a filename discrepancy.
Usage
Run from the src/comparison_data_generation/ directory:
- HF backend: bash run.sh {model_type} {shard_id}
- vLLM backend: bash run_vllm.sh {model_type}
Code Reference
Source Location
- Repository: UltraFeedback
- File: src/comparison_data_generation/run.sh (Lines 1-7)
- File: src/comparison_data_generation/run_vllm.sh (Lines 1-15)
Signature
# run.sh — HuggingFace backend launcher
pip install transformers==4.31.0
pip install tokenizers==0.13.3
pip install deepspeed==0.10.0
pip install accelerate -U
python main.py --model_type ${1} --id ${2}
# run_vllm.sh — vLLM backend launcher
export NCCL_IGNORE_DISABLED_P2P=1
pip install transformers -U
pip install tokenizers -U
pip install deepspeed -U
pip install accelerate -U
pip install vllm -U
echo $1
export NCCL_IGNORE_DISABLED_P2P=1
export RAY_memory_monitor_refresh_ms=0
CUDA_LAUNCH_BLOCKING=1 python main_vllm_batch.py --model_type ${1}
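Given the filename discrepancy noted above (the script calls main_vllm_batch.py while the repository contains main_vllm.py), a launcher could defensively pick whichever entrypoint exists. The `pick_entry` helper below is a hypothetical workaround sketch, not repository code:

```shell
# Hypothetical helper: select whichever vLLM entrypoint is actually present.
# Prefers the name the script references, falls back to the file in the repo.
pick_entry() {
  if [ -f main_vllm_batch.py ]; then
    echo main_vllm_batch.py
  else
    echo main_vllm.py
  fi
}

# The launch line would then become:
#   CUDA_LAUNCH_BLOCKING=1 python "$(pick_entry)" --model_type "${1}"
```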
Import
# Shell scripts — no Python imports
# Usage: bash run.sh ultralm-13b 0
# Usage: bash run_vllm.sh ultralm-13b
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| $1 (model_type) | str | Yes | Model identifier (e.g., "ultralm-13b", "alpaca-7b") |
| $2 (shard_id) | int | HF only | Shard ID for parallel processing (0, 1, 2, ...) |
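Since the HF backend takes one shard ID per invocation, the full set of launch commands can be enumerated with a small loop. This is an illustrative dry run; `NUM_SHARDS` and the echoed commands are assumptions, not part of the repository scripts:

```shell
# Dry run (illustrative): print the run.sh invocation for each shard.
# NUM_SHARDS is an assumed value; the repository does not define a shard count.
MODEL_TYPE="ultralm-13b"
NUM_SHARDS=4
for shard in $(seq 0 $((NUM_SHARDS - 1))); do
  echo "bash run.sh ${MODEL_TYPE} ${shard}"
done
```

Replacing `echo` with direct execution (one shard per GPU or session) is the natural way to run shards in parallel.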
Outputs
| Name | Type | Description |
|---|---|---|
| Installed environment | System | Python packages installed (pinned versions for the HF backend; latest versions for vLLM) |
| Pipeline execution | Process | Launches main.py or main_vllm_batch.py with arguments |
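Because run.sh pins exact versions while run_vllm.sh installs the latest releases, a quick post-install check can catch version drift. The `check_pin` helper below is a hypothetical sketch, not part of the repository:

```shell
# Hypothetical helper: compare an installed version string against a pin.
# The pins mirror run.sh; feed it e.g. the output of
#   python -c "import transformers; print(transformers.__version__)"
check_pin() {
  local pkg="$1" want="$2" got="$3"
  if [ "$got" = "$want" ]; then
    echo "OK: ${pkg}==${got}"
  else
    echo "MISMATCH: ${pkg} wanted ${want}, got ${got}"
    return 1
  fi
}

check_pin transformers 4.31.0 "4.31.0"  # prints "OK: transformers==4.31.0"
```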
Usage Examples
HuggingFace Backend
# Generate completions for ultralm-13b, shard 0
cd src/comparison_data_generation/
bash run.sh ultralm-13b 0
# Generate completions for alpaca-7b, shard 1
bash run.sh alpaca-7b 1
vLLM Backend
# Generate completions for ultralm-13b with vLLM (all shards at once)
cd src/comparison_data_generation/
bash run_vllm.sh ultralm-13b
# Generate completions for vicuna-33b
bash run_vllm.sh vicuna-33b