Implementation: OpenBMB UltraFeedback Inference Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | DevOps, ML_Infrastructure |
| Last Updated | 2023-10-02 00:00 GMT |
Overview
Shell launcher scripts that set up the inference environment and invoke the generation pipeline, covering both the HuggingFace and vLLM backends.
Description
This is an External Tool Doc documenting shell scripts rather than Python APIs. Two launcher scripts configure the environment and invoke the generation pipeline:
run.sh (HuggingFace backend): Installs pinned dependency versions and launches main.py with model_type and shard ID arguments.
run_vllm.sh (vLLM backend): Sets NCCL and Ray environment variables, installs latest package versions, and launches main_vllm_batch.py with model_type argument. Note: the script references main_vllm_batch.py but the repository contains main_vllm.py — this may be a filename discrepancy.
Usage
Run from the src/comparison_data_generation/ directory:
- HF backend: bash run.sh {model_type} {shard_id}
- vLLM backend: bash run_vllm.sh {model_type}
Code Reference
Source Location
- Repository: UltraFeedback
- File: src/comparison_data_generation/run.sh (Lines 1-7)
- File: src/comparison_data_generation/run_vllm.sh (Lines 1-15)
Signature
# run.sh — HuggingFace backend launcher
pip install transformers==4.31.0
pip install tokenizers==0.13.3
pip install deepspeed==0.10.0
pip install accelerate -U
python main.py --model_type ${1} --id ${2}
# run_vllm.sh — vLLM backend launcher
export NCCL_IGNORE_DISABLED_P2P=1
pip install transformers -U
pip install tokenizers -U
pip install deepspeed -U
pip install accelerate -U
pip install vllm -U
echo $1
export NCCL_IGNORE_DISABLED_P2P=1
export RAY_memory_monitor_refresh_ms=0
CUDA_LAUNCH_BLOCKING=1 python main_vllm_batch.py --model_type ${1}
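Given the filename discrepancy noted above (the script calls main_vllm_batch.py while the repository contains main_vllm.py), a launcher could defensively pick whichever entrypoint exists. The `pick_entry` helper below is a hypothetical workaround sketch, not repository code:

```shell
# Hypothetical helper: select whichever vLLM entrypoint is actually present.
# Prefers the name the script references, falls back to the file in the repo.
pick_entry() {
  if [ -f main_vllm_batch.py ]; then
    echo main_vllm_batch.py
  else
    echo main_vllm.py
  fi
}

# The launch line would then become:
#   CUDA_LAUNCH_BLOCKING=1 python "$(pick_entry)" --model_type "${1}"
```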
Import
# Shell scripts — no Python imports
# Usage: bash run.sh ultralm-13b 0
# Usage: bash run_vllm.sh ultralm-13b
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| $1 (model_type) | str | Yes | Model identifier (e.g., "ultralm-13b", "alpaca-7b") |
| $2 (shard_id) | int | HF only | Shard ID for parallel processing (0, 1, 2, ...) |
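Since the HF backend takes one shard ID per invocation, the full set of launch commands can be enumerated with a small loop. This is an illustrative dry run; `NUM_SHARDS` and the echoed commands are assumptions, not part of the repository scripts:

```shell
# Dry run (illustrative): print the run.sh invocation for each shard.
# NUM_SHARDS is an assumed value; the repository does not define a shard count.
MODEL_TYPE="ultralm-13b"
NUM_SHARDS=4
for shard in $(seq 0 $((NUM_SHARDS - 1))); do
  echo "bash run.sh ${MODEL_TYPE} ${shard}"
done
```

Replacing `echo` with direct execution (one shard per GPU or session) is the natural way to run shards in parallel.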
Outputs
| Name | Type | Description |
|---|---|---|
| Installed environment | System | Python packages installed (pinned versions for the HF backend; latest versions for vLLM) |
| Pipeline execution | Process | Launches main.py or main_vllm_batch.py with arguments |
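Because run.sh pins exact versions while run_vllm.sh installs the latest releases, a quick post-install check can catch version drift. The `check_pin` helper below is a hypothetical sketch, not part of the repository:

```shell
# Hypothetical helper: compare an installed version string against a pin.
# The pins mirror run.sh; feed it e.g. the output of
#   python -c "import transformers; print(transformers.__version__)"
check_pin() {
  local pkg="$1" want="$2" got="$3"
  if [ "$got" = "$want" ]; then
    echo "OK: ${pkg}==${got}"
  else
    echo "MISMATCH: ${pkg} wanted ${want}, got ${got}"
    return 1
  fi
}

check_pin transformers 4.31.0 "4.31.0"  # prints "OK: transformers==4.31.0"
```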
Usage Examples
HuggingFace Backend
# Generate completions for ultralm-13b, shard 0
cd src/comparison_data_generation/
bash run.sh ultralm-13b 0
# Generate completions for alpaca-7b, shard 1
bash run.sh alpaca-7b 1
vLLM Backend
# Generate completions for ultralm-13b with vLLM (all shards at once)
cd src/comparison_data_generation/
bash run_vllm.sh ultralm-13b
# Generate completions for vicuna-33b
bash run_vllm.sh vicuna-33b