Principle: OpenBMB UltraFeedback Environment Setup
| Knowledge Sources | |
|---|---|
| Domains | DevOps, ML_Infrastructure |
| Last Updated | 2023-10-02 00:00 GMT |
Overview
A dependency management and environment configuration strategy for setting up the two inference backends (HuggingFace and vLLM) used in the UltraFeedback generation pipeline.
Description
Environment Setup covers the installation and configuration of Python packages and environment variables required to run the UltraFeedback completion generation pipeline. Two distinct environment configurations exist:
HuggingFace Backend:
- Pinned versions: transformers==4.31.0, tokenizers==0.13.3, deepspeed==0.10.0
- Latest accelerate (installed with the -U upgrade flag)
- Sequential single-model inference
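A minimal install sketch for this configuration (assuming a pip-based environment; the exact commands are not specified in the source):

```shell
# HF backend: pin transformers/tokenizers/deepspeed for reproducibility,
# but take the latest accelerate release.
pip install transformers==4.31.0 tokenizers==0.13.3 deepspeed==0.10.0
pip install -U accelerate
```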
vLLM Backend:
- Latest versions of transformers, tokenizers, deepspeed, accelerate, and vllm
- Environment variables:
  - NCCL_IGNORE_DISABLED_P2P=1 (for multi-GPU NCCL communication)
  - RAY_memory_monitor_refresh_ms=0 (disables Ray memory monitoring)
  - CUDA_LAUNCH_BLOCKING=1 (synchronous CUDA execution for debugging)
- Tensor-parallel multi-GPU inference
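The corresponding install sketch (again assuming pip; left unpinned so each package resolves to its latest release):

```shell
# vLLM backend: latest versions of the full stack.
pip install -U transformers tokenizers deepspeed accelerate vllm
```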
The key difference is that the HF backend uses pinned dependency versions for reproducibility, while the vLLM backend uses latest versions to benefit from ongoing vLLM performance improvements.
Usage
Choose the HF backend environment for single-GPU sequential inference with reproducible dependency versions. Choose the vLLM backend environment for multi-GPU batched inference with higher throughput.
Theoretical Basis
The two environments represent a trade-off between reproducibility (pinned versions) and performance (latest optimizations). The vLLM backend requires additional environment variables because:
- NCCL_IGNORE_DISABLED_P2P=1: Works around NCCL initialization failures on GPU topologies where peer-to-peer communication is disabled
- RAY_memory_monitor_refresh_ms=0: Disables the memory monitor in Ray (vLLM's distributed runtime) so it does not kill worker processes under memory pressure
- CUDA_LAUNCH_BLOCKING=1: Forces synchronous CUDA kernel launches so errors are reported at the offending call, at the cost of throughput
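Because NCCL and Ray read these variables at initialization time, they must be exported before the pipeline process starts. A minimal launch-shell sketch:

```shell
# Set the workaround variables in the shell that will launch the pipeline.
export NCCL_IGNORE_DISABLED_P2P=1       # tolerate GPU topologies with P2P disabled
export RAY_memory_monitor_refresh_ms=0  # keep Ray's memory monitor from killing workers
export CUDA_LAUNCH_BLOCKING=1           # synchronous launches for precise error reporting
# ...then launch the generation script from this shell.
```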