Principle:OpenBMB UltraFeedback Model Sampling
| Knowledge Sources | |
|---|---|
| Domains | NLP, Data_Construction, Preference_Learning |
| Last Updated | 2023-10-02 00:00 GMT |
Overview
A randomized assignment strategy that pairs each instruction with a subset of models drawn from a diverse pool, ensuring broad model coverage across the preference dataset.
Description
Model Sampling is the second stage of the UltraFeedback dataset construction pipeline. Given a pool of 17 language models spanning commercial APIs (GPT-4, GPT-3.5-turbo, Bard), large open-source models (UltraLM-65B, WizardLM-30B, Vicuna-33B, LLaMA-2-70B-Chat), mid-size models (UltraLM-13B, WizardLM-13B, LLaMA-2-13B-Chat), smaller models (WizardLM-7B, Alpaca-7B, LLaMA-2-7B-Chat), and non-LLaMA architectures (Falcon-40B-Instruct, StarChat, MPT-30B-Chat, Pythia-12B), the sampling step randomly assigns models to each instruction.
The purpose is to ensure that each instruction receives completions from models of varying capability, which is essential for constructing informative preference pairs. When the final dataset includes completions from both strong and weak models for the same instruction, preference annotation can capture real differences in quality.
Usage
Use this principle when constructing preference datasets where each instruction should be answered by multiple models. Uniform random sampling keeps any single model from dominating the dataset in expectation, so the preference signal reflects genuine quality differences rather than artifacts of model selection.
Theoretical Basis
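The sampling strategy draws models uniformly at random from the model pool. In the UltraFeedback pipeline, each instruction is assigned 1 model per sampling pass, and 4 passes accumulate 4 completions per instruction. Because each pass draws a single model (k=1), "without replacement" constrains only the draw within a pass; as described here, the per-pass draw alone does not guarantee that the 4 passes select 4 distinct models.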
Pseudo-code Logic:
# Abstract algorithm
model_pool = [17 diverse models across size/architecture tiers]
for each instruction in dataset:
    assigned_models = random.sample(model_pool, k=1)  # per pass
    instruction.models = assigned_models
    instruction.completions = []  # initialized empty
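The abstract algorithm above can be made concrete as a minimal Python sketch. It is an illustration under stated assumptions, not the pipeline's actual code: the model identifier strings are lowercase renderings of the names listed in the Description, `run_sampling_pass` is a hypothetical helper, and accumulating one model per pass in a list is one plausible reading of "4 passes total".

```python
import random

# Names follow the pool listed in the Description; the exact identifier
# strings used by the real pipeline may differ (assumption).
MODEL_POOL = [
    "gpt-4", "gpt-3.5-turbo", "bard",
    "ultralm-65b", "wizardlm-30b", "vicuna-33b", "llama-2-70b-chat",
    "ultralm-13b", "wizardlm-13b", "llama-2-13b-chat",
    "wizardlm-7b", "alpaca-7b", "llama-2-7b-chat",
    "falcon-40b-instruct", "starchat", "mpt-30b-chat", "pythia-12b",
]


def run_sampling_pass(instructions, rng):
    """Append one uniformly drawn model to each instruction (hypothetical helper)."""
    for item in instructions:
        # random.sample returns a list, so k=1 yields a one-element list.
        item["models"].extend(rng.sample(MODEL_POOL, k=1))
    return instructions


rng = random.Random(0)  # fixed seed only so the sketch is reproducible
instructions = [
    {"instruction": f"task {i}", "models": [], "completions": []}
    for i in range(3)
]

# Four passes, one model per pass, as described in the text.
for _ in range(4):
    run_sampling_pass(instructions, rng)
```

After the four passes, each instruction carries four model names and an empty `completions` list that a later generation stage would fill. Since each pass draws independently in this sketch, the same model can appear more than once for an instruction.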
Key design decisions:
- Pool diversity: 17 models across 4 capability tiers (commercial, large, mid, small) and multiple architectures
- Random assignment: Prevents systematic bias in model-instruction pairings
- Single model per pass: Each generation pass assigns 1 model; multiple passes accumulate completions
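The design decisions above can be checked numerically with a short simulation. This is a sketch that assumes the 4 passes are independent uniform draws; the source does not state how passes are coordinated. The function names and the instruction count of 10,000 are illustrative choices, and models are represented by integer indices rather than names.

```python
import random
from collections import Counter

POOL_SIZE = 17   # models in the pool
NUM_PASSES = 4   # one model drawn per pass


def simulate_counts(num_instructions, rng):
    """Count how often each model index is drawn across all passes."""
    counts = Counter()
    for _ in range(num_instructions):
        for _ in range(NUM_PASSES):
            counts[rng.sample(range(POOL_SIZE), k=1)[0]] += 1
    return counts


def prob_all_distinct(pool_size=POOL_SIZE, passes=NUM_PASSES):
    """Probability that independent uniform draws pick `passes` distinct models."""
    p = 1.0
    for i in range(passes):
        p *= (pool_size - i) / pool_size
    return p


counts = simulate_counts(10_000, random.Random(0))
expected = 10_000 * NUM_PASSES / POOL_SIZE  # about 2353 draws per model
# Largest relative deviation of any model's count from the uniform expectation.
max_dev = max(abs(counts[m] - expected) / expected for m in range(POOL_SIZE))
```

Under the independence assumption, every model's share stays close to 1/17 of all draws, which is the sense in which random assignment avoids systematic bias. The same assumption gives `prob_all_distinct()` a value of about 0.68, so roughly a third of instructions would see a repeated model unless distinctness is enforced across passes.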