
Principle:OpenBMB UltraFeedback Model Sampling

From Leeroopedia


Knowledge Sources
Domains: NLP, Data_Construction, Preference_Learning
Last Updated: 2023-10-02 00:00 GMT

Overview

A randomized assignment strategy that pairs each instruction with a subset of models drawn from a diverse pool, ensuring broad model coverage across the preference dataset.

Description

Model Sampling is the second stage of the UltraFeedback dataset construction pipeline. The sampling step randomly assigns models to each instruction from a pool of 17 language models:

  • Commercial APIs: GPT-4, GPT-3.5-turbo, Bard
  • Large open-source models: UltraLM-65B, WizardLM-30B, Vicuna-33B, LLaMA-2-70B-Chat
  • Mid-size models: UltraLM-13B, WizardLM-13B, LLaMA-2-13B-Chat
  • Smaller models: WizardLM-7B, Alpaca-7B, LLaMA-2-7B-Chat
  • Non-LLaMA architectures: Falcon-40B-Instruct, StarChat, MPT-30B-Chat, Pythia-12B

The purpose is to give each instruction completions from models of varying capability, which is essential for constructing informative preference pairs. When the final dataset includes completions from both strong and weak models for the same instruction, preference annotation can capture real quality differences between them.

Usage

Use this principle when constructing preference datasets where each instruction should be answered by multiple models. Because every model in the pool is equally likely to be drawn, no single model dominates the dataset in expectation, and the preference signal reflects genuine quality differences rather than artifacts of model selection.

Theoretical Basis

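The sampling strategy draws uniformly at random from the model pool. In the UltraFeedback pipeline, each instruction is assigned 1 model per sampling pass, and 4 passes in total accumulate 4 completions. Sampling without replacement applies within a single pass; with 1 model per pass this reduces to one uniform draw, so the per-pass rule does not by itself guarantee that the 4 passes select 4 distinct models.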
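As a rough check on what 1 model per pass implies, the sketch below computes the probability that 4 passes pick 4 different models. It assumes the passes are independent uniform draws, which this page does not state; if the pipeline excludes models already assigned to an instruction, the probability is simply 1. The names `POOL_SIZE`, `NUM_PASSES`, and `p_all_distinct` are illustrative.

```python
from fractions import Fraction
from math import perm

POOL_SIZE = 17   # models in the pool (from the Description)
NUM_PASSES = 4   # sampling passes per instruction

# Assumption: each pass is an independent uniform draw from the full pool.
# Numerator: ordered ways to pick 4 distinct models out of 17.
# Denominator: all ordered outcomes of 4 independent draws.
p_all_distinct = Fraction(perm(POOL_SIZE, NUM_PASSES), POOL_SIZE ** NUM_PASSES)

print(p_all_distinct)         # exact fraction
print(float(p_all_distinct))  # roughly 0.68
```

Under that assumption, about one instruction in three would receive at least one repeated model, which is why a multi-pass implementation may want to track the models already assigned.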

Pseudo-code Logic:

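# Abstract algorithm (one sampling pass)
import random

# Placeholder names; substitute the 17 models listed in the Description
model_pool = [f"model_{i}" for i in range(17)]
# Placeholder dataset: one dict per instruction
dataset = [{"instruction": "example instruction"}]

for instruction in dataset:
    # Draw 1 model uniformly at random for this pass
    instruction["models"] = random.sample(model_pool, k=1)
    instruction["completions"] = []  # initialized empty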
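The per-pass logic above can be extended to the full multi-pass loop. The following is a minimal sketch, not the pipeline's actual code: the function name `assign_models`, the record layout, and the `seed` parameter are assumptions made for illustration, and the model names are display strings taken from the Description, not real model identifiers.

```python
import random

# The 17 models named in the Description (display names only).
MODEL_POOL = [
    "GPT-4", "GPT-3.5-turbo", "Bard",
    "UltraLM-65B", "WizardLM-30B", "Vicuna-33B", "LLaMA-2-70B-Chat",
    "UltraLM-13B", "WizardLM-13B", "LLaMA-2-13B-Chat",
    "WizardLM-7B", "Alpaca-7B", "LLaMA-2-7B-Chat",
    "Falcon-40B-Instruct", "StarChat", "MPT-30B-Chat", "Pythia-12B",
]


def assign_models(instructions, pool, num_passes=4, seed=None):
    """Assign one uniformly sampled model per pass to each instruction."""
    rng = random.Random(seed)  # local generator so runs can be reproduced
    records = []
    for text in instructions:
        record = {"instruction": text, "models": [], "completions": []}
        for _ in range(num_passes):
            # One model per pass; passes are independent in this sketch,
            # so the same model may be drawn more than once.
            record["models"].extend(rng.sample(pool, k=1))
        records.append(record)
    return records


records = assign_models(
    ["Explain overfitting.", "Summarize this paragraph."],
    MODEL_POOL,
    seed=0,
)
for r in records:
    print(r["instruction"], "->", r["models"])
```

Completions stay empty at this stage because generation happens in a later step; the sampling stage only records which models should answer each instruction.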

Key design decisions:

  • Pool diversity: 17 models across 4 capability tiers (commercial, large, mid, small) and multiple architectures
  • Random assignment: Prevents systematic bias in model-instruction pairings
  • Single model per pass: Each generation pass assigns 1 model; multiple passes accumulate completions
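One way to sanity-check the random-assignment decision is to simulate it and look at each model's share of assignments. The sketch below uses placeholder model names and an arbitrary instruction count; both are assumptions for illustration, and the passes are treated as independent uniform draws.

```python
import random
from collections import Counter

POOL = [f"model_{i:02d}" for i in range(17)]  # placeholder names
NUM_INSTRUCTIONS = 10_000                      # arbitrary simulation size
NUM_PASSES = 4

rng = random.Random(0)  # fixed seed so the simulation is repeatable
counts = Counter()
for _ in range(NUM_INSTRUCTIONS):
    for _ in range(NUM_PASSES):
        # One uniform draw per pass, as in the pseudo-code
        counts[rng.sample(POOL, k=1)[0]] += 1

total = sum(counts.values())
shares = {model: n / total for model, n in counts.items()}

print("models seen:", len(shares))
print("min share:", min(shares.values()))
print("max share:", max(shares.values()))
print("uniform target:", 1 / len(POOL))
```

With uniform draws, every model's share should sit close to 1/17 (about 5.9%), so no model is systematically over- or under-represented in the pairings.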

Related Pages

Implemented By
