
Principle:OpenBMB UltraFeedback Instruction Sampling

From Leeroopedia


Knowledge Sources
Domains NLP, Data_Construction, Preference_Learning
Last Updated 2023-10-02 00:00 GMT

Overview

A data curation strategy that aggregates instructions from diverse NLP task sources to create a broad-coverage seed corpus for preference dataset construction.

Description

Instruction Sampling is the first stage of preference dataset construction. It involves loading pre-prepared instruction datasets from multiple heterogeneous sources, each contributing different task types and difficulty levels. In the UltraFeedback pipeline, six instruction sources are used: UltraChat (multi-turn dialogue), ShareGPT (real user conversations), FLAN (academic NLP tasks), Evol-Instruct (complexity-evolved instructions), TruthfulQA (adversarial truthfulness probes), and FalseQA (false-premise questions). The diversity of sources ensures the resulting preference dataset covers a wide spectrum of instruction-following capabilities.

Each source is stored as a JSON file and loaded into a HuggingFace Dataset object for downstream processing. The source identity (subset name) is preserved throughout the pipeline because it determines which principle distribution and world knowledge context apply in later stages.
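As a minimal sketch of this loading step (the file path handling and the `instruction` field name are assumptions for illustration, not the actual UltraFeedback file layout), tagging each record with its subset name so that identity survives downstream might look like:

```python
import json

def load_source(path, subset_name):
    """Load one instruction source and tag every record with its subset.

    The subset name travels with each record so later stages can select
    the matching principle distribution and world knowledge context.
    """
    with open(path) as f:
        records = json.load(f)  # expected: a list of {"instruction": ...} dicts
    for record in records:
        record["subset"] = subset_name
    return records

# The tagged list can then be wrapped for downstream processing,
# e.g. datasets.Dataset.from_list(records) in the HuggingFace ecosystem.
```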

Usage

Use this principle when constructing preference datasets that require broad instruction coverage. It is the entry point for any pipeline that generates multi-model completions and then annotates them for preference learning. The choice of instruction sources directly impacts the diversity and quality of the final preference pairs.

Theoretical Basis

The theoretical motivation comes from the observation that LLM alignment benefits from training on diverse instruction types. A preference dataset biased toward a single task type (e.g., only chat) produces models that are poorly calibrated on factual, reasoning, or safety tasks.

Pseudo-code Logic:

# Abstract algorithm, sketched as a Python generator
import json
from datasets import Dataset

SOURCES = ["ultrachat", "sharegpt", "flan", "evol_instruct", "truthful_qa", "false_qa"]

def iter_instruction_datasets(source_paths):
    for name in SOURCES:
        with open(source_paths[name]) as f:
            instructions = json.load(f)
        # Preserve subset identity for downstream principle selection
        yield Dataset.from_list(instructions), name

The key design decisions are:

  • Source diversity: Six sources spanning dialogue, academic NLP, adversarial probes, and evolved instructions
  • Flat loading: All sources are loaded as flat JSON, normalized to a common schema with an instruction field
  • Subset-aware processing: The subset name propagates through the pipeline to condition principle sampling and world knowledge injection
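The flat-loading decision can be illustrated with a small normalization helper. The per-source field names below (`prompt`, `question`) are illustrative assumptions about how heterogeneous schemas might differ, not the actual UltraFeedback keys:

```python
# Map each source's instruction field to the common "instruction" key.
# The per-source field names here are illustrative assumptions.
FIELD_MAP = {
    "sharegpt": "prompt",
    "truthful_qa": "question",
}

def normalize(record, subset):
    """Return a record with a unified 'instruction' field plus its subset."""
    field = FIELD_MAP.get(subset, "instruction")
    return {"instruction": record[field], "subset": subset}
```

Once normalized, every record exposes the same `instruction` field regardless of origin, while the `subset` field keeps the source identity available for later principle sampling and world knowledge injection.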

Related Pages

Implemented By
