Principle:OpenBMB UltraFeedback Principle Sampling

Knowledge Sources	UltraFeedback UltraFeedback
Domains	NLP, Alignment, Preference_Learning
Last Updated	2023-10-02 00:00 GMT

Overview

A conditional sampling strategy that assigns behavioral principles (system prompts) to each instruction based on its source dataset, guiding model responses toward specific alignment dimensions.

Description

Principle Sampling is a key innovation of the UltraFeedback pipeline. Rather than using a single fixed system prompt for all completions, the pipeline samples from a library of ~55 principle prompts organized into 5 categories: helpfulness (11 variants), harmlessness (11 variants), honesty (11 variants), truthfulness (11 variants), and verbalized_calibration (1 variant).

The sampling is conditioned on the instruction source (subset):

ShareGPT/UltraChat: 3:1:1 ratio of helpfulness:truthfulness:honesty
FLAN: 4:1 ratio of helpfulness:verbalized_calibration
Evol-Instruct: 100% helpfulness
TruthfulQA/FalseQA: 1:1 ratio of honesty:truthfulness

Additionally, when honesty is selected, there is a 10% chance of replacing it with verbalized_calibration, which asks the model to express numeric confidence scores.

This conditional approach ensures that models are guided to demonstrate different alignment behaviors depending on the nature of the instruction, producing more diverse and diagnostically useful completions for preference annotation.

Usage

Use this principle when designing data generation pipelines where you want to elicit specific behavioral properties from LLMs. The subset-conditional distribution ensures that alignment dimensions relevant to each task type are adequately represented, while the multiple prompt variants within each category provide linguistic diversity in the system prompts.

Theoretical Basis

The design draws on research showing that system prompts significantly influence LLM behavior. By varying system prompts, the pipeline creates controlled variation in model outputs, making it possible to assess models along specific alignment axes.

Pseudo-code Logic:

# Abstract algorithm
principles = {
    "helpfulness": [11 prompt variants],
    "harmlessness": [11 prompt variants],
    "honesty": [11 prompt variants],
    "truthfulness": [11 prompt variants],
    "verbalized_calibration": [1 prompt variant],
}

def sample_principle(subset: str) -> Tuple[str, str]:
    # Select category based on subset-specific distribution
    if subset in ["sharegpt", "ultrachat"]:
        category = random.choice(["helpfulness"]*3 + ["truthfulness", "honesty"])
    elif subset == "flan":
        category = random.choice(["helpfulness"]*4 + ["verbalized_calibration"])
    elif subset == "evol_instruct":
        category = "helpfulness"
    elif subset in ["truthful_qa", "false_qa"]:
        category = random.choice(["honesty", "truthfulness"])

    # 10% chance of honesty -> verbalized_calibration
    if category == "honesty" and random.random() > 0.9:
        category = "verbalized_calibration"

    prompt = random.choice(principles[category])
    return category, prompt

Related Pages

Implemented By

Implementation:OpenBMB_UltraFeedback_Principle_Selection

Uses Heuristic

Heuristic:OpenBMB_UltraFeedback_Principle_Distribution_Tuning

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment