Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:OpenBMB UltraFeedback Principle Sampling

From Leeroopedia
Revision as of 17:20, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/OpenBMB_UltraFeedback_Principle_Sampling.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains NLP, Alignment, Preference_Learning
Last Updated 2023-10-02 00:00 GMT

Overview

A conditional sampling strategy that assigns behavioral principles (system prompts) to each instruction based on its source dataset, guiding model responses toward specific alignment dimensions.

Description

Principle Sampling is a key innovation of the UltraFeedback pipeline. Rather than using a single fixed system prompt for all completions, the pipeline samples from a library of ~55 principle prompts organized into 5 categories: helpfulness (11 variants), harmlessness (11 variants), honesty (11 variants), truthfulness (11 variants), and verbalized_calibration (1 variant).

The sampling is conditioned on the instruction source (subset):

  • ShareGPT/UltraChat: 3:1:1 ratio of helpfulness:truthfulness:honesty
  • FLAN: 4:1 ratio of helpfulness:verbalized_calibration
  • Evol-Instruct: 100% helpfulness
  • TruthfulQA/FalseQA: 1:1 ratio of honesty:truthfulness

Additionally, when honesty is selected, there is a 10% chance of replacing it with verbalized_calibration, which asks the model to express numeric confidence scores.

This conditional approach ensures that models are guided to demonstrate different alignment behaviors depending on the nature of the instruction, producing more diverse and diagnostically useful completions for preference annotation.

Usage

Use this principle when designing data generation pipelines where you want to elicit specific behavioral properties from LLMs. The subset-conditional distribution ensures that alignment dimensions relevant to each task type are adequately represented, while the multiple prompt variants within each category provide linguistic diversity in the system prompts.

Theoretical Basis

The design draws on research showing that system prompts significantly influence LLM behavior. By varying system prompts, the pipeline creates controlled variation in model outputs, making it possible to assess models along specific alignment axes.

Pseudo-code Logic:

# Abstract algorithm
principles = {
    "helpfulness": [11 prompt variants],
    "harmlessness": [11 prompt variants],
    "honesty": [11 prompt variants],
    "truthfulness": [11 prompt variants],
    "verbalized_calibration": [1 prompt variant],
}

def sample_principle(subset: str) -> Tuple[str, str]:
    # Select category based on subset-specific distribution
    if subset in ["sharegpt", "ultrachat"]:
        category = random.choice(["helpfulness"]*3 + ["truthfulness", "honesty"])
    elif subset == "flan":
        category = random.choice(["helpfulness"]*4 + ["verbalized_calibration"])
    elif subset == "evol_instruct":
        category = "helpfulness"
    elif subset in ["truthful_qa", "false_qa"]:
        category = random.choice(["honesty", "truthfulness"])

    # 10% chance of honesty -> verbalized_calibration
    if category == "honesty" and random.random() > 0.9:
        category = "verbalized_calibration"

    prompt = random.choice(principles[category])
    return category, prompt

Related Pages

Implemented By

Uses Heuristic

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment