# Principle: OpenBMB UltraFeedback Fine-Grained Preference Annotation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Preference_Learning |
| Last Updated | 2023-10-02 00:00 GMT |
## Overview
A multi-aspect preference annotation strategy that uses GPT-4 to rate model completions across four independent quality dimensions with rubric-guided evaluation.
## Description
Fine-Grained Preference Annotation is the second GPT-4 annotation pass in the UltraFeedback pipeline. Unlike the critique annotation (which produces a single holistic score), this pass evaluates each completion across four independent aspects:
- Instruction Following (1-5): Alignment between output and task intent, assessing goal understanding and constraint adherence.
- Honesty (1-5 or N/A): How well the model conveys uncertainty and calibrates confidence relative to correctness. Creative tasks receive N/A.
- Truthfulness (1-5): Accuracy assessment using a hallucination taxonomy with three types: factual errors, contradiction with the instruction/input, and self-contradiction or logical errors.
- Helpfulness (1-5): Overall informativeness and correctness, evaluating clarity, comprehensiveness, and conciseness.
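For concreteness, the four aspects can be sketched as a small configuration structure. This is purely illustrative; the names and fields are assumptions, not taken from the UltraFeedback codebase:

```python
# Hypothetical sketch of the four annotation aspects; field names are
# illustrative, not taken from the UltraFeedback codebase.
ASPECTS = {
    "instruction_following": {"scale": (1, 5), "allows_na": False},
    "honesty": {"scale": (1, 5), "allows_na": True},  # N/A for creative tasks
    "truthfulness": {
        "scale": (1, 5),
        "allows_na": False,
        "hallucination_types": [
            "factual error",
            "contradicts instruction/input",
            "self-contradictory or logical error",
        ],
    },
    "helpfulness": {"scale": (1, 5), "allows_na": False},
}
```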
A critical design feature is randomized completion ordering: for each aspect evaluation, the 4 completions are presented in a random permutation to mitigate position bias (GPT-4 tends to favor earlier entries). The SHUFFLE_NUM parameter controls how many random orderings are evaluated per aspect (default: 1).
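The effect of SHUFFLE_NUM can be sketched as follows, under the assumption that each shuffled pass yields one rating per completion; `rate_fn` is a stand-in for the GPT-4 aspect evaluation, not an actual API call:

```python
import random

def annotate_with_shuffles(completions, rate_fn, shuffle_num=1):
    """Average per-completion ratings over `shuffle_num` random orderings.

    `rate_fn` stands in for the GPT-4 aspect evaluation: it takes the
    completions in presentation order and returns one rating per item.
    """
    n = len(completions)
    sums = [0.0] * n
    for _ in range(shuffle_num):
        order = list(range(n))
        random.shuffle(order)  # new presentation order each pass
        ratings = rate_fn([completions[i] for i in order])
        for shuffled_pos, original_pos in enumerate(order):
            sums[original_pos] += ratings[shuffled_pos]
    return [s / shuffle_num for s in sums]
```

With the default `shuffle_num=1`, each completion is rated once in a single random order; larger values average out residual position effects at the cost of more annotation calls.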
Each aspect uses a detailed rubric template that defines the rating scale, provides format examples, and specifies the expected output structure. Truthfulness and helpfulness aspects additionally receive world knowledge context for subsets where ground truth is available.
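Assembling a single aspect evaluation prompt might then look like this. A minimal sketch: `build_aspect_prompt` is a hypothetical helper, not UltraFeedback's actual template code:

```python
def build_aspect_prompt(rubric, completion, world_knowledge=None):
    # Hypothetical helper: combine the aspect rubric, optional ground-truth
    # context (truthfulness/helpfulness subsets only), and the completion.
    parts = [rubric]
    if world_knowledge is not None:
        parts.append("World knowledge: " + world_knowledge)
    parts.append("Completion: " + completion)
    return "\n\n".join(parts)
```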
## Usage
Use this principle when you need multi-dimensional preference signals rather than a single overall score. The per-aspect ratings enable fine-grained reward modeling where different aspects can be weighted differently during training.
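As an illustration of aspect weighting, a scalar reward could be formed as a weighted average of the per-aspect ratings, skipping aspects marked N/A. This is a sketch; the weighting scheme and N/A handling are assumptions, not part of UltraFeedback:

```python
def weighted_reward(ratings, weights):
    # Weighted average over numeric aspect ratings; "N/A" aspects are
    # skipped and the remaining weights are renormalized.
    total = weight_sum = 0.0
    for aspect, w in weights.items():
        r = ratings.get(aspect)
        if isinstance(r, (int, float)):
            total += w * r
            weight_sum += w
    return total / weight_sum if weight_sum else 0.0
```

For example, with equal weights a completion rated `{"honesty": "N/A", "truthfulness": 5, "helpfulness": 4}` scores 4.5.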
## Theoretical Basis
The multi-aspect design is grounded in the observation that a single preference score conflates multiple quality dimensions. A response can be highly helpful but dishonest, or truthful but unhelpful. Decomposing the evaluation allows:
- Training reward models that distinguish between different failure modes
- Identifying specific weaknesses in model behavior
- Creating preference pairs that are informative along specific dimensions
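The last point can be sketched as pair construction along a single aspect. This is a hypothetical helper; the `min_gap` threshold is an assumption for illustration:

```python
def aspect_preference_pairs(completions, annotations, aspect, min_gap=1):
    # Build (chosen, rejected) pairs from completions whose ratings on
    # one aspect differ by at least `min_gap`; "N/A" ratings are skipped.
    pairs = []
    for i, ann_i in enumerate(annotations):
        for j, ann_j in enumerate(annotations):
            ri, rj = ann_i.get(aspect), ann_j.get(aspect)
            if isinstance(ri, int) and isinstance(rj, int) and ri - rj >= min_gap:
                pairs.append((completions[i], completions[j]))
    return pairs
```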
Annotation Schema:

```python
# For instruction_following and honesty aspects:
annotation = {
    "Rating": int,   # 1-5 (honesty may instead be "N/A")
    "Rationale": str,
}

# For truthfulness and helpfulness aspects:
annotation = {
    "Type": list,                 # List[int] or "None": hallucination/informativeness types
    "Rationale": str,             # rationale for type identification
    "Rating": int,                # 1-5 quality rating
    "Rationale For Rating": str,  # rationale for the rating
}
```
The randomized ordering mitigates GPT-4's known position bias:

```python
import random

# Generate a random permutation of the 4 completions
order = list(range(4))
random.shuffle(order)

# Present completions in shuffled order to GPT-4
# (completions[order[0]] first, and so on), then map each
# returned rating back to its original position via `order`.
```