# Principle: OpenBMB UltraFeedback Critique Annotation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Preference_Learning |
| Last Updated | 2023-10-02 00:00 GMT |
## Overview
An LLM-as-a-judge annotation strategy that uses GPT-4 to generate textual critiques and overall quality scores for model completions.
## Description
Critique Annotation is the first of two GPT-4 annotation passes in the UltraFeedback pipeline. For each completion, GPT-4 is prompted to act as a teacher and provide:
- Textual feedback (critique): Specific and constructive feedback identifying weaknesses and providing improvement suggestions, considering helpfulness, truthfulness, honesty, and instruction-following.
- Overall quality score (1-10): A holistic score where 1 is worst and 10 is best.
The annotation prompt is carefully designed to avoid several pitfalls:
- It asks GPT-4 to focus on the completion relative to the instruction, preventing score inflation
- It includes the principle system prompt as context (appended as "Note: ...") so GPT-4 can assess adherence to the intended behavior
- It explicitly prohibits providing reference answers, keeping the critique constructive
- For verbalized_calibration principles, the system prompt is truncated before the example to prevent GPT-4 from being confused by the format specification
The score is parsed from a structured "Overall Score: [1-10]" line in GPT-4's response, with special handling for responses in the "X/10" format.
## Usage
Use this principle when you need holistic quality assessments of LLM completions. The critique provides human-readable explanations for the scores, which are valuable for debugging model behavior and for the downstream score validation step.
## Theoretical Basis
The LLM-as-a-judge paradigm leverages strong models (GPT-4) to evaluate weaker models. Key considerations:
- Temperature 0: Deterministic annotation for reproducibility
- System prompt inclusion: GPT-4 assesses whether the model followed its behavioral principle
- Structured output format: Enables programmatic parsing of scores
- Retry logic: Up to 10 retries for API failures
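The temperature-0 call with retry logic can be sketched as below. The `call_api` callable stands in for whatever chat-completion client is used (an assumption here); the exponential backoff is also an illustrative choice, since the source only specifies up to 10 retries.

```python
import time

def annotate_with_retries(call_api, prompt: str, max_retries: int = 10):
    """Run the judge call deterministically, retrying on API failures."""
    for attempt in range(max_retries):
        try:
            # temperature=0 makes the judge's output (near-)deterministic,
            # which aids reproducibility of the annotations.
            return call_api(prompt, temperature=0)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff (assumed)
```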
$$\text{overall\_score} = \text{GPT-4}(\text{instruction}, \text{principle}, \text{completion}) \in [1, 10]$$