# Principle: OpenBMB UltraFeedback Critique Annotation
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation, Preference_Learning |
| Last Updated | 2023-10-02 00:00 GMT |
## Overview
An LLM-as-a-judge annotation strategy that uses GPT-4 to generate textual critiques and overall quality scores for model completions.
## Description
Critique Annotation is the first of two GPT-4 annotation passes in the UltraFeedback pipeline. For each completion, GPT-4 is prompted to act as a teacher and provide:
- Textual feedback (critique): Specific and constructive feedback identifying weaknesses and providing improvement suggestions, considering helpfulness, truthfulness, honesty, and instruction-following.
- Overall quality score (1-10): A holistic score where 1 is worst and 10 is best.
The annotation prompt is carefully designed to avoid several pitfalls:
- It asks GPT-4 to focus on the completion relative to the instruction, preventing score inflation
- It includes the principle system prompt as context (appended as "Note: ...") so GPT-4 can assess adherence to the intended behavior
- It explicitly prohibits providing reference answers, keeping the critique constructive
- For verbalized_calibration principles, the system prompt is truncated before the example to prevent GPT-4 from being confused by the format specification
The score is parsed from a structured "Overall Score: [1-10]" line in GPT-4's response, with special handling for responses in the "X/10" format.
## Usage
Use this principle when you need holistic quality assessments of LLM completions. The critique provides human-readable explanations for the scores, which are valuable for debugging model behavior and for the downstream score validation step.
## Theoretical Basis
The LLM-as-a-judge paradigm leverages strong models (GPT-4) to evaluate weaker models. Key considerations:
- Temperature 0: Deterministic annotation for reproducibility
- System prompt inclusion: GPT-4 assesses whether the model followed its behavioral principle
- Structured output format: Enables programmatic parsing of scores
- Retry logic: Up to 10 retries for API failures
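The temperature-0 call with retry logic can be sketched as below. The `call_api` callable stands in for whatever chat-completion client is used (an assumption here); the exponential backoff is also an illustrative choice, since the source only specifies up to 10 retries.

```python
import time

def annotate_with_retries(call_api, prompt: str, max_retries: int = 10):
    """Run the judge call deterministically, retrying on API failures."""
    for attempt in range(max_retries):
        try:
            # temperature=0 makes the judge's output (near-)deterministic,
            # which aids reproducibility of the annotations.
            return call_api(prompt, temperature=0)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff (assumed)
```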
$$\text{overall\_score} = \text{GPT-4}(\text{instruction}, \text{principle}, \text{completion}) \in [1, 10]$$