# Implementation: OpenBMB UltraFeedback GPT-4 Critique Annotator
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation |
| Last Updated | 2023-10-02 00:00 GMT |
## Overview
Concrete tool for generating GPT-4 critiques and overall quality scores for model completions in the UltraFeedback annotation pipeline.
## Description
The `annotate_critique.py` module provides three core components:
- `get_eval(model, sys_prompt, user_prompt)`: calls `openai.ChatCompletion.create` with `model="gpt-4"`, `temperature=0`, `max_tokens=1024`, and `top_p=0.6`, retrying up to 10 times on API errors.
- `annotate(example)`: iterates over each completion in the example, formats the feedback prompt with the instruction (appending the principle as a "Note:"), calls `get_eval`, and parses the response. The response is split on `\nOverall Score:` to separate the critique text from the numeric score; scores written as "X/10" are reduced to the numerator.
- `feedback_prompt`: a detailed template instructing GPT-4 to provide constructive feedback on helpfulness, truthfulness, honesty, and instruction-following, and then to score overall quality from 1 to 10.
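The retry behavior described above can be sketched as a generic wrapper. This is an illustrative sketch, not the module's code: the actual module inlines the retry loop around `openai.ChatCompletion.create`, and the names `call_with_retries` and `backoff` are hypothetical.

```python
import time

def call_with_retries(fn, max_attempts=10, backoff=1.0):
    """Call fn(), retrying up to max_attempts times on any exception.

    Hypothetical stand-in for the inline retry loop around
    openai.ChatCompletion.create described above.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # the real code catches API errors here
            last_err = err
            time.sleep(backoff)  # brief pause before the next attempt
    raise Exception(f"Failed after {max_attempts} attempts") from last_err
```

With this shape, `get_eval` would amount to `call_with_retries(lambda: openai.ChatCompletion.create(...))`, returning the first successful response.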
## Usage
The module also runs as a standalone script that iterates over dataset subsets, loads the completion data for each subset, annotates every example, and saves the results to an annotation directory.
## Code Reference
### Source Location
- Repository: UltraFeedback
- File: src/data_annotation/annotate_critique.py (Lines 18-106)
### Signature
```python
system_prompt = "A chat between a curious user and an artificial intelligence expert. The expert gives helpful, specific, and concise answers to the user's questions."

feedback_prompt = """Given my answer to an instruction, your role is to provide specific and constructive feedback...
### Instruction
{instruction}
### Answer
{completion}
...
*Format*
### Feedback
[Your feedback]
Overall Score: [1-10]
...
### Feedback
"""
```
```python
def get_eval(model: str, sys_prompt: str, user_prompt: str) -> str:
    """Calls GPT-4 with retry logic.

    Args:
        model: Model name (e.g., "gpt-4-0613")
        sys_prompt: System prompt for GPT-4
        user_prompt: Formatted feedback prompt

    Returns:
        GPT-4 response content string

    Raises:
        Exception: After 10 failed attempts
    """
    ...

def annotate(example: Dict) -> Dict:
    """Annotates all completions in an example with critique and overall_score.

    Args:
        example: Dict with 'instruction' and 'completions' fields

    Returns:
        example: Same dict with each completion enriched with 'critique'
            and 'overall_score'
    """
    ...
```
### Import

```python
import openai
import datasets
import json
import pandas as pd
import re
from copy import deepcopy
from typing import Dict  # needed for the Dict annotations in the signatures above
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| example | Dict | Yes | Dataset example with 'instruction' and 'completions' fields |
| example["completions"][i]["response"] | str | Yes | Generated text to evaluate |
| example["completions"][i]["custom_system_prompt"] | str | Yes | Principle prompt used during generation |
| example["completions"][i]["principle"] | str | Yes | Principle category (for verbalized_calibration truncation) |
### Outputs
| Name | Type | Description |
|---|---|---|
| example["completions"][i]["critique"] | str | Textual feedback from GPT-4 |
| example["completions"][i]["overall_score"] | Union[str, float] | Overall quality score (1-10), parsed from GPT-4 response |
## Usage Examples
### Annotating a Single Example
```python
import openai

openai.api_key = "YOUR_KEY"

# Example with one completion
example = {
    "instruction": "Explain quantum entanglement.",
    "completions": [{
        "model": "llama-2-13b-chat",
        "principle": "helpfulness",
        "custom_system_prompt": "The assistant should provide accurate information...",
        "response": "Quantum entanglement is a phenomenon..."
    }]
}

# Annotate
result = annotate(example)
print(result["completions"][0]["critique"])       # textual feedback
print(result["completions"][0]["overall_score"])  # overall score (1-10)
```
### Full Pipeline Execution
```python
import json
import os

import datasets
import pandas as pd
from tqdm import tqdm

subsets = ["sharegpt", "flan", "evol_instruct", "ultrachat", "truthful_qa", "false_qa"]

for subset in subsets:
    # Load the completion data for this subset
    with open(os.path.join("annotation", subset + ".json"), "r") as f:
        dataset = json.load(f)
    dataset = datasets.Dataset.from_pandas(pd.DataFrame(dataset))

    # Annotate every example with a critique and overall score
    dataset_dict = []
    for data in tqdm(dataset, total=len(dataset), desc="Annotating"):
        dataset_dict.append(annotate(data))

    # Annotated results overwrite the input file in place
    result_path = os.path.join("annotation", subset + ".json")
    with open(result_path, "w") as f:
        json.dump(dataset_dict, f, indent=4)
```