# Implementation: OpenBMB UltraFeedback GPT-4 Critique Annotator
| Knowledge Sources | |
|---|---|
| Domains | NLP, Evaluation |
| Last Updated | 2023-10-02 00:00 GMT |
## Overview
Concrete tool for generating GPT-4 critiques and overall quality scores for model completions in the UltraFeedback annotation pipeline.
## Description
The `annotate_critique.py` module provides three core components:
- `get_eval(model, sys_prompt, user_prompt)`: calls `openai.ChatCompletion.create` with `model="gpt-4"`, `temperature=0`, `max_tokens=1024`, and `top_p=0.6`, retrying up to 10 times on API errors.
- `annotate(example)`: iterates over each completion in the example, formats the feedback prompt with the instruction (appending the principle as a "Note:"), calls `get_eval`, and parses the response. The response is split on `\nOverall Score:` to separate the critique text from the numeric score; scores written as "X/10" are reduced to the numerator.
- `feedback_prompt`: a detailed template instructing GPT-4 to provide constructive feedback on helpfulness, truthfulness, honesty, and instruction-following, and then to score overall quality from 1 to 10.
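The retry behavior described above can be sketched as a generic wrapper. This is an illustrative sketch, not the module's code: the actual module inlines the retry loop around `openai.ChatCompletion.create`, and the names `call_with_retries` and `backoff` are hypothetical.

```python
import time

def call_with_retries(fn, max_attempts=10, backoff=1.0):
    """Call fn(), retrying up to max_attempts times on any exception.

    Hypothetical stand-in for the inline retry loop around
    openai.ChatCompletion.create described above.
    """
    last_err = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:  # the real code catches API errors here
            last_err = err
            time.sleep(backoff)  # brief pause before the next attempt
    raise Exception(f"Failed after {max_attempts} attempts") from last_err
```

With this shape, `get_eval` would amount to `call_with_retries(lambda: openai.ChatCompletion.create(...))`, returning the first successful response.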
## Usage
The module also runs as a standalone script that iterates over dataset subsets, loads the completion data for each subset, annotates every example, and saves the results to an annotation directory.
## Code Reference
### Source Location
- Repository: UltraFeedback
- File: src/data_annotation/annotate_critique.py (Lines 18-106)
### Signature
```python
system_prompt = "A chat between a curious user and an artificial intelligence expert. The expert gives helpful, specific, and concise answers to the user's questions."

feedback_prompt = """Given my answer to an instruction, your role is to provide specific and constructive feedback...
### Instruction
{instruction}
### Answer
{completion}
...
*Format*
### Feedback
[Your feedback]
Overall Score: [1-10]
...
### Feedback
"""
```
```python
def get_eval(model: str, sys_prompt: str, user_prompt: str) -> str:
    """Calls GPT-4 with retry logic.

    Args:
        model: Model name (e.g., "gpt-4-0613")
        sys_prompt: System prompt for GPT-4
        user_prompt: Formatted feedback prompt

    Returns:
        GPT-4 response content string

    Raises:
        Exception: After 10 failed attempts
    """
    ...

def annotate(example: Dict) -> Dict:
    """Annotates all completions in an example with critique and overall_score.

    Args:
        example: Dict with 'instruction' and 'completions' fields

    Returns:
        example: Same dict with each completion enriched with 'critique'
            and 'overall_score'
    """
    ...
```
### Import

```python
import openai
import datasets
import json
import pandas as pd
import re
from copy import deepcopy
from typing import Dict  # needed for the Dict annotations in the signatures above
```
## I/O Contract
### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| example | Dict | Yes | Dataset example with 'instruction' and 'completions' fields |
| example["completions"][i]["response"] | str | Yes | Generated text to evaluate |
| example["completions"][i]["custom_system_prompt"] | str | Yes | Principle prompt used during generation |
| example["completions"][i]["principle"] | str | Yes | Principle category (for verbalized_calibration truncation) |
### Outputs
| Name | Type | Description |
|---|---|---|
| example["completions"][i]["critique"] | str | Textual feedback from GPT-4 |
| example["completions"][i]["overall_score"] | Union[str, float] | Overall quality score (1-10), parsed from GPT-4 response |
## Usage Examples
### Annotating a Single Example
```python
import openai

openai.api_key = "YOUR_KEY"

# Example with one completion
example = {
    "instruction": "Explain quantum entanglement.",
    "completions": [{
        "model": "llama-2-13b-chat",
        "principle": "helpfulness",
        "custom_system_prompt": "The assistant should provide accurate information...",
        "response": "Quantum entanglement is a phenomenon..."
    }]
}

# Annotate
result = annotate(example)
print(result["completions"][0]["critique"])       # textual feedback
print(result["completions"][0]["overall_score"])  # overall score (1-10)
```
### Full Pipeline Execution
```python
import json
import os

import datasets
import pandas as pd
from tqdm import tqdm

subsets = ["sharegpt", "flan", "evol_instruct", "ultrachat", "truthful_qa", "false_qa"]

for subset in subsets:
    # Load the completion data for this subset
    with open(os.path.join("annotation", subset + ".json"), "r") as f:
        dataset = json.load(f)
    dataset = datasets.Dataset.from_pandas(pd.DataFrame(dataset))

    # Annotate every example with a critique and overall score
    dataset_dict = []
    for data in tqdm(dataset, total=len(dataset), desc="Annotating"):
        dataset_dict.append(annotate(data))

    # Annotated results overwrite the input file in place
    result_path = os.path.join("annotation", subset + ".json")
    with open(result_path, "w") as f:
        json.dump(dataset_dict, f, indent=4)
```