Principle:OpenBMB UltraFeedback Result Collection

Knowledge Sources	UltraFeedback UltraFeedback
Domains	NLP, Data_Construction
Last Updated	2023-10-02 00:00 GMT

Overview

A completion aggregation and persistence strategy that collects model-generated responses and stores them in a structured JSON format for downstream annotation.

Description

Result Collection is the final step of the completion generation phase. After each model generates a response to an instruction, the result is appended to the instruction's completions array as a structured dictionary containing the model identifier, the principle category, the system prompt used, and the generated text.

The pipeline writes results back to the same JSON file it read from (in-place update), allowing incremental accumulation of completions across multiple generation passes with different models. Each completion entry preserves full provenance: which model generated it, which principle guided it, and the exact system prompt used.
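The read, append, write-back cycle described above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `add_completions_in_place`, the `generate` callback, and the `"instruction"` key holding the prompt text are assumptions made for the example.

```python
import json


def add_completions_in_place(path, model_type, principle_category,
                             system_prompt, generate):
    """Load a JSON dataset, append one completion per instruction, and
    write the result back to the same file.

    `generate` is a hypothetical callable (system_prompt, prompt) -> str
    standing in for the real model call.
    """
    with open(path, "r", encoding="utf-8") as f:
        dataset = json.load(f)

    for instruction in dataset:
        # Assumption: the prompt text is stored under "instruction".
        text = generate(system_prompt, instruction["instruction"])
        # setdefault is a defensive assumption so that a first pass also
        # works on records that have no "completions" list yet.
        instruction.setdefault("completions", []).append({
            "model": model_type,
            "principle": principle_category,
            "custom_system_prompt": system_prompt,
            "response": text,
        })

    # In-place update: overwrite the file that was just read.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(dataset, f, indent=4)
    return dataset
```

Calling the function once per model against the same path leaves earlier entries untouched and appends the new ones after them, which is the accumulation behaviour this section describes.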

Usage

Use this principle when building data generation pipelines that accumulate completions from multiple sources. The in-place JSON update pattern allows running the pipeline separately for each model while building up a complete dataset.
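When the pipeline is run once per model, the stored provenance makes it possible to see which instructions a given model has already covered. The helpers below are a hedged sketch of that idea. The names `models_present` and `pending_instructions` are hypothetical, and the source does not state that the original pipeline performs such a check.

```python
def models_present(instruction):
    """Return the set of model identifiers already stored for one instruction."""
    return {c["model"] for c in instruction.get("completions", [])}


def pending_instructions(dataset, model_type):
    """Return the instructions that have no completion from `model_type` yet."""
    return [inst for inst in dataset if model_type not in models_present(inst)]
```

A rerun for an interrupted model could iterate over `pending_instructions(dataset, model_type)` instead of the full dataset, so that a repeated pass does not append duplicate entries.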

Theoretical Basis

The storage schema follows a nested document design: each instruction is a top-level record, and its completions are stored as an array nested inside that record. This fits the data better than a flat table because the number of completions per instruction varies, and the array simply grows as more models are run.
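To make the comparison with a flat table concrete, the sketch below flattens a few nested records into one row per completion. The example data and the `flatten` helper are invented for illustration, and the `"instruction"` key for the prompt text is an assumption.

```python
def flatten(dataset):
    """Turn nested instruction records into one flat row per completion."""
    rows = []
    for record in dataset:
        for completion in record.get("completions", []):
            rows.append({
                "instruction": record["instruction"],
                "model": completion["model"],
                "principle": completion["principle"],
                "response": completion["response"],
            })
    return rows


# Made-up example data: two completions, one completion, and none yet.
dataset = [
    {"instruction": "Explain recursion.", "completions": [
        {"model": "model-a", "principle": "helpfulness",
         "custom_system_prompt": "Be helpful.", "response": "..."},
        {"model": "model-b", "principle": "honesty",
         "custom_system_prompt": "Be honest.", "response": "..."},
    ]},
    {"instruction": "Translate 'cat' to French.", "completions": [
        {"model": "model-a", "principle": "helpfulness",
         "custom_system_prompt": "Be helpful.", "response": "..."},
    ]},
    {"instruction": "Name a prime number.", "completions": []},
]

rows = flatten(dataset)
```

In the flat form the instruction text is repeated on every row, and an instruction with no completions yet does not appear at all. The nested form keeps one record per instruction whatever the length of its completions array.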

Pseudo-code Logic:

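# Abstract algorithm (Python-style; `generated`, `dataset`, and `path` are placeholder names)
import json

# `generated` yields one tuple per (instruction, model) pair
for instruction, model_type, principle_category, principle_prompt_text, generated_text in generated:
    instruction["completions"].append({
        "model": model_type,                            # which model produced the text
        "principle": principle_category,                # principle category that guided it
        "custom_system_prompt": principle_prompt_text,  # exact system prompt used
        "response": generated_text,                     # the generated completion
    })

# Persist to disk, overwriting the file the dataset was read from
with open(path, "w") as file:
    json.dump(dataset, file, indent=4)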

Related Pages

Implemented By

Implementation:OpenBMB_UltraFeedback_Completion_Storage
