Implementation:OpenBMB UltraFeedback Multi Backend Inference
| Knowledge Sources | |
|---|---|
| Domains | NLP, Inference |
| Last Updated | 2023-10-02 00:00 GMT |
Overview
A concrete tool for generating text completions across the OpenAI API, HuggingFace pipeline, and vLLM backends within the UltraFeedback comparison-data generation pipeline.
Description
The inference execution is split across two modules:
main.py (HuggingFace backend): The instruction_completion function (L157-222) handles both API and local inference. For API models, it calls generator(system_prompt, user_prompt) which invokes API_Caller.__call__ → openai.ChatCompletion.create. For local models, it calls the HuggingFace pipeline with generation parameters and applies post-processing (strip newlines, split on quadruple newlines).
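A minimal sketch of this dispatch, assuming an API_Caller instance for API models and a transformers text-generation pipeline otherwise (variable names such as principle_prompt, instruction, and prompt follow the signatures below; the exact post-processing in main.py may differ):

# Illustrative sketch of the per-example dispatch in main.py (not a verbatim copy)
if isinstance(generator, API_Caller):
    # API models: the principle prompt goes into the system role
    response = generator(system_prompt=principle_prompt, user_prompt=instruction)
else:
    # Local models: HuggingFace text-generation pipeline call
    outputs = generator(prompt, num_return_sequences=1, return_full_text=False,
                        handle_long_generation="hole", temperature=1.0, top_p=1.0,
                        max_new_tokens=1024, do_sample=True,
                        stopping_criteria=stopping_criteria)
    # Post-processing: strip newlines, then keep the text before a quadruple newline
    response = outputs[0]["generated_text"].strip("\n").split("\n\n\n\n")[0].strip()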
main_vllm.py (vLLM backend): The instruction_completion function (L161-190) takes an entire dataset, constructs SamplingParams with model-specific stop tokens, and runs generator.generate(dataset["prompt"], sampling_params) for batch inference. Responses are stripped and cleaned of tokens, then merged back into the dataset.
Usage
The HF backend is called via dataset.map(instruction_completion) for sequential per-example processing. The vLLM backend is called as instruction_completion(dataset) for batch processing of the entire dataset at once.
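The two entry points differ only in granularity; for example:

# HF backend: one completion per example, appended inside the map callback
dataset = dataset.map(instruction_completion, desc=f"{model_type} on {subset}")

# vLLM backend: the whole dataset is passed in and returned with responses merged
dataset = instruction_completion(dataset)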
Code Reference
Source Location
- Repository: UltraFeedback
- File: src/comparison_data_generation/main.py (Lines 99-118 for API_Caller.__call__, Lines 207-213 for HF inference)
- File: src/comparison_data_generation/main_vllm.py (Lines 161-190 for vLLM batch inference)
Signature
# HuggingFace backend: instruction_completion (main.py:L157-222)
@torch.no_grad()
def instruction_completion(example: Dict) -> Dict:
    """Generates a completion for a single example using the loaded generator.

    For API models: generator(system_prompt=principle_prompt, user_prompt=instruction)
    For local models: generator(prompt, num_return_sequences=1, return_full_text=False,
                                handle_long_generation="hole", temperature=1.0, top_p=1.0,
                                max_new_tokens=1024, do_sample=True,
                                stopping_criteria=stopping_criteria)

    Appends the result to example["completions"] as a dict with keys:
    model, principle, custom_system_prompt, response.
    """
    ...

# vLLM backend: instruction_completion (main_vllm.py:L161-190)
@torch.no_grad()
def instruction_completion(dataset: datasets.Dataset) -> datasets.Dataset:
    """Batch inference over the entire dataset using vLLM.

    Constructs SamplingParams(temperature=1, top_p=1, max_tokens=1024, stop=stop),
    calls generator.generate(dataset["prompt"], sampling_params), and merges the
    responses back into the dataset's completions.
    """
    ...

# API_Caller.__call__ (main.py:L99-118)
def __call__(self, system_prompt: str, user_prompt: str) -> str:
    """Calls openai.ChatCompletion.create with temperature=1, max_tokens=1024, top_p=1.
    Retries up to 20 times on failure."""
    ...
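A hedged sketch of the retry behaviour described above, assuming the legacy openai<1.0 ChatCompletion interface, that the model name is stored on the caller, and an empty-string fallback after exhausting retries (the real main.py may structure this differently):

# Illustrative sketch of API_Caller.__call__ (not a verbatim copy of main.py)
def __call__(self, system_prompt: str, user_prompt: str) -> str:
    for _ in range(20):  # retry up to 20 times on failure
        try:
            result = openai.ChatCompletion.create(
                model=self.model,  # assumption: model name stored on the caller
                messages=[
                    {"role": "system", "content": system_prompt},
                    {"role": "user", "content": user_prompt},
                ],
                temperature=1,
                max_tokens=1024,
                top_p=1,
            )
            return result["choices"][0]["message"]["content"]
        except Exception:
            continue
    return ""  # assumption: fallback after all retries fail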
Import
# HuggingFace backend
from typing import Dict
import torch
import openai
from transformers import pipeline
# vLLM backend
import datasets
import torch
from vllm import LLM, SamplingParams
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| example / dataset | Dict / datasets.Dataset | Yes | Single example (HF) or full dataset (vLLM) with prompt field |
| generator | Union[API_Caller, pipeline, LLM] | Yes | Loaded model (global variable) |
| model_type | str | Yes | Global variable: model identifier for dispatch logic |
| stopping_criteria | StoppingCriteriaList | No | HF backend only: model-specific stop token criteria (see the sketch after this table) |
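For the HF backend, the stop-token criteria could look like the following sketch. This is an illustration only: the repository's actual StoppingCriteria subclass and stop tokens are assumptions, and tokenizer is assumed to be the loaded tokenizer.

import torch
from transformers import StoppingCriteria, StoppingCriteriaList

class StopOnTokens(StoppingCriteria):
    """Stops generation once the last generated token is a designated stop token."""
    def __init__(self, stop_token_ids):
        self.stop_token_ids = set(stop_token_ids)

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
        return input_ids[0, -1].item() in self.stop_token_ids

stopping_criteria = StoppingCriteriaList([StopOnTokens([tokenizer.eos_token_id])])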
Outputs
| Name | Type | Description |
|---|---|---|
| example["completions"] | List[Dict] | Appended with dict: {model, principle, custom_system_prompt, response} |
| response | str | Generated text, stripped and cleaned of stop tokens |
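Each appended entry carries the keys listed above; a sketch of its shape (the value expressions are assumptions about the surrounding variables):

# Shape of one appended completion entry (keys per the Outputs table; values illustrative)
example["completions"].append({
    "model": model_type,
    "principle": principle,                 # assumed variable holding the sampled principle
    "custom_system_prompt": system_prompt,  # assumed variable holding the system prompt used
    "response": response,
})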
Usage Examples
HuggingFace Backend (Sequential)
# Called via dataset.map
dataset = dataset.map(instruction_completion, desc=f"{model_type} on {subset}")
# Each example gets one completion appended to example["completions"]
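The generator global referenced here is loaded once before the map. A minimal sketch, assuming API models are identified by name and using a hypothetical model_name_or_path (the actual loading logic in main.py may differ):

import torch
from transformers import pipeline

if model_type in {"gpt-3.5-turbo", "gpt-4"}:  # assumption: API model names
    generator = API_Caller(model_type)        # assumption: constructor takes the model name
else:
    generator = pipeline("text-generation",
                         model=model_name_or_path,  # hypothetical local model path/name
                         torch_dtype=torch.bfloat16, device_map="auto")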
vLLM Backend (Batch)
from vllm import SamplingParams

# Batch inference over the full dataset; `stop` is the model-specific stop-token list
sampling_params = SamplingParams(temperature=1, top_p=1, max_tokens=1024, stop=stop)
responses = generator.generate(dataset["prompt"], sampling_params)
responses = [r.outputs[0].text.strip().rstrip("</s>").strip() for r in responses]

# Merge the responses back into the last completion entry of each example
dataset = dataset.add_column("response", responses)
dataset = dataset.map(lambda x: {
    "completions": x["completions"][:-1] + [
        dict(x["completions"][-1], **{"response": x["response"]})
    ]
})
dataset = dataset.remove_columns(["prompt", "response"])
Related Pages
Implements Principle
Requires Environment
- Environment:OpenBMB_UltraFeedback_Python_GPU_Environment
- Environment:OpenBMB_UltraFeedback_vLLM_Multi_GPU_Environment
- Environment:OpenBMB_UltraFeedback_OpenAI_API_Environment