Principle: Microsoft BIPIA Model Loading
Overview
A factory-based model loading pattern that provides a unified interface for instantiating 20+ LLM backends across OpenAI API, HuggingFace Transformers, and vLLM inference engines.
Description
The BIPIA benchmark abstracts LLM loading behind a unified factory so that benchmark evaluation code can work identically regardless of the underlying model backend. Whether the target model is an API-based service such as GPT-3.5 or GPT-4, a locally loaded HuggingFace Transformers model with full or quantized weights, or a high-throughput vLLM-accelerated model, the calling code interacts with the same interface.
Each concrete model class inherits from BaseModel and is required to implement two methods:
process_fn() -- Preprocesses the prompt or conversation into the format expected by the specific backend (e.g., a chat message list for OpenAI, tokenized input for HuggingFace).
generate() -- Sends the processed input to the model and returns the generated text output.
This contract means that benchmark drivers, attack injectors, and evaluation harnesses never need to know which backend is in use. They simply call process_fn() followed by generate() and receive a string result.
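The contract above can be sketched as an abstract base class. The names BaseModel, process_fn(), and generate() come from the benchmark; the toy EchoModel subclass and the exact method signatures are illustrative assumptions, not BIPIA's actual implementation.

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Shared interface every backend class must satisfy."""

    @abstractmethod
    def process_fn(self, example):
        """Convert a raw prompt/conversation into backend-specific input."""

    @abstractmethod
    def generate(self, processed):
        """Run inference on the processed input and return a string."""


class EchoModel(BaseModel):
    """Toy backend: 'generation' just echoes the processed prompt."""

    def process_fn(self, example):
        # An API backend would build a chat-message list here;
        # a local HF backend would tokenize instead.
        return [{"role": "user", "content": example}]

    def generate(self, processed):
        return processed[-1]["content"]


model = EchoModel()
result = model.generate(model.process_fn("Hello, BIPIA"))
```

Because callers touch only the two abstract methods, swapping EchoModel for any real backend requires no change to the evaluation code.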
Usage
Use this pattern when you need to load any of the 20+ supported LLMs for benchmark evaluation within the BIPIA framework. The factory resolves model names from simple string identifiers (e.g., "gpt35", "vicuna_13b") or from YAML configuration file paths. This makes it straightforward to sweep across many models in a single evaluation run by iterating over a list of names or config paths.
Theoretical Basis
The pattern is built on two well-known software design principles:
Factory Pattern: A single entry point (AutoLLM.from_name()) accepts a string and returns the appropriate class. The caller does not need to know which concrete class is returned; it only relies on the shared BaseModel interface.
Inheritance Hierarchy: The class hierarchy is organized as follows:
BaseModel
+-- GPTModel  (OpenAI API-based models: gpt35, gpt4, gpt35_0613, gpt4_0613, ...)
+-- LLMModel  (HuggingFace Transformers models: llama2_7b, vicuna_7b, vicuna_13b, ...)
+-- vLLMModel (vLLM-accelerated models: mistral, llama2_7b_vllm, vicuna_7b_vllm, ...)
YAML Configuration: Rather than hard-coding model parameters in Python, BIPIA uses declarative YAML config files. Each config contains keys such as:
model_name: meta-llama/Llama-2-7b-chat-hf
chat: true
load_8bit: false
api_key: ${OPENAI_API_KEY}
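A placeholder like ${OPENAI_API_KEY} has to be expanded from the environment after the YAML is parsed. The sketch below shows one way to do that with the standard library; the expand_env helper is hypothetical, and BIPIA's real loader may resolve placeholders differently.

```python
import os


def expand_env(config):
    """Expand ${VAR} placeholders in string config values.
    Hypothetical helper for illustration only."""
    return {
        key: os.path.expandvars(value) if isinstance(value, str) else value
        for key, value in config.items()
    }


# Simulate a parsed YAML config with an env-var placeholder.
os.environ["OPENAI_API_KEY"] = "sk-demo"
cfg = {
    "model_name": "meta-llama/Llama-2-7b-chat-hf",
    "chat": True,
    "load_8bit": False,
    "api_key": "${OPENAI_API_KEY}",
}
expanded = expand_env(cfg)
```

Keeping secrets as environment references rather than literal values lets the same config file be committed to version control safely.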
Resolution Pseudocode:
@classmethod
def from_name(cls, name: str) -> Type[BaseModel]:
    if name in LLM_NAME_TO_CLASS:
        return LLM_NAME_TO_CLASS[name]
    elif name.endswith((".yaml", ".yml")):
        config = load_yaml(name)
        return LLM_NAME_TO_CLASS[config["model_name"]]
    else:
        raise ValueError(f"Unsupported model: {name}")
This resolution logic means that the same factory call works whether the user provides a short alias like "gpt35" or a full path like "config/gpt35.yaml". The YAML path approach is especially useful for parameterized sweeps where each config file specifies different quantization, context length, or API endpoint settings.
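A parameterized sweep can then be driven by a plain list of identifiers. The snippet below is a self-contained sketch: the registry entries and stub classes are stand-ins for BIPIA's real LLM_NAME_TO_CLASS mapping and model classes.

```python
# Stand-in model classes; in BIPIA these would be GPTModel, LLMModel, etc.
class StubGPT:
    pass


class StubVicuna:
    pass


# Minimal alias registry mirroring the factory's lookup table.
LLM_NAME_TO_CLASS = {"gpt35": StubGPT, "vicuna_13b": StubVicuna}


def from_name(name):
    """Resolve a short alias to its model class (YAML branch omitted)."""
    if name in LLM_NAME_TO_CLASS:
        return LLM_NAME_TO_CLASS[name]
    raise ValueError(f"Unsupported model: {name}")


# Sweep: one resolution call per target, no backend-specific branching.
resolved = [from_name(n).__name__ for n in ["gpt35", "vicuna_13b"]]
```

The evaluation loop stays identical for every entry in the list; only the resolved class changes.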